CN111368923B - Neural network training method and device, electronic equipment and storage medium - Google Patents

Neural network training method and device, electronic equipment and storage medium Download PDF

Info

Publication number
CN111368923B
CN111368923B
Authority
CN
China
Prior art keywords
feature extraction
image block
sample image
network
processed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010148544.8A
Other languages
Chinese (zh)
Other versions
CN111368923A (en)
Inventor
王娜
宋涛
刘星龙
黄宁
张少霆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Sensetime Intelligent Technology Co Ltd
Original Assignee
Shanghai Sensetime Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Sensetime Intelligent Technology Co Ltd filed Critical Shanghai Sensetime Intelligent Technology Co Ltd
Priority to CN202010148544.8A priority Critical patent/CN111368923B/en
Publication of CN111368923A publication Critical patent/CN111368923A/en
Priority to PCT/CN2020/100715 priority patent/WO2021174739A1/en
Priority to KR1020217041454A priority patent/KR20220009451A/en
Priority to JP2021574781A priority patent/JP2022537974A/en
Priority to TW110100180A priority patent/TWI770754B/en
Application granted granted Critical
Publication of CN111368923B publication Critical patent/CN111368923B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133Distances to prototypes
    • G06F18/24137Distances to cluster centroïds
    • G06F18/2414Smoothing the distance, e.g. radial basis function networks [RBFN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/09Supervised learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H30/00ICT specially adapted for the handling or processing of medical images
    • G16H30/20ICT specially adapted for the handling or processing of medical images for handling medical images, e.g. DICOM, HL7 or PACS
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/03Recognition of patterns in medical or anatomical images

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Image Analysis (AREA)
  • Apparatus For Radiation Diagnosis (AREA)

Abstract

The disclosure relates to a neural network training method and apparatus, an electronic device, and a storage medium. The method includes: acquiring position information and category information of a target region in a sample image; cropping out at least one target region according to the position information of the target region; classifying the at least one cropped target region according to the category information to obtain N classes of sample image blocks; and inputting the N classes of sample image blocks into a neural network for training. With this neural network training method, finely classified sample image blocks can be obtained and used to train the neural network, so that the neural network can classify images at a fine-grained level, which improves classification efficiency and the accuracy of medical diagnosis.

Description

Neural network training method and device, electronic equipment and storage medium
Technical Field
The disclosure relates to the field of computer technology, and in particular, to a neural network training method and device, electronic equipment and a storage medium.
Background
Machine learning methods are widely used in image processing, for example for classifying ordinary or three-dimensional images and for image detection. In the processing of medical images, for instance, machine learning methods can be used to determine the category of a disease, detect a lesion region, and so on.
In medical image processing, classification and detection of lung images, such as lung computed tomography (CT) scans, play an important role in the screening and diagnosis of pneumonia, lung cancer, and other conditions. Lung cancer is one of the most common malignant tumors in China, and its death rate ranks first among cancer deaths in both urban and rural areas and for both men and women; adenocarcinoma accounts for about 40% of all lung cancers. With screening based on medical images (e.g., lung CT and low-dose helical CT), more and more early lung adenocarcinomas are being found, typically presenting as ground-glass nodules (GGNs). Adenocarcinoma lesions are classified as atypical adenomatous hyperplasia (AAH, a preinvasive lesion), adenocarcinoma in situ (AIS), minimally invasive adenocarcinoma (MIA), and invasive adenocarcinoma (IA). As tumor size increases, survival declines significantly, suggesting that early discovery and diagnosis is an effective and critical way to reduce patient mortality. Thus, detecting invasive features before surgery is clinically important and can guide clinical decisions. However, because early cancers often lack typical radiological features (such as bubble lucency and pleural retraction), it is clinically difficult for an expert or radiologist to accurately identify the GGN subtype from CT images. In this setting, computer-aided diagnosis based on artificial intelligence is a more efficient way to assess nodule invasiveness and is expected to play an important role in clinical assessment tasks.
In the related art, machine learning or similar methods are generally used only to predict whether the nodule in an input image is malignant or benign; there is no technique for classifying the image at a finer granularity.
Disclosure of Invention
The disclosure provides a neural network training method and device, electronic equipment and a storage medium.
According to an aspect of the present disclosure, there is provided a neural network training method, including: acquiring position information and category information of a target area in a sample image; cutting out at least one target area according to the position information of the target area; classifying the cut at least one target area according to the category information to obtain N types of sample image blocks, wherein N is an integer and is larger than or equal to 1; and inputting the N types of sample image blocks into a neural network for training.
According to the neural network training method, finely classified sample image blocks can be obtained and used to train the neural network, so that the neural network can classify images at a fine-grained level, which improves classification efficiency and the accuracy of medical diagnosis.
In one possible implementation, the sample image is a medical image picture.
In one possible implementation manner, the acquiring the location information and the category information of the target area in the sample image includes: positioning a target area on a medical image picture to obtain the position information of the target area; acquiring a pathology picture associated with the medical image picture; and labeling the category information of the target area on the imaging picture according to the pathological information of each target area on the pathological picture.
In one possible implementation, inputting the N types of sample image blocks into a neural network for training includes: inputting any sample image block into the neural network for processing to obtain category prediction information and a prediction target area of the sample image block; determining a classification loss based at least on the class prediction information and the class information of the sample image block; determining a segmentation loss according to the position information of the prediction target region and the sample image block; training the neural network based on the classification loss and the segmentation loss.
In one possible implementation, determining the classification loss according to the class prediction information and the class information of the sample image block includes: determining a first classification loss according to the classification prediction information and the classification information of the sample image block; determining a second classification loss according to the classification prediction information and the classification information of the class center of the class to which the sample image block belongs; and carrying out weighted summation processing on the first classification loss and the second classification loss to obtain the classification loss.
By the method, the class characteristics of the sample image blocks of the same class can be gathered in training, so that the characteristic distance between class information of sample image blocks of different classes is larger, the classification performance is improved, and the classification accuracy is improved.
In one possible implementation, determining the segmentation loss according to the position information of the prediction target region and the sample image block includes: determining a first weight of the prediction target region and a second weight of a sample background region in the sample image block according to a first proportion of the number of pixels of the prediction target region in the sample image block; and determining the segmentation loss according to the first weight, the second weight, the prediction target region and the position information of the sample image block.
In one possible implementation manner, determining the first weight of the prediction target region and the second weight of the sample background region in the sample image block according to the first proportion occupied by the pixel number of the prediction target region in the sample image block includes: determining a second proportion of a sample background area in the sample image block according to a first proportion of the number of pixels of the prediction target area in the sample image block; the second ratio is determined as the first weight and the first ratio is determined as the second weight.
By the method, errors of the target area and errors of the non-target area can be balanced, network parameter optimization is facilitated, and training efficiency and training effect are improved.
In one possible implementation, the category information includes: atypical adenomatous hyperplasia (AAH, a preinvasive lesion) nodules, adenocarcinoma in situ (AIS) nodules, minimally invasive adenocarcinoma (MIA) nodules, and invasive adenocarcinoma (IA) nodules.
In one possible implementation, the neural network includes a shared feature extraction network, a classification network, and a segmentation network, and the method further comprises: inputting an image block to be processed into the shared feature extraction network for processing to obtain target features of the image block to be processed, wherein the shared feature extraction network comprises M shared feature extraction blocks, the input features of the i-th shared feature extraction block comprise the output features of the first i-1 shared feature extraction blocks, and i and M are integers with 1 < i ≤ M; inputting the target features into the classification network for classification processing to obtain the class information of the image block to be processed; and inputting the target features into the segmentation network for segmentation processing to obtain a target region in the image block to be processed.
In this way, the target feature can be obtained through the shared feature extraction network; each shared feature extraction block of the shared feature extraction network can receive the output features of all previous shared feature extraction blocks and passes its own output features to all subsequent shared feature extraction blocks. This enhances the gradient flow in the network, alleviates the vanishing-gradient phenomenon, and improves the feature extraction and learning capability, which facilitates finer classification and segmentation of the input image blocks to be processed. Finer category information and target regions of the image blocks to be processed can thus be obtained, improving image processing efficiency.
In one possible implementation, inputting the image block to be processed into the shared feature extraction network for processing to obtain the target feature of the image block to be processed includes: performing first feature extraction processing on the image block to be processed to obtain first features of the image block to be processed; inputting the first feature into the first shared feature extraction block to obtain the output feature of the first shared feature extraction block, and outputting the output feature of the first shared feature extraction block to the subsequent M-1 shared feature extraction blocks; inputting the output features of the previous j-1 shared feature extraction blocks into the j-th shared feature extraction block to obtain the output features of the j-th shared feature extraction block, wherein j is an integer and 1 < j < M; performing second feature extraction processing on the output features of the M-th shared feature extraction block to obtain second features of the image block to be processed; and performing pooling processing on the second features to obtain the target features.
In this way, the target feature can be obtained through the shared feature extraction network; each shared feature extraction block of the shared feature extraction network can receive the output features of all previous shared feature extraction blocks and passes its own output features to all subsequent shared feature extraction blocks. This enhances the gradient flow in the network, alleviates the vanishing-gradient phenomenon, and improves the feature extraction and learning capability, which facilitates finer classification and segmentation of the input image blocks to be processed.
In one possible implementation, the method further includes: preprocessing an image to be processed to obtain a first image; positioning a target area on a first image, and determining the position information of the target area in the first image; and cutting out at least one image block to be processed according to the position information of the target area.
According to an aspect of the present disclosure, there is provided a neural network training device including: the acquisition module is used for acquiring the position information and the category information of the target area in the sample image; the first cutting module is used for cutting out at least one target area according to the position information of the target area; the classifying module is used for classifying the cut at least one target area according to the category information to obtain N types of sample image blocks, wherein N is an integer and N is equal to or greater than 1; and the training module is used for inputting the N types of sample image blocks into a neural network for training.
In one possible implementation, the sample image is a medical image picture.
In one possible implementation, the acquiring module is further configured to: positioning a target area on a medical image picture to obtain the position information of the target area; acquiring a pathology picture associated with the medical image picture; and labeling the category information of the target area on the imaging picture according to the pathological information of each target area on the pathological picture.
In one possible implementation, the training module is further configured to: inputting any sample image block into the neural network for processing to obtain category prediction information and a prediction target area of the sample image block; determining a classification loss based at least on the class prediction information and the class information of the sample image block; determining a segmentation loss according to the position information of the prediction target region and the sample image block; training the neural network based on the classification loss and the segmentation loss.
In one possible implementation, the training module is further configured to: determining a first classification loss according to the classification prediction information and the classification information of the sample image block; determining a second classification loss according to the classification prediction information and the classification information of the class center of the class to which the sample image block belongs; and carrying out weighted summation processing on the first classification loss and the second classification loss to obtain the classification loss.
In one possible implementation, the training module is further configured to: determining a first weight of the prediction target region and a second weight of a sample background region in the sample image block according to a first proportion of the number of pixels of the prediction target region in the sample image block; and determining the segmentation loss according to the first weight, the second weight, the prediction target region and the position information of the sample image block.
In one possible implementation, the training module is further configured to: determining a second proportion of a sample background area in the sample image block according to a first proportion of the number of pixels of the prediction target area in the sample image block; the second ratio is determined as the first weight and the first ratio is determined as the second weight.
In one possible implementation, the category information includes: atypical adenomatous hyperplasia (AAH, a preinvasive lesion) nodules, adenocarcinoma in situ (AIS) nodules, minimally invasive adenocarcinoma (MIA) nodules, and invasive adenocarcinoma (IA) nodules.
In one possible implementation, the neural network includes a shared feature extraction network, a classification network, and a segmentation network, the apparatus further comprising: the image processing device comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for inputting an image block to be processed into a shared feature extraction network to be processed to acquire target features of the image block to be processed, the shared feature extraction network comprises M shared feature extraction blocks, the input features of the ith shared feature extraction block comprise output features of the first i-1 shared feature extraction blocks, i and M are integers, and i is more than 1 and less than or equal to M; the classification module is used for inputting the target characteristics into a classification network for classification processing to obtain the category information of the image block to be processed; and the segmentation module is used for inputting the target characteristics into a segmentation network for segmentation processing to obtain a target area in the image block to be processed.
In one possible implementation, the obtaining module is further configured to: perform first feature extraction processing on the image block to be processed to obtain first features of the image block to be processed; input the first feature into the first shared feature extraction block to obtain the output feature of the first shared feature extraction block, and output the output feature of the first shared feature extraction block to the subsequent M-1 shared feature extraction blocks; input the output features of the previous j-1 shared feature extraction blocks into the j-th shared feature extraction block to obtain the output features of the j-th shared feature extraction block, wherein j is an integer and 1 < j < M; perform second feature extraction processing on the output features of the M-th shared feature extraction block to obtain second features of the image block to be processed; and perform pooling processing on the second features to obtain the target features.
In one possible implementation, the apparatus further includes: the preprocessing module is used for preprocessing the image to be processed to obtain a first image; the positioning module is used for positioning the target area on the first image and determining the position information of the target area in the first image; and the second cutting module is used for cutting out at least one image block to be processed according to the position information of the target area.
According to an aspect of the present disclosure, there is provided an electronic apparatus including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to: and executing the neural network training method.
According to an aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described neural network training method.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the technical aspects of the disclosure.
FIG. 1 illustrates a flow chart of a neural network training method, according to an embodiment of the present disclosure;
FIG. 2 illustrates an application schematic of a neural network training method according to an embodiment of the present disclosure;
FIG. 3 illustrates a block diagram of a neural network training device, according to an embodiment of the present disclosure;
FIG. 4 illustrates a block diagram of an electronic device, according to an embodiment of the present disclosure;
fig. 5 shows a block diagram of an electronic device, according to an embodiment of the disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the disclosure will be described in detail below with reference to the drawings. In the drawings, like reference numbers indicate identical or functionally similar elements. Although various aspects of the embodiments are illustrated in the accompanying drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" is herein merely an association relationship describing an associated object, meaning that there may be three relationships, e.g., a and/or B, may represent: a exists alone, A and B exist together, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
In addition, numerous specific details are set forth in the following detailed description in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art have not been described in detail in order not to obscure the present disclosure.
Fig. 1 shows a flowchart of a neural network training method according to an embodiment of the present disclosure, as shown in fig. 1, the method including:
in step S11, position information and category information of a target area in a sample image are acquired;
in step S12, cutting out at least one target area according to the position information of the target area;
in step S13, classifying the cut at least one target area according to the category information to obtain N types of sample image blocks, where N is an integer and N is equal to or greater than 1;
in step S14, the N-type sample image blocks are input into a neural network for training.
According to the neural network training method, finely classified sample image blocks can be obtained and used to train the neural network, so that the neural network can classify images at a fine-grained level, which improves classification efficiency and the accuracy of medical diagnosis.
In one possible implementation, the neural network training method may be performed by a terminal device or other processing device, where the terminal device may be a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a personal digital assistant (Personal Digital Assistant, PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. The other processing device may be a server or cloud server, etc. In some possible implementations, the neural network training method may be implemented by way of a processor invoking computer readable instructions stored in a memory.
In one possible implementation, the sample image is a medical image picture, e.g., a lung CT image. The sample image block may be an image block including a target region in the sample image. In an example, the sample image may be an annotated (e.g., category-annotated and segmentation-annotated) three-dimensional medical image, and the sample image block may be an image block of the three-dimensional medical image that contains nodules.
In one possible implementation, in step S11, the location information and the category information of the target area in the sample image may be determined to obtain a sample image block for training the neural network, and the sample image block is labeled. Step S11 may include: positioning a target area on a medical image picture to obtain the position information of the target area; acquiring a pathology picture associated with the medical image picture; and labeling the category information of the target area on the imaging picture according to the pathological information of each target area on the pathological picture.
In one possible implementation, the sample image may be resampled to obtain a three-dimensional image with a resolution of 1 × 1 × 1. The three-dimensional image may then be cropped; for example, a three-dimensional medical image of the lung may contain regions other than the lung parenchyma, while lesions such as lung nodules usually lie within the lung parenchyma, so the region where the lung parenchyma is located is cropped out and normalized. The target region (for example, a lesion region) in the normalized three-dimensional image can then be located to obtain its position information. For example, the position information of the target region may be determined by a convolutional neural network for localization, or may be confirmed by a professional such as a doctor; the present disclosure does not limit the localization manner.
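Purely as an illustration of this preprocessing step, a Python sketch might look as follows; the use of scipy for resampling, the intensity window, and the helper names (resample_to_unit_spacing, normalize, preprocess, parenchyma_bbox) are assumptions of the sketch, not details taken from the disclosure.

import numpy as np
from scipy import ndimage

def resample_to_unit_spacing(volume, spacing):
    # Resample so that each voxel covers 1 x 1 x 1 (spacing given in mm per voxel).
    return ndimage.zoom(volume, np.asarray(spacing, dtype=float), order=1)

def normalize(volume, lo=-1200.0, hi=600.0):
    # Clip to an intensity window (values here are assumptions) and scale to [0, 1].
    volume = np.clip(volume, lo, hi)
    return (volume - lo) / (hi - lo)

def preprocess(volume, spacing, parenchyma_bbox):
    # Resample, crop the lung-parenchyma bounding box, then normalize.
    volume = resample_to_unit_spacing(volume, spacing)
    z0, z1, y0, y1, x0, x1 = parenchyma_bbox
    return normalize(volume[z0:z1, y0:y1, x0:x1])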
In one possible implementation, the medical image picture may have an associated pathology picture that may be used to determine the category of a lesion in the medical image picture. For example, the lesion category may relate to ground-glass nodules (GGNs), which in the case of adenocarcinoma are classified as atypical adenomatous hyperplasia (AAH, a preinvasive lesion), adenocarcinoma in situ (AIS), minimally invasive adenocarcinoma (MIA), and invasive adenocarcinoma (IA); the present disclosure does not limit the type of lesion.
In one possible implementation, the pathological information of each target region may be obtained from the pathology picture. For example, the pathology picture may be a picture produced after professional diagnosis and may contain an analysis of each lesion; the pathological information of each target region obtained from it can then be used to label the category information of each target region on the imaging picture.
In one possible implementation, a region including a lesion may be cropped from the imaging picture, i.e., a target region is cropped out, and N classes of sample image blocks are obtained according to the class information of the target regions. For example, based on statistics of nodule sizes, the size of the sample image block may be set to 64 x 64, and four classes (AAH, AIS, MIA, and IA) of sample image blocks are obtained by cropping and classification.
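A minimal sketch of this cropping-and-grouping step, assuming the annotations are given as (nodule center, class label) pairs and that a fixed cube is cut around each center; all function names and the padding strategy are illustrative assumptions.

import numpy as np

def crop_block(volume, center, size=64):
    # Cut a cube of side `size` centered on the nodule, padding at the borders.
    half = size // 2
    padded = np.pad(volume, half, mode="constant")
    z, y, x = (int(c) + half for c in center)
    return padded[z - half:z + half, y - half:y + half, x - half:x + half]

def build_sample_sets(volume, annotations):
    # Group the cropped sample image blocks by their annotated class.
    classes = {"AAH": [], "AIS": [], "MIA": [], "IA": []}
    for center, label in annotations:
        classes[label].append(crop_block(volume, center))
    return classes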
In one possible implementation, because medical image pictures are few in number and expensive and difficult to label, and because splitting a three-dimensional image into a plurality of two-dimensional images loses spatial information and degrades performance, the sample image blocks can be augmented by operations such as rotation, translation, mirroring, and scaling to increase the number of samples. Training the neural network with the augmented sample image blocks improves its generalization ability and prevents overfitting. Furthermore, positive and negative samples can be balanced: in an example, the number of samples of benign nodules such as atypical adenomatous hyperplasia, adenocarcinoma in situ, and minimally invasive adenocarcinoma differs greatly from the number of samples of malignant nodules such as invasive adenocarcinoma, and the classes with fewer samples can be augmented in this way so that the numbers of positive and negative samples are balanced. The present disclosure does not limit the manner in which the number of samples is increased.
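A hedged example of the augmentation and class-balancing operations mentioned above (rotation, mirroring, translation); the exact operations and parameters used by the disclosure are not specified, so the ranges and helper names below are assumptions.

import numpy as np
from scipy import ndimage

def augment(block, rng):
    # Randomly rotate, mirror and shift a 3D sample image block.
    # (Scaling would additionally require cropping/padding back to the block size.)
    block = np.rot90(block, rng.integers(0, 4), axes=(1, 2))            # rotation
    if rng.random() < 0.5:
        block = block[:, :, ::-1]                                       # mirroring
    block = ndimage.shift(block, rng.integers(-4, 5, size=3), order=0)  # translation
    return np.ascontiguousarray(block)

def oversample(minority_blocks, target_count, rng):
    # Amplify a minority class until the positive/negative sample counts balance.
    out = list(minority_blocks)
    while len(out) < target_count:
        out.append(augment(minority_blocks[rng.integers(len(minority_blocks))], rng))
    return out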
In one possible implementation, the sample image blocks may be input into the neural network in batches. Wherein, step S14 may include: inputting any sample image block into the neural network for processing to obtain category prediction information and a prediction target area of the sample image block; determining a classification loss based at least on the class prediction information and the class information of the sample image block; determining a segmentation loss according to the position information of the prediction target region and the sample image block; training the neural network based on the classification loss and the segmentation loss.
In one possible implementation, the neural network may include a shared feature extraction network, a classification network, and a segmentation network. The sample image block can be subjected to feature extraction through the shared feature extraction network to obtain sample target features of the sample image block, category prediction information of the sample image block can be obtained through the classification network, errors can exist in the category prediction information, and classification loss of the neural network can be determined through the category prediction information and the category labeling information of the sample image block.
In one possible implementation, determining the classification loss according to the class prediction information and the labeling information of the sample image block includes: determining a first classification loss according to the classification prediction information and the labeling information of the sample image block; determining a second classification loss according to the classification prediction information and the classification information of the class center of the class to which the sample image block belongs; and carrying out weighted summation processing on the first classification loss and the second classification loss to obtain the classification loss.
In one possible implementation, the annotation information for the sample image block may include category annotation information; e.g., the category annotation information may be information representing the category of the nodule in the sample image block. In an example, the category prediction information may be category information represented by a vector or the like; the probability distribution of the image block to be processed over the categories can be determined from the vector, for example by means of a probability dictionary, so as to determine the category to which the image block to be processed belongs. Alternatively, the vector of the class prediction information may directly represent the class probabilities; in an example, each element of the vector represents the probability that the image block to be processed belongs to the corresponding class.
In one possible implementation, the first classification loss may be determined based on the class prediction information and the class annotation information of the sample image block. For example, a feature distance (e.g., Euclidean distance, cosine distance, etc.) between the vector of the class prediction information and the vector of the class annotation information may be determined, and the first classification loss L_sm may be determined based on that feature distance; for example, L_sm may be calculated with a softmax loss function. In an example, the first classification loss L_sm may be determined by the following equation (1):

L_sm = -(1/m) Σ_{i=1}^{m} log( exp(W_{y_i}^T x_i + b_{y_i}) / Σ_{j=1}^{n} exp(W_j^T x_i + b_j) )   (1)

where x_i denotes the class prediction information of the i-th sample image block, y_i denotes the category to which the i-th sample image block belongs, n denotes the number of categories, W_{y_i} denotes the weight of the y_i-th category in the fully connected layer, W_j denotes the weight of the j-th category in the fully connected layer, m denotes the number of sample image blocks input to the neural network per batch, b_{y_i} denotes the bias term of the category to which the i-th sample image block belongs, and b_j denotes the bias term of the j-th category.
In one possible implementation, training with the first classification loss can enlarge the inter-class feature distance between the class information of different classes, thereby enabling the classification network to distinguish sample image blocks of different classes. However, the differences between the various types of lung nodules are not obvious (e.g., the shape difference between adenocarcinoma in situ nodules and minimally invasive adenocarcinoma nodules is small), while nodules of the same type can vary considerably in shape (e.g., malignant nodules such as invasive adenocarcinoma take diverse shapes). This leads to small inter-class feature distances and large intra-class feature distances in the class information, so a classification network trained with the first classification loss L_sm alone has poor classification performance.
In one possible implementation, to address the above problem, the classification network may also be trained with a second classification loss. In an example, the class information of the class center of each class among the plurality of sample image blocks may be determined; for example, the class information of the sample image blocks of a class may be weighted-averaged, or the class information of the sample image blocks may be clustered to obtain class-center features. The present disclosure does not limit how the class information of the class centers is obtained.
In one possible implementation, the second classification loss may be determined according to the class prediction information of the sample image block and the class information of the center of the class to which it belongs. For example, the feature distance between the class prediction information and the class information of the class center may be determined, and the second classification loss L_ct may be determined based on that feature distance; for example, L_ct may be calculated with the center loss function. Training the classification network with the second classification loss L_ct reduces the intra-class feature distance of the class information of sample image blocks of the same class, so that similar feature information is more concentrated in the feature space and the class of a sample image block can be determined more reliably. In an example, the second classification loss L_ct may be determined by the following equation (2):

L_ct = (1/2) Σ_{i=1}^{m} || x_i - c_{y_i} ||_2^2   (2)

where c_{y_i} denotes the class annotation information of the class center of the class to which the i-th sample image block belongs.
In one possible implementation, the classification loss may be determined jointly from the first classification loss and the second classification loss. For example, the first classification loss and the second classification loss may be combined by weighted summation to obtain the classification loss; with a weight ratio of 1:0.8 between the first and second classification losses, for instance, the classification loss is obtained by weighted summation according to that ratio. The present disclosure does not limit the weight ratio.
By the method, the class characteristics of the sample image blocks of the same class can be gathered in training, so that the distance between class information of sample image blocks of different classes is larger, the classification performance is improved, and the classification accuracy is improved.
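To make the two classification losses and their weighted sum concrete, here is a hedged PyTorch-style sketch; treating the class centers as learnable parameters is one common implementation choice rather than something stated in the disclosure, and the 1 : 0.8 ratio is only the example ratio from the text.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ClassificationLoss(nn.Module):
    def __init__(self, num_classes=4, feat_dim=128, w_sm=1.0, w_ct=0.8):
        super().__init__()
        # One learnable center per class; equation (2) pulls features toward these.
        self.centers = nn.Parameter(torch.zeros(num_classes, feat_dim))
        self.w_sm, self.w_ct = w_sm, w_ct

    def forward(self, logits, features, labels):
        l_sm = F.cross_entropy(logits, labels)            # first loss, softmax loss (eq. (1))
        diff = features - self.centers[labels]
        l_ct = 0.5 * diff.pow(2).sum(dim=1).mean()        # center loss (eq. (2), batch-averaged here)
        return self.w_sm * l_sm + self.w_ct * l_ct        # weighted sum of the two losses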
In one possible implementation, the sample target feature may be segmented by a segmentation network to obtain the predicted target region in the sample image block. The prediction target region may have an error, and the segmentation loss may be determined according to the error between the prediction target region and the labeling target region of the sample image block, and further the training may be performed through the segmentation loss.
In one possible implementation, determining the segmentation loss according to the labeling information of the prediction target region and the sample image block includes: determining a first weight of the prediction target region and a second weight of a sample background region in the sample image block according to a first proportion of the number of pixels of the prediction target region in the sample image block; and determining the segmentation loss according to the first weight, the second weight, the prediction target region and the labeling information of the sample image block.
In one possible implementation, the labeling information includes labeled segmented regions, and the segmentation loss can be determined directly from the error between the prediction target region and the labeled segmented region. However, the diameter of a nodule is usually between 5 mm and 30 mm, so the region where the nodule is located occupies a much smaller proportion of the sample image block than the other regions. The numbers of pixels in the target region and the non-target region are therefore unbalanced, and the error of the prediction target region accounts for only a small share of the segmentation loss, which hinders optimization of the neural network, lowers training efficiency, and worsens the training effect.
In one possible implementation, the weighting process may be based on pixels of the target region and pixels of the non-target region (i.e., the sample background region). In an example, a first weight of the prediction target region and a second weight of the sample background region in the sample image block may be determined according to a first proportion of the number of pixels of the prediction target region in the sample image block. When the segmentation loss is determined, the pixels of the two regions are weighted to balance the loss of the target region and the loss of the non-target region.
In one possible implementation manner, determining the first weight of the prediction target region and the second weight of the sample background region in the sample image block according to the first proportion occupied by the pixel number of the prediction target region in the sample image block includes: determining a second proportion of a sample background area in the sample image block according to a first proportion of the number of pixels of the prediction target area in the sample image block; the second ratio is determined as the first weight and the first ratio is determined as the second weight.
In one possible implementation, the sample image block may include a prediction target region and a background region, and the proportion of the number of pixels of the prediction target region may be counted, so as to determine the proportion of the sample background region. For example, if the first ratio of the number of pixels in the prediction target area is 0.2, the second ratio of the number of pixels in the sample background area is 0.8. The present disclosure does not limit the first ratio and the second ratio.
In one possible implementation, to balance the prediction target region and the sample background region, the second proportion is determined as the first weight of the prediction target region, and the first proportion is determined as the second weight of the sample background region. For example, if the proportion of pixels in the prediction target region is 0.2, the first weight of the prediction target region is 0.8; and if the proportion of pixels in the sample background region is 0.8, the second weight of the sample background region is 0.2.
In one possible implementation, the segmentation loss may be determined based on the first weight, the second weight, the prediction target region, and the labeled target region of the sample image block. In an example, the segmentation loss may be determined based on the difference between the prediction target region and the target region in the labeling information; for example, pixels in the prediction target region may be weighted with the first weight and pixels in the sample background region with the second weight, and the weighted segmentation loss L_dc may be determined. For example, L_dc may be calculated with a weighted Dice loss function. In an example, the segmentation loss L_dc may take the form of the following equation (3):

L_dc = 1 - ( 2 Σ_k w_k · P(y_k = 1 | W) · Y_k ) / ( Σ_k w_k · ( P(y_k = 1 | W) + Y_k ) )   (3)

where y_k ∈ {0,1}, with y_k = 1 indicating that the k-th pixel position belongs to the prediction target region and y_k = 0 that it belongs to the sample background region; P(y_k = 1 | W) denotes the output of the segmentation network at the k-th pixel position; w_k denotes the weight applied at the k-th pixel position (the first weight for pixels of the prediction target region and the second weight for pixels of the sample background region); and Y_k denotes the segmentation label at the k-th pixel position.
By the method, errors of the target area and errors of the non-target area can be balanced, network parameter optimization is facilitated, and training efficiency and training effect are improved.
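One way the weighted Dice-style segmentation loss described above could be implemented is sketched below; computing the proportions from a thresholded prediction and the exact weighting form are assumptions, and the disclosure's precise formula may differ.

import torch

def weighted_dice_loss(probs, labels, eps=1e-6):
    # probs: predicted foreground probabilities; labels: 0/1 segmentation labels.
    labels = labels.float()
    pred_fg = (probs > 0.5).float()          # prediction target region
    ratio = pred_fg.mean()                   # first proportion (target pixels)
    w_fg, w_bg = 1.0 - ratio, ratio          # first weight, second weight
    weights = torch.where(pred_fg > 0, w_fg, w_bg)
    inter = (weights * probs * labels).sum()
    union = (weights * (probs + labels)).sum()
    return 1.0 - 2.0 * inter / (union + eps)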
In one possible implementation, the comprehensive network loss of the shared feature extraction network, the segmentation network, and the classification network may be determined based on the classification loss and the segmentation loss. For example, the classification loss and the segmentation loss may be combined by weighted summation to obtain the comprehensive network loss; in an example, the comprehensive network loss L_total may be determined according to the following equation (4):

L_total = θ_1 · L_sm + θ_2 · L_ct + θ_3 · L_dc   (4)

where θ_1 denotes the weight of L_sm, θ_2 the weight of L_ct, and θ_3 the weight of L_dc (e.g., θ_1 = 1.2, θ_2 = 0.8, θ_3 = 2). The present disclosure does not limit the weights of the classification loss and the segmentation loss.
In one possible implementation, the network parameters of the neural network may be adjusted by back-propagating the comprehensive network loss; for example, the parameters may be adjusted by gradient descent, so that the network parameters are optimized and the segmentation and classification accuracy is improved.
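A sketch of one training step that combines the three losses with the example weights of equation (4) and applies a gradient-descent update; `network`, `centers`, and `weighted_dice_loss` refer to the hypothetical helpers sketched earlier, and the optimizer choice is an assumption.

import torch
import torch.nn.functional as F

theta1, theta2, theta3 = 1.2, 0.8, 2.0        # example weights from equation (4)
optimizer = torch.optim.SGD(network.parameters(), lr=1e-3, momentum=0.9)

def train_step(blocks, labels, masks):
    logits, features, mask_probs = network(blocks)        # classification and segmentation heads
    l_sm = F.cross_entropy(logits, labels)                # first classification loss
    l_ct = 0.5 * (features - centers[labels]).pow(2).sum(dim=1).mean()  # center loss
    l_dc = weighted_dice_loss(mask_probs, masks)          # weighted Dice segmentation loss
    loss = theta1 * l_sm + theta2 * l_ct + theta3 * l_dc  # equation (4)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                      # gradient-descent update
    return loss.item()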
In one possible implementation, the training method may be iterated multiple times, with training performed according to a set learning rate. In an example, during the first 20 training epochs, a learning rate of 0.001 × 1.1^x may be used (where x denotes the training epoch); in subsequent training, the learning rate may be halved at the 40th, 80th, 120th, … training epochs. In this way, training efficiency is high at the start of training and the network parameters are adjusted in large steps, while the learning rate is gradually reduced later so that the parameters are fine-tuned, improving the accuracy of the neural network and hence of the classification and segmentation processing.
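Under these assumptions, the learning-rate schedule could be written roughly as follows; the behavior between epochs 20 and 40 is not stated in the text, so the sketch simply holds the warm-up value there.

def learning_rate(epoch, base_lr=0.001):
    # 0.001 * 1.1**epoch for the first 20 epochs, then halved at epochs 40, 80, 120, ...
    if epoch < 20:
        return base_lr * (1.1 ** epoch)
    lr = base_lr * (1.1 ** 19)          # value reached at the end of the warm-up
    for _ in range(40, epoch + 1, 40):
        lr *= 0.5
    return lr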
In one possible implementation, the training may be completed when the training condition is satisfied, and the trained shared feature extraction network, segmentation network, and classification network are obtained. The training conditions may include a number of training times, i.e., when a preset number of training times is reached, the training conditions are satisfied. The training condition may include that the integrated network loss is less than or equal to a preset threshold or converges to a preset interval, that is, when the integrated network loss is less than or equal to the preset threshold or converges to the preset interval, the accuracy of the neural network may be considered to meet the use requirement, and the training may be completed. The present disclosure does not limit the training conditions.
In one possible implementation, the trained neural network may be tested after training is completed. For example, the three-dimensional image block including the nodule region in the three-dimensional medical image of the lung may be input into the neural network, and the accuracy of the output segmentation result and classification result may be counted, for example, compared with the labeling information of the three-dimensional image block, to determine the accuracy of the segmentation result and classification result, and thus the training effect of the neural network may be determined. If the accuracy is higher than the preset threshold, the training effect is considered to be good, the neural network performance is good, and the method can be used for obtaining the category of the image block to be processed and dividing the target area. If the accuracy rate does not reach the preset threshold value, the training effect can be considered to be poor, and other sample image blocks can be used for continuing training.
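An illustrative sketch of the test-time accuracy statistics (the 0.5 mask threshold, the Dice metric for segmentation, and the preset accuracy threshold are assumptions):

import torch

def evaluate(network, test_loader, acc_threshold=0.9):
    correct, total, dices = 0, 0, []
    network.eval()
    with torch.no_grad():
        for blocks, labels, masks in test_loader:
            logits, _, mask_probs = network(blocks)
            correct += (logits.argmax(dim=1) == labels).sum().item()
            total += labels.numel()
            pred = (mask_probs > 0.5).float()
            inter = (pred * masks).sum()
            dices.append((2 * inter / (pred.sum() + masks.sum() + 1e-6)).item())
    cls_acc = correct / total
    seg_dice = sum(dices) / len(dices)
    # Continue training with additional sample image blocks if accuracy is below the preset threshold.
    return cls_acc, seg_dice, cls_acc >= acc_threshold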
In one possible implementation, the trained neural network may obtain the class and the target region of the image block to be processed, if the target region and the class are unknown in the image block to be processed. Only the target area in the image block to be processed may be acquired in the case where the category of the image block to be processed is known, or the category of the image block to be processed may be acquired in the case where the target area in the image block to be processed is known. The present disclosure does not limit the method of use of the neural network.
In one possible implementation, the neural network trained by the training method described above may be used in the process of determining lesion regions and lesion categories in an image block to be processed. The neural network includes a shared feature extraction network, a classification network, and a segmentation network, and the method further comprises: inputting an image block to be processed into the shared feature extraction network for processing to obtain target features of the image block to be processed, wherein the shared feature extraction network comprises M shared feature extraction blocks, the input features of the i-th shared feature extraction block comprise the output features of the first i-1 shared feature extraction blocks, and i and M are integers with 1 < i ≤ M; inputting the target features into the classification network for classification processing to obtain the class information of the image block to be processed; and inputting the target features into the segmentation network for segmentation processing to obtain a target region in the image block to be processed.
In this way, the target feature can be obtained through the shared feature extraction network; each shared feature extraction block of the shared feature extraction network can receive the output features of all previous shared feature extraction blocks and passes its own output features to all subsequent shared feature extraction blocks. This enhances the gradient flow in the network, alleviates the vanishing-gradient phenomenon, and improves the feature extraction and learning capability, which facilitates finer classification and segmentation of the input image blocks to be processed. Finer category information and target regions of the image blocks to be processed can thus be obtained, improving image processing efficiency.
In one possible implementation, the image block to be processed may be a partial region in the image to be processed. In an example, a partial region may be cropped from the image to be processed, e.g., a region including the target object is cropped. For example, the image to be processed is a medical image picture, and the region including the lesion may be cut out from the medical image picture. For example, the image to be processed may be a three-dimensional medical image of the lung (e.g., a lung CT image), and the image block to be processed may be a three-dimensional image block of a lesion region (e.g., a region having a nodule) cut out in the image to be processed. The present disclosure does not limit the types of images to be processed and image blocks to be processed.
In one possible implementation, medical images (e.g., three-dimensional medical images of the lung) have large size and high resolution and contain many regions of normal tissue; therefore, the medical image may be preprocessed and the region including the lesion cropped out for processing, so as to improve processing efficiency.
In one possible implementation, the method further includes: preprocessing an image to be processed to obtain a first image; positioning a target area on a first image, and determining the position information of the target area in the first image; and cutting out at least one image block to be processed according to the position information of the target area.
In one possible implementation, the image to be processed may first be preprocessed to improve processing efficiency, for example by resampling, normalization, and the like. In an example, the three-dimensional medical image of the lung may be resampled to a resolution of 1 × 1 × 1 (i.e., each voxel representing the contents of a 1 mm cube). The resampled three-dimensional image can also be cropped; for example, a lung three-dimensional medical image may contain non-lung regions, so the region where the lung is located can be cropped out to reduce computation and improve processing efficiency.
In an example, the cropped three-dimensional image may be normalized, and pixel values of pixels in the three-dimensional image may be normalized to a value range of 0-1, so as to improve processing efficiency. After normalization processing, the first image is obtained. The present disclosure does not limit the method of pretreatment.
In one possible implementation, the target region in the first image may be detected. For example, the target region in the first image may be detected by a convolutional neural network for position detection. In an example, a convolutional neural network may be utilized to detect regions in a three-dimensional medical image of a lung that include nodules.
In one possible implementation, the target region may be cropped to obtain the image block to be processed; e.g., a region including a nodule in a three-dimensional medical image of the lungs may be cropped to obtain the image block to be processed. In an example, the image block to be processed may be sized according to the size of the nodule and then cropped; e.g., based on statistics of nodule sizes, the image blocks to be processed may be sized 64 × 64, and one or more image blocks to be processed may be obtained by cropping.
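A sketch of cropping fixed-size image blocks around detected nodule centers; the cubic 64-voxel block, the zero padding at volume borders, and the center-based crop convention are assumptions for illustration only.

```python
import numpy as np

def crop_block(volume, center, size=64):
    """Crop a fixed-size block around a detected nodule center, zero-padding at borders.

    volume: preprocessed 3D numpy array (z, y, x).
    center: (z, y, x) coordinates of the detected nodule.
    """
    half = size // 2
    block = np.zeros((size,) * volume.ndim, dtype=volume.dtype)
    src, dst = [], []
    for c, dim in zip(center, volume.shape):
        lo, hi = c - half, c + half
        # Part of the crop window that lies inside the volume ...
        src.append(slice(max(lo, 0), min(hi, dim)))
        # ... and where it lands inside the output block.
        dst.append(slice(max(-lo, 0), size - max(hi - dim, 0)))
    block[tuple(dst)] = volume[tuple(src)]
    return block
```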
In one possible implementation, the neural network may determine the class information of the image block to be processed and segment the target region; for example, the image block to be processed is an image block including a nodule cut out from the three-dimensional medical image of the lung. The type of nodule in the image block to be processed (e.g., atypical adenomatous hyperplasia (AAH, a preinvasive lesion of adenocarcinoma), adenocarcinoma in situ (AIS), minimally invasive adenocarcinoma (MIA) and invasive adenocarcinoma (IA)) can be determined by the neural network, and the region in which the nodule is located can be segmented.
In one possible implementation, the target features of the image block to be processed may be extracted by the shared feature extraction network for the classification and segmentation processing. Inputting the image block to be processed into the shared feature extraction network for processing to obtain the target feature of the image block to be processed may include: performing first feature extraction processing on the image block to be processed to obtain a first feature of the image block to be processed; inputting the first feature into the first shared feature extraction block to obtain the output feature of the first shared feature extraction block, and outputting the output feature of the first shared feature extraction block to the subsequent M-1 shared feature extraction blocks; inputting the output features of the first j-1 shared feature extraction blocks into the j-th shared feature extraction block to obtain the output feature of the j-th shared feature extraction block; performing second feature extraction processing on the output feature of the M-th shared feature extraction block to obtain a second feature of the image block to be processed; and performing pooling processing on the second feature to obtain the target feature.
In one possible implementation, the first feature extraction process may be performed first, for example, by a network module including a three-dimensional convolution layer (conv 3D), a batch normalization layer (bn), and an activation layer (relu), to obtain the first feature. The present disclosure does not limit the network hierarchy in which the first feature extraction process is performed.
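A possible PyTorch sketch of such a first feature extraction module (Conv3D + batch normalization + ReLU); the kernel size and channel counts are assumptions, since the disclosure does not fix them.

```python
import torch.nn as nn

class FirstFeatureExtraction(nn.Module):
    """Conv3D + batch normalization + ReLU producing the first feature."""
    def __init__(self, in_channels=1, out_channels=32):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv3d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm3d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        # x: (batch, channels, depth, height, width) image block to be processed.
        return self.block(x)
```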
In one possible implementation, the shared feature extraction network may include a plurality of shared feature extraction blocks, each of which may include a plurality of network levels, e.g., convolutional layers, activation layers, etc.; the disclosure does not limit the network levels included in the shared feature extraction blocks. The first feature may be processed by the plurality of shared feature extraction blocks. In an example, the number of shared feature extraction blocks is M. The first feature may be input into the first shared feature extraction block, that is, the first shared feature extraction block takes the first feature as its input feature and performs feature extraction processing on it to obtain an output feature. The output feature of the first shared feature extraction block may be shared by all subsequent shared feature extraction blocks, that is, it may be output to the subsequent M-1 shared feature extraction blocks as their input feature.
In one possible implementation manner, the input feature of the second shared feature extraction block is the output feature of the first shared feature extraction block; after performing feature extraction processing on its input feature, the second shared feature extraction block may output its output feature to the following 3rd to M-th shared feature extraction blocks as part of their input features.
In one possible implementation manner, the input features of the 3rd shared feature extraction block are the output features of the first and second shared feature extraction blocks. The output features of the first and second shared feature extraction blocks may be fused (for example, by taking an average value, taking a maximum value, or retaining all feature channels) before being input to the 3rd shared feature extraction block, that is, the input feature of the 3rd shared feature extraction block may be the fused feature. Alternatively, the 3rd shared feature extraction block may directly take the output features of the first and second shared feature extraction blocks as input features (for example, the 3rd shared feature extraction block may include a feature fusion layer that performs the fusion, or all feature channels may be retained and the features of all channels processed directly). The output feature of the 3rd shared feature extraction block may be output to the 4th to M-th shared feature extraction blocks as part of their input features.
In one possible implementation, taking the j-th (j is an integer and 1 < j < M) shared feature extraction block as an example, the output features of the first j-1 shared feature extraction blocks may be input to the j-th shared feature extraction block as its input features. The output features of the first j-1 shared feature extraction blocks may be fused and the fused feature used as the input feature of the j-th shared feature extraction block, or the output features of the first j-1 shared feature extraction blocks may be used directly as the input features of the j-th shared feature extraction block (for example, fusion is performed inside the j-th shared feature extraction block, or the features of all feature channels are processed directly). The j-th shared feature extraction block performs feature extraction processing on its input features to obtain its output feature, which serves as an input feature of the (j+1)-th to M-th shared feature extraction blocks.
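A sketch of how the shared feature extraction blocks could be wired so that block j receives the output features of blocks 1 to j-1, here fusing by retaining all feature channels (concatenation); the block structure, growth size, and number of blocks are assumptions.

```python
import torch
import torch.nn as nn

class SharedFeatureExtractionBlock(nn.Module):
    """One shared feature extraction block: Conv3D + BN + ReLU on the fused input."""
    def __init__(self, in_channels, growth=16):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv3d(in_channels, growth, kernel_size=3, padding=1),
            nn.BatchNorm3d(growth),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.layers(x)

class SharedFeatureExtractionStack(nn.Module):
    """M shared feature extraction blocks; block j takes the outputs of blocks 1..j-1.

    Fusion here retains all feature channels (concatenation); averaging or taking
    the maximum over the previous outputs would be alternative fusion choices.
    """
    def __init__(self, in_channels=32, num_blocks=4, growth=16):
        super().__init__()
        self.blocks = nn.ModuleList()
        for j in range(num_blocks):
            # Block 1 sees the first feature; block j sees (j-1) * growth fused channels.
            block_in = in_channels if j == 0 else j * growth
            self.blocks.append(SharedFeatureExtractionBlock(block_in, growth))

    def forward(self, x):
        outputs = []
        for j, block in enumerate(self.blocks):
            inp = x if j == 0 else torch.cat(outputs, dim=1)
            outputs.append(block(inp))
        # Output feature of the M-th shared feature extraction block.
        return outputs[-1]
```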
In one possible implementation, the M-th shared feature extraction block may obtain its output feature from the output features of the first M-1 shared feature extraction blocks. The second feature extraction processing may then be performed through a subsequent network level of the shared feature extraction network; for example, the output feature of the M-th shared feature extraction block may be processed through a network module including a three-dimensional convolution layer (conv3D), a batch normalization layer (bn), and an activation layer (relu) to obtain the second feature. The present disclosure does not limit the network level in which the second feature extraction processing is performed.
In one possible implementation, the second feature may be pooled, e.g., the target feature may be obtained by an average pooling layer pooling the second feature. The present disclosure does not limit the type of pooling process.
In one possible implementation, the above-described processing may be performed multiple times; for example, multiple shared feature extraction networks may be included. The first shared feature extraction network may take the first feature as its input feature and obtain its output feature after the feature extraction processing of the shared feature extraction blocks, the second feature extraction processing, and the pooling processing; the second shared feature extraction network may take the output feature of the first shared feature extraction network as its input feature and obtain its output feature after the same processing, and so on. The output feature of the last (e.g., 4th) shared feature extraction network serves as the target feature. The present disclosure does not limit the number of shared feature extraction networks.
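Continuing the sketch above, one shared feature extraction network could combine the block stack with a second feature extraction module and an average pooling layer, and several such networks could be cascaded to produce the target feature. The channel sizes and the use of four cascaded networks are assumptions; FirstFeatureExtraction and SharedFeatureExtractionStack refer to the hypothetical classes sketched earlier.

```python
import torch.nn as nn

class SharedFeatureExtractionNetwork(nn.Module):
    """Shared feature extraction blocks + second feature extraction + pooling."""
    def __init__(self, in_channels, growth=16, out_channels=32):
        super().__init__()
        self.stack = SharedFeatureExtractionStack(in_channels, num_blocks=4, growth=growth)
        # Second feature extraction: Conv3D + BN + ReLU.
        self.second = nn.Sequential(
            nn.Conv3d(growth, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm3d(out_channels),
            nn.ReLU(inplace=True),
        )
        # Average pooling halves each spatial dimension.
        self.pool = nn.AvgPool3d(kernel_size=2)

    def forward(self, x):
        return self.pool(self.second(self.stack(x)))

# Cascading, e.g., four shared feature extraction networks to obtain the target feature:
backbone = nn.Sequential(
    FirstFeatureExtraction(1, 32),
    SharedFeatureExtractionNetwork(32, out_channels=32),
    SharedFeatureExtractionNetwork(32, out_channels=64),
    SharedFeatureExtractionNetwork(64, out_channels=128),
    SharedFeatureExtractionNetwork(128, out_channels=256),
)
```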
In this way, the target feature can be obtained through the shared feature extraction network: each shared feature extraction block of the shared feature extraction network can obtain the output features of all the previous shared feature extraction blocks and input its own output feature to all the subsequent shared feature extraction blocks. This enhances the gradient flow in the network, alleviates gradient vanishing, and improves the feature extraction and learning capability, thereby facilitating finer classification and segmentation of the input image blocks to be processed.
In one possible implementation, the class information of the image block to be processed may be determined according to the target feature. For example, if the image block to be processed is an image block including a lesion such as a nodule in the three-dimensional medical image of the lung, the category of the nodule may be determined according to the target feature. In an example, the category of the nodule may be determined as atypical adenomatous hyperplasia, adenocarcinoma in situ, minimally invasive adenocarcinoma, or invasive adenocarcinoma.
In one possible implementation, the classification network may be used to classify the target feature to obtain the class information of the image block to be processed. In an example, the classification network may include a plurality of network levels, such as a convolutional layer (Conv3D), a global average pooling layer (GlobalAvgPool), and a fully connected layer with softmax, which classify the target feature and output the class information. The class information may be represented by a vector or the like, and the probability distribution of the image block to be processed over the categories represented by the vector may be determined, for example, by a probability dictionary, thereby determining the class information of the image block to be processed. Alternatively, the vector of the class information may directly represent the probabilities for the image block to be processed; in an example, each element of the vector represents the probability that the image block to be processed belongs to the corresponding category. For example, (0.8, 0.1, 0.1) may represent that the probability that the image block to be processed belongs to the first category is 0.8, the probability of the second category is 0.1, and the probability of the third category is 0.1. The category with the largest probability may be determined as the category of the image block to be processed, that is, the class information of the image block to be processed may be determined as the first category. The present disclosure does not limit the method of representing the class information.
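A hedged sketch of such a classification network (Conv3D, global average pooling, fully connected layer with softmax) producing a probability vector over the four nodule categories; the 1×1×1 convolution and channel count are assumptions.

```python
import torch.nn as nn

class ClassificationNetwork(nn.Module):
    """Conv3D + global average pooling + fully connected layer with softmax."""
    def __init__(self, in_channels=256, num_classes=4):
        super().__init__()
        self.conv = nn.Conv3d(in_channels, in_channels, kernel_size=1)
        self.pool = nn.AdaptiveAvgPool3d(1)   # global average pooling
        self.fc = nn.Linear(in_channels, num_classes)

    def forward(self, target_feature):
        x = self.conv(target_feature)
        x = self.pool(x).flatten(1)
        # Probability vector per sample, e.g. in the style of (0.8, 0.1, 0.1, ...).
        return self.fc(x).softmax(dim=1)
```

Taking the argmax of the returned vector then yields the category with the largest probability, matching the (0.8, 0.1, 0.1) example above.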
In one possible implementation, the target region in the image block to be processed may be determined according to the target feature. For example, if the image block to be processed is an image block including a lesion such as a nodule in the three-dimensional medical image of the lung, the position of the nodule may be determined according to the target feature and the region where the nodule is located may be segmented.
In one possible implementation, the segmentation processing may be performed through the segmentation network to obtain the target region in the image block to be processed, for example, the target region may be segmented out. In an example, the segmentation network may include multiple network levels, e.g., an upsampling layer (Upsample), a fully connected layer, etc. In an example, the target feature is a feature map obtained by feature extraction, pooling, and other processing of the image block to be processed in the shared feature extraction network, and its resolution may be lower than that of the image block to be processed. The upsampling layer can be used for upsampling, reducing the number of feature channels of the target feature and increasing the resolution, so that the resolution of the feature map output by the segmentation network is consistent with that of the image block to be processed. For example, if the shared feature extraction network performs four pooling processes, four upsampling processes may be performed by the upsampling layer so that the feature map output by the segmentation network matches the resolution of the image block to be processed. The target region can then be segmented in the feature map output by the segmentation network, for example, the target region where the nodule is located is marked by a contour line or a contour surface. The present disclosure does not limit the network levels of the segmentation network.
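A possible sketch of the segmentation network, upsampling the target feature four times so that the output feature map matches the resolution of the image block to be processed; the per-step channel halving and the final sigmoid are assumptions.

```python
import torch.nn as nn

class SegmentationNetwork(nn.Module):
    """Upsample back to the image-block resolution, then predict a per-voxel score.

    Four upsampling steps mirror the four pooling steps assumed in the backbone sketch.
    """
    def __init__(self, in_channels=256):
        super().__init__()
        layers, channels = [], in_channels
        for _ in range(4):
            layers += [
                nn.Upsample(scale_factor=2, mode='trilinear', align_corners=False),
                nn.Conv3d(channels, channels // 2, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
            ]
            channels //= 2
        # Per-voxel score for the target region (e.g., where the nodule is located).
        layers.append(nn.Conv3d(channels, 1, kernel_size=1))
        self.decoder = nn.Sequential(*layers)

    def forward(self, target_feature):
        # Output has the same spatial resolution as the image block to be processed.
        return self.decoder(target_feature).sigmoid()
```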
In one possible implementation, after the target region is segmented in the image block to be processed, the position of the target region in the image to be processed may also be determined. For example, the position of the target region in the image to be processed can be restored according to the position of the image block to be processed in the image to be processed and the position of the target region in the image block to be processed. In an example, in a medical image of the lung, the location of a nodule in a block of the image to be processed may be segmented and the location of the nodule in the medical image of the lung may be restored.
According to the neural network training method, fine classifications of the sample image blocks can be obtained and used to train the neural network, so that the neural network can classify images at a fine granularity, improving classification efficiency and the accuracy of medical diagnosis. The target feature can be obtained through the shared feature extraction network; each shared feature extraction block of the shared feature extraction network can obtain the output features of all previous shared feature extraction blocks and input its own output feature to all subsequent shared feature extraction blocks, which enhances gradient flow in the network, alleviates gradient vanishing, and improves the feature extraction and learning capability, thereby facilitating finer classification and segmentation of the input image blocks to be processed. Finer class information and target regions of the image blocks to be processed can be obtained, improving image processing efficiency. During training, the class information of sample image blocks of the same class can be clustered together, so that the feature distance between the class information of sample image blocks of different classes becomes larger; furthermore, the errors of the target region and of the non-target region can be balanced, which helps improve classification performance and accuracy.
Fig. 2 is a schematic diagram illustrating an application of a neural network training method according to an embodiment of the present disclosure. As shown in Fig. 2, the sample image is a medical image picture, and the sample image block is an image block including a lesion (e.g., a nodule) cut out from the medical image picture. The sample image block may have a category label; for example, the sample image blocks may include four categories: atypical adenomatous hyperplasia (AAH, a preinvasive lesion of adenocarcinoma), adenocarcinoma in situ (AIS), minimally invasive adenocarcinoma (MIA) and invasive adenocarcinoma (IA).
In one possible implementation, the sample image blocks may be input into the neural network in batches. The shared feature extraction network performs feature extraction on each batch of sample image blocks to obtain sample target features of the sample image blocks, the classification network obtains class prediction information of the sample image blocks, and the classification loss of the neural network may be determined through formula (1) and formula (2). Further, the segmentation network may obtain a prediction target region in each sample image block, and the segmentation loss of the neural network may be determined according to formula (3). The segmentation loss and the classification loss can be weighted and summed to obtain an overall network loss of the neural network, and the neural network is trained with this overall network loss. The trained neural network may be used to determine lesion areas and lesion categories in image blocks of a medical image.
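A sketch of one training iteration combining the two losses by weighted summation. The actual classification and segmentation losses are those of formulas (1)-(3) defined earlier in the disclosure; cross-entropy on the predicted class probabilities and a soft Dice term are used here only as stand-ins, and the loss weights are assumptions.

```python
import torch
import torch.nn.functional as F

def training_step(shared_net, cls_net, seg_net, optimizer, blocks, labels, masks,
                  cls_weight=1.0, seg_weight=1.0):
    """One training iteration on a batch of sample image blocks.

    blocks: (B, 1, D, H, W) sample image blocks; labels: (B,) category labels;
    masks:  (B, 1, D, H, W) ground-truth target regions.
    """
    target_features = shared_net(blocks)
    class_probs = cls_net(target_features)    # class prediction information
    pred_region = seg_net(target_features)    # prediction target region

    # Stand-in classification loss: cross-entropy on the probability vectors.
    classification_loss = F.nll_loss(torch.log(class_probs + 1e-8), labels)

    # Stand-in segmentation loss: 1 - soft Dice over the batch.
    inter = (pred_region * masks).sum()
    dice = (2 * inter + 1e-5) / (pred_region.sum() + masks.sum() + 1e-5)
    segmentation_loss = 1 - dice

    # Weighted sum gives the overall network loss used to update the parameters.
    loss = cls_weight * classification_loss + seg_weight * segmentation_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```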
In one possible implementation, the image to be processed may be a three-dimensional lung medical image (e.g., a lung CT image), and the image block to be processed may be a three-dimensional image block of a lesion region (e.g., a region with a nodule) cut out of the image to be processed.
In one possible implementation, the three-dimensional medical image may be resampled to obtain a three-dimensional image with a resolution of 1 × 1 × 1, the region where the lungs are located may be cropped out, and the cropped region may be normalized. Further, the region of the lung where the nodule is located may be detected, and a plurality of image blocks to be processed including the region where the nodule is located may be cropped out according to a size of 64 × 64.
In one possible implementation, the feature extraction process may be performed on a plurality of image blocks to be processed in batches, so as to obtain the target feature of the image block to be processed. For example, the first feature extraction process may be performed first, for example, by a network module including a three-dimensional convolution layer (conv 3D), a batch normalization layer (bn), and an activation layer (relu), to obtain the first feature.
In one possible implementation, the first feature may be input into a shared feature extraction network, which may include a plurality of shared feature extraction blocks. In an example, the number of shared feature extraction blocks is M. The first feature may be input to the first shared feature extraction block for processing, and the output feature of the first shared feature extraction block may be passed to the subsequent M-1 shared feature extraction blocks. The input feature of the second shared feature extraction block is the output feature of the first shared feature extraction block, and the second shared feature extraction block may output its output feature to the following 3rd to M-th shared feature extraction blocks. The input features of the 3rd shared feature extraction block are the output features of the first and second shared feature extraction blocks, and the output feature of the 3rd shared feature extraction block may be output to the 4th to M-th shared feature extraction blocks. Similarly, the output features of the first j-1 shared feature extraction blocks may be input to the j-th shared feature extraction block, and the output feature of the j-th shared feature extraction block may be output to the (j+1)-th to M-th shared feature extraction blocks. The M-th shared feature extraction block may obtain its output feature according to the output features of the previous M-1 shared feature extraction blocks, after which a second feature extraction processing is performed; for example, the output feature of the M-th shared feature extraction block may be processed through a network module including a three-dimensional convolution layer (conv3D), a batch normalization layer (bn), and an activation layer (relu) to obtain a second feature. Further, the second feature may be subjected to pooling processing (e.g., average pooling (avgpool)) to obtain the target feature.
In one possible implementation, the above-described processing may be performed multiple times (e.g., 4 times), for example, may include multiple shared feature extraction networks. The target feature can be obtained through the processing of a plurality of cascaded shared feature extraction networks.
In one possible implementation, the classification network may perform classification processing on the target feature to obtain the class information of the image block to be processed. For example, the classification network may obtain the class information of the image block to be processed through a convolution layer (Conv3D), a global average pooling layer (GlobalAvgPool), a fully connected layer with softmax, and the like.
In one possible implementation, the segmentation network may segment the target feature to obtain a target region (i.e., the region in which the nodule is located). In an example, the segmentation network performs four upsampling processes through the upsampling layer so that the feature map of the output of the segmentation network coincides with the resolution of the image block to be processed, and the target region may be segmented in the feature map of the output of the segmentation network.
In one possible implementation manner, the neural network may obtain the class and the target area of the image block to be processed (for example, the area where the nodule is located may be segmented, and the class of the nodule may be obtained) when the target area and the class of the image block to be processed are unknown. It is also possible to acquire only the target region in the image block to be processed (for example, to divide out the region where the nodule is located) in the case where the class of the image block to be processed is known, or to acquire the class of the image block to be processed (for example, to determine the class of the nodule) in the case where the target region in the image block to be processed is known.
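An illustrative inference sketch using assumed trained instances (backbone, cls_net, seg_net) of the hypothetical modules above, producing both the category and the segmented target region for an image block to be processed.

```python
import torch

# backbone, cls_net and seg_net are assumed trained instances of the hypothetical
# modules sketched above (cascaded shared feature extraction networks, classification
# network, segmentation network).
with torch.no_grad():
    block = torch.randn(1, 1, 64, 64, 64)     # stands in for a preprocessed image block
    target_feature = backbone(block)
    class_probs = cls_net(target_feature)     # class information, e.g. close to (0.8, 0.1, 0.05, 0.05)
    nodule_class = class_probs.argmax(dim=1)  # predicted category among AAH / AIS / MIA / IA
    region_map = seg_net(target_feature)      # per-voxel probability of the target region
    target_region = region_map > 0.5          # segmented region where the nodule is located
```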
In one possible implementation manner, the image processing method can be used to segment and classify lesion areas in medical images such as lung CT images, assisting doctors in diagnosing diseases, improving clinical work efficiency, and reducing missed diagnosis and misdiagnosis. The method can also be used for classifying other images and segmenting target areas; the present disclosure does not limit the application field of the image processing method.
It will be appreciated that the above-mentioned method embodiments of the present disclosure may be combined with each other to form combined embodiments without departing from the principles and logic, which are not described in detail in the present disclosure due to space limitations.
In addition, the disclosure further provides an apparatus, an electronic device, a computer readable storage medium, and a program, all of which may be used to implement any of the neural network training methods provided in the disclosure; for the corresponding technical solutions and descriptions, refer to the corresponding descriptions in the method sections, which are not repeated here.
It will be appreciated by those skilled in the art that in the above-described method of the specific embodiments, the written order of steps is not meant to imply a strict order of execution but rather should be construed according to the function and possibly inherent logic of the steps.
Fig. 3 shows a block diagram of a neural network training device, as shown in fig. 3, according to an embodiment of the present disclosure, the device comprising: the acquisition module 11 is used for acquiring the position information and the category information of the target area in the sample image; a first clipping module 12, configured to clip at least one target area according to the position information of the target area; the classifying module 13 is configured to classify the cut at least one target area according to the category information to obtain N types of sample image blocks, where N is an integer and N is equal to or greater than 1; the training module 14 is configured to input the N types of sample image blocks into a neural network for training.
In one possible implementation, the sample image is a medical image picture.
In one possible implementation, the acquiring module is further configured to: positioning a target area on a medical image picture to obtain the position information of the target area; acquiring a pathology picture associated with the medical image picture; and labeling the category information of the target area on the medical image picture according to the pathological information of each target area on the pathology picture.
In one possible implementation, the training module is further configured to: inputting any sample image block into the neural network for processing to obtain category prediction information and a prediction target area of the sample image block; determining a classification loss based at least on the class prediction information and the class information of the sample image block; determining a segmentation loss according to the position information of the prediction target region and the sample image block; training the neural network based on the classification loss and the segmentation loss.
In one possible implementation, the training module is further configured to: determining a first classification loss according to the classification prediction information and the classification information of the sample image block; determining a second classification loss according to the classification prediction information and the classification information of the class center of the class to which the sample image block belongs; and carrying out weighted summation processing on the first classification loss and the second classification loss to obtain the classification loss.
In one possible implementation, the training module is further configured to: determining a first weight of the prediction target region and a second weight of a sample background region in the sample image block according to a first proportion of the number of pixels of the prediction target region in the sample image block; and determining the segmentation loss according to the first weight, the second weight, the prediction target region and the position information of the sample image block.
In one possible implementation, the training module is further configured to: determining a second proportion of a sample background area in the sample image block according to a first proportion of the number of pixels of the prediction target area in the sample image block; the second ratio is determined as the first weight and the first ratio is determined as the second weight.
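A minimal sketch of deriving the first and second weights from the pixel proportions described above (the first weight is the second proportion and the second weight is the first proportion); the threshold and the generic weighted per-voxel combination in the comment stand in for formula (3), which is not reproduced here.

```python
import torch

def region_weights(pred_region, threshold=0.5):
    """Derive the first and second weights from the predicted target region.

    first_ratio:  proportion of voxels belonging to the prediction target region.
    second_ratio: proportion of the sample background region (1 - first_ratio).
    The first weight is the second ratio and the second weight is the first ratio,
    so the scarcer target region is weighted more heavily.
    """
    first_ratio = (pred_region > threshold).float().mean()
    second_ratio = 1.0 - first_ratio
    first_weight, second_weight = second_ratio, first_ratio
    return first_weight, second_weight

# Possible use: weight the per-voxel error terms of the two regions, e.g.
# loss = first_weight * error_on_target_region + second_weight * error_on_background
```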
In one possible implementation, the categories into which the nodules are classified include: atypical adenomatous hyperplasia nodules, adenocarcinoma in situ nodules, minimally invasive adenocarcinoma nodules and invasive adenocarcinoma nodules.
In one possible implementation, the neural network includes a shared feature extraction network, a classification network, and a segmentation network, the apparatus further comprising: the image processing device comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for inputting an image block to be processed into a shared feature extraction network to be processed to acquire target features of the image block to be processed, the shared feature extraction network comprises M shared feature extraction blocks, the input features of the ith shared feature extraction block comprise output features of the first i-1 shared feature extraction blocks, i and M are integers, and i is more than 1 and less than or equal to M; the classification module is used for inputting the target characteristics into a classification network for classification processing to obtain the category information of the image block to be processed; and the segmentation module is used for inputting the target characteristics into a segmentation network for segmentation processing to obtain a target area in the image block to be processed.
In one possible implementation, the obtaining module is further configured to: perform first feature extraction processing on the image block to be processed to obtain a first feature of the image block to be processed; input the first feature into the first shared feature extraction block to obtain the output feature of the first shared feature extraction block, and output the output feature of the first shared feature extraction block to the subsequent M-1 shared feature extraction blocks; input the output features of the first j-1 shared feature extraction blocks into the j-th shared feature extraction block to obtain the output feature of the j-th shared feature extraction block, where j is an integer and 1 < j < M; perform second feature extraction processing on the output feature of the M-th shared feature extraction block to obtain a second feature of the image block to be processed; and perform pooling processing on the second feature to obtain the target feature.
In one possible implementation, the apparatus further includes: the preprocessing module is used for preprocessing the image to be processed to obtain a first image; the positioning module is used for positioning the target area on the first image and determining the position information of the target area in the first image; and the second cutting module is used for cutting out at least one image block to be processed according to the position information of the target area.
In some embodiments, a function or a module included in an apparatus provided by the embodiments of the present disclosure may be used to perform a method described in the foregoing method embodiments, and a specific implementation thereof may refer to the description of the foregoing method embodiments, which is not repeated herein for brevity.
The disclosed embodiments also provide a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method. The computer readable storage medium may be a non-volatile computer readable storage medium.
The embodiment of the disclosure also provides an electronic device, which comprises: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to perform the method described above.
Embodiments of the present disclosure also provide a computer program product comprising computer readable code which, when run on a device, causes a processor in the device to execute instructions for implementing the neural network training method provided in any of the embodiments above.
The disclosed embodiments also provide another computer program product for storing computer readable instructions that, when executed, cause a computer to perform the operations of the neural network training method provided in any of the above embodiments.
The electronic device may be provided as a terminal, server or other form of device.
Fig. 4 illustrates a block diagram of an electronic device 800, according to an embodiment of the disclosure. For example, electronic device 800 may be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, exercise device, personal digital assistant, or the like.
Referring to fig. 4, the electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interactions between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen between the electronic device 800 and the user that provides an output interface. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or slide action, but also the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the electronic device 800 is in an operational mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capabilities.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 further includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, click wheel, buttons, etc. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.
The sensor assembly 814 includes one or more sensors for providing status assessment of various aspects of the electronic device 800. For example, the sensor assembly 814 may detect an on/off state of the electronic device 800, a relative positioning of the components, such as a display and keypad of the electronic device 800, the sensor assembly 814 may also detect a change in position of the electronic device 800 or a component of the electronic device 800, the presence or absence of a user's contact with the electronic device 800, an orientation or acceleration/deceleration of the electronic device 800, and a change in temperature of the electronic device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscopic sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate communication between the electronic device 800 and other devices, either wired or wireless. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 816 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), digital Signal Processors (DSPs), digital Signal Processing Devices (DSPDs), programmable Logic Devices (PLDs), field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for executing the methods described above.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as memory 804 including computer program instructions executable by processor 820 of electronic device 800 to perform the above-described methods.
Fig. 5 illustrates a block diagram of an electronic device 1900 according to an embodiment of the disclosure. For example, electronic device 1900 may be provided as a server. Referring to FIG. 5, electronic device 1900 includes a processing component 1922 that further includes one or more processors and memory resources represented by memory 1932 for storing instructions, such as application programs, that can be executed by processing component 1922. The application programs stored in memory 1932 may include one or more modules each corresponding to a set of instructions. Further, processing component 1922 is configured to execute instructions to perform the methods described above.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as memory 1932, including computer program instructions executable by processing component 1922 of electronic device 1900 to perform the methods described above.
The present disclosure may be a system, method, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for causing a processor to implement aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: portable computer disks, hard disks, Random Access Memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), Static Random Access Memory (SRAM), portable compact disk read-only memory (CD-ROM), Digital Versatile Disks (DVD), memory sticks, floppy disks, mechanical encoding devices such as punch cards or raised structures in a groove having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media, as used herein, are not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., optical pulses through fiber optic cables), or electrical signals transmitted through wires.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.
Computer program instructions for performing the operations of the present disclosure can be assembly instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++, or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present disclosure are implemented by personalizing electronic circuitry, such as programmable logic circuitry, Field Programmable Gate Arrays (FPGAs), or Programmable Logic Arrays (PLAs), with state information of computer readable program instructions, which can execute the computer readable program instructions.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The computer program product may be realized in particular by means of hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium, and in another alternative embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.
The foregoing description of the embodiments of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the technical improvement of the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (13)

1. A neural network training method, comprising:
acquiring position information and category information of a target area in a sample image;
cutting out at least one target area according to the position information of the target area;
Classifying the cut at least one target area according to the category information to obtain N types of sample image blocks, wherein N is an integer and is larger than or equal to 1;
inputting the N types of sample image blocks into a neural network for training;
the neural network includes a shared feature extraction network, a classification network, and a segmentation network,
the method further comprises the steps of:
inputting an image block to be processed into a shared feature extraction network for processing to obtain target features of the image block to be processed, wherein the shared feature extraction network comprises M shared feature extraction blocks, and the input features of the ith shared feature extraction block comprise output features of the first i-1 shared feature extraction blocks, i, M is an integer and 1 < i is less than or equal to M;
inputting the target features into a classification network for classification processing to obtain the class information of the image block to be processed;
and inputting the target features into a segmentation network for segmentation processing to obtain a target region in the image block to be processed.
2. The method of claim 1, wherein the sample image is a medical image picture.
3. The method of claim 2, wherein the obtaining location information and category information of the target region in the sample image comprises:
Positioning a target area on a medical image picture to obtain the position information of the target area;
acquiring a pathology picture associated with the medical image picture;
and labeling the category information of the target area on the medical image picture according to the pathological information of each target area on the pathological picture.
4. The method of claim 1, wherein inputting the N classes of sample image blocks into a neural network for training comprises:
inputting any sample image block into the neural network for processing to obtain category prediction information and a prediction target area of the sample image block;
determining a classification loss based at least on the class prediction information and the class information of the sample image block;
determining a segmentation loss according to the position information of the prediction target region and the sample image block;
training the neural network based on the classification loss and the segmentation loss.
5. The method of claim 4, wherein determining a classification loss based on the class prediction information and class information of the sample image block comprises:
determining a first classification loss according to the classification prediction information and the classification information of the sample image block;
Determining a second classification loss according to the classification prediction information and the classification information of the class center of the class to which the sample image block belongs;
and carrying out weighted summation processing on the first classification loss and the second classification loss to obtain the classification loss.
6. The method of claim 4, wherein determining a segmentation loss based on the location information of the prediction target region and the sample image block comprises:
determining a first weight of the prediction target region and a second weight of a sample background region in the sample image block according to a first proportion of the number of pixels of the prediction target region in the sample image block;
and determining the segmentation loss according to the first weight, the second weight, the prediction target region and the position information of the sample image block.
7. The method of claim 6, wherein determining the first weight of the prediction target region and the second weight of the sample background region in the sample image block based on a first proportion of the number of pixels of the prediction target region in the sample image block comprises:
determining a second proportion of a sample background area in the sample image block according to a first proportion of the number of pixels of the prediction target area in the sample image block;
The second ratio is determined as the first weight and the first ratio is determined as the second weight.
8. The method according to any one of claims 1-7, wherein the categories into which the nodules are classified comprise: atypical adenomatous hyperplasia nodules, adenocarcinoma in situ nodules, minimally invasive adenocarcinoma nodules and invasive adenocarcinoma nodules.
9. The method according to claim 1, wherein inputting the image block to be processed into the shared feature extraction network for processing, obtaining the target feature of the image block to be processed, comprises:
carrying out first feature extraction processing on the image block to be processed to obtain first features of the image block to be processed;
inputting the first feature into a first shared feature extraction block to obtain the output feature of the first shared feature extraction block, and outputting the output feature of the first shared feature extraction block to the subsequent M-1 shared feature extraction blocks;
inputting the output features of the previous j-1 shared feature extraction blocks into the jth shared feature extraction block to obtain the output features of the jth shared feature extraction block, wherein j is an integer and 1 < j < M;
performing second feature extraction processing on the output features of the M-th shared feature extraction block to obtain second features of the image block to be processed;
And carrying out pooling treatment on the second characteristic to obtain the target characteristic.
10. The method according to claim 1, wherein the method further comprises:
preprocessing an image to be processed to obtain a first image;
positioning a target area on a first image, and determining the position information of the target area in the first image;
and cutting out at least one image block to be processed according to the position information of the target area.
11. A neural network training device, comprising:
the acquisition module is used for acquiring the position information and the category information of the target area in the sample image;
the first cutting module is used for cutting out at least one target area according to the position information of the target area;
the classifying module is used for classifying the cut at least one target area according to the category information to obtain N types of sample image blocks, wherein N is an integer and N is equal to or greater than 1;
the training module is used for inputting the N types of sample image blocks into a neural network for training;
wherein the neural network comprises a shared feature extraction network, a classification network and a segmentation network, and the apparatus further comprises:
the acquisition module is used for inputting an image block to be processed into the shared feature extraction network for processing to obtain a target feature of the image block to be processed, wherein the shared feature extraction network comprises M shared feature extraction blocks, the input features of the i-th shared feature extraction block comprise the output features of the first i-1 shared feature extraction blocks, i and M are integers, and 1 < i ≤ M;
the classification module is used for inputting the target feature into the classification network for classification processing to obtain the category information of the image block to be processed;
and the segmentation module is used for inputting the target feature into the segmentation network for segmentation processing to obtain a target area in the image block to be processed.
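For illustration, the classification and segmentation modules above can be viewed as two heads over the shared target feature; here the target feature is assumed to be a spatial feature map, and the channel count and number of classes are assumptions:

```python
import torch
import torch.nn as nn

class ClassificationSegmentationHeads(nn.Module):
    """Sketch of the classification and segmentation modules of claim 11.

    Both heads consume the target feature produced by the shared feature
    extraction network; it is assumed here to be a C-channel feature map.
    """

    def __init__(self, feature_channels: int = 32, num_classes: int = 4):
        super().__init__()
        # Classification network: global pooling followed by a linear layer.
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(feature_channels, num_classes),
        )
        # Segmentation network: per-pixel target/background logits.
        self.segmenter = nn.Sequential(
            nn.Conv2d(feature_channels, feature_channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feature_channels, 1, kernel_size=1),
        )

    def forward(self, target_feature: torch.Tensor):
        class_logits = self.classifier(target_feature)   # category information
        seg_logits = self.segmenter(target_feature)      # target area in the image block
        return class_logits, seg_logits
```

A call such as `ClassificationSegmentationHeads()(torch.randn(2, 32, 64, 64))` would return per-block class logits and a per-pixel segmentation map, matching the category information and target area outputs described in the claim.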
12. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to: perform the method of any one of claims 1 to 10.
13. A computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the method of any one of claims 1 to 10.
CN202010148544.8A 2020-03-05 2020-03-05 Neural network training method and device, electronic equipment and storage medium Active CN111368923B (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
CN202010148544.8A CN111368923B (en) 2020-03-05 2020-03-05 Neural network training method and device, electronic equipment and storage medium
PCT/CN2020/100715 WO2021174739A1 (en) 2020-03-05 2020-07-07 Neural network training method and apparatus, electronic device and storage medium
KR1020217041454A KR20220009451A (en) 2020-03-05 2020-07-07 Neural network training method and apparatus, electronic device and storage medium
JP2021574781A JP2022537974A (en) 2020-03-05 2020-07-07 Neural network training method and apparatus, electronic equipment and storage medium
TW110100180A TWI770754B (en) 2021-01-04 Neural network training method, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010148544.8A CN111368923B (en) 2020-03-05 2020-03-05 Neural network training method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111368923A CN111368923A (en) 2020-07-03
CN111368923B true CN111368923B (en) 2023-12-19

Family

ID=71208701

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010148544.8A Active CN111368923B (en) 2020-03-05 2020-03-05 Neural network training method and device, electronic equipment and storage medium

Country Status (5)

Country Link
JP (1) JP2022537974A (en)
KR (1) KR20220009451A (en)
CN (1) CN111368923B (en)
TW (1) TWI770754B (en)
WO (1) WO2021174739A1 (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111368923B (en) * 2020-03-05 2023-12-19 上海商汤智能科技有限公司 Neural network training method and device, electronic equipment and storage medium
CN111767708A (en) * 2020-07-09 2020-10-13 北京猿力未来科技有限公司 Training method and device of problem solving model and generation method and device of problem solving formula
CN112017162B (en) * 2020-08-10 2022-12-06 上海杏脉信息科技有限公司 Pathological image processing method, pathological image processing device, storage medium and processor
CN112241760A (en) * 2020-08-25 2021-01-19 浙江大学 Automatic black intermediary mining method and system in network petty loan service
US20220084677A1 (en) * 2020-09-14 2022-03-17 Novocura Tech Health Services Private Limited System and method for generating differential diagnosis in a healthcare environment
CN112561893A (en) * 2020-12-22 2021-03-26 平安银行股份有限公司 Picture matching method and device, electronic equipment and storage medium
CN112785565B (en) * 2021-01-15 2024-01-05 上海商汤智能科技有限公司 Target detection method and device, electronic equipment and storage medium
CN112749801A (en) * 2021-01-22 2021-05-04 上海商汤智能科技有限公司 Neural network training and image processing method and device
CN112925938A (en) * 2021-01-28 2021-06-08 上海商汤智能科技有限公司 Image annotation method and device, electronic equipment and storage medium
CN112907517A (en) * 2021-01-28 2021-06-04 上海商汤智能科技有限公司 Image processing method and device, computer equipment and storage medium
US11967084B2 (en) * 2021-03-09 2024-04-23 Ping An Technology (Shenzhen) Co., Ltd. PDAC image segmentation method, electronic device and storage medium
CN113139471A (en) * 2021-04-25 2021-07-20 上海商汤智能科技有限公司 Target detection method and device, electronic equipment and storage medium
CN113702719B (en) * 2021-08-03 2022-11-29 北京科技大学 Broadband near-field electromagnetic positioning method and device based on neural network
CN113688975A (en) * 2021-08-24 2021-11-23 北京市商汤科技开发有限公司 Neural network training method and device, electronic equipment and storage medium
CN113793323A (en) * 2021-09-16 2021-12-14 云从科技集团股份有限公司 Component detection method, system, equipment and medium
CN114049315B (en) * 2021-10-29 2023-04-18 北京长木谷医疗科技有限公司 Joint recognition method, electronic device, storage medium, and computer program product
CN113989407B (en) * 2021-12-30 2022-03-25 青岛美迪康数字工程有限公司 Training method and system for limb part recognition model in CT image
CN114332547B (en) * 2022-03-17 2022-07-08 浙江太美医疗科技股份有限公司 Medical object classification method and apparatus, electronic device, and storage medium
CN114839340A (en) * 2022-04-27 2022-08-02 芯视界(北京)科技有限公司 Water quality biological activity detection method and device, electronic equipment and storage medium
KR20240018229A (en) * 2022-08-02 2024-02-13 김민구 A Natural Language Processing System And Method Using A Synapper Model Unit
CN116077066A (en) * 2023-02-10 2023-05-09 北京安芯测科技有限公司 Training method and device for electrocardiosignal classification model and electronic equipment

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6031921B2 (en) * 2012-09-28 2016-11-24 ブラザー工業株式会社 Image processing apparatus and program
US9953425B2 (en) * 2014-07-30 2018-04-24 Adobe Systems Incorporated Learning image categorization using related attributes
CN107330263B (en) * 2017-06-26 2020-07-28 成都知识视觉科技有限公司 Computer-assisted breast invasive ductal carcinoma histological grading method
SG10202108020VA (en) * 2017-10-16 2021-09-29 Illumina Inc Deep learning-based techniques for training deep convolutional neural networks
CN108335313A (en) * 2018-02-26 2018-07-27 阿博茨德(北京)科技有限公司 Image partition method and device
CN108520518A (en) * 2018-04-10 2018-09-11 复旦大学附属肿瘤医院 A kind of thyroid tumors Ultrasound Image Recognition Method and its device
CN109919961A (en) * 2019-02-22 2019-06-21 北京深睿博联科技有限责任公司 A kind of processing method and processing device for aneurysm region in encephalic CTA image
CN111368923B (en) * 2020-03-05 2023-12-19 上海商汤智能科技有限公司 Neural network training method and device, electronic equipment and storage medium

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018121690A1 (en) * 2016-12-29 2018-07-05 北京市商汤科技开发有限公司 Object attribute detection method and device, neural network training method and device, and regional detection method and device
CN109285142A (en) * 2018-08-07 2019-01-29 广州智能装备研究院有限公司 A kind of head and neck neoplasm detection method, device and computer readable storage medium
CN109447169A (en) * 2018-11-02 2019-03-08 北京旷视科技有限公司 The training method of image processing method and its model, device and electronic system
CN110245657A (en) * 2019-05-17 2019-09-17 清华大学 Pathological image similarity detection method and detection device
CN110210535A (en) * 2019-05-21 2019-09-06 北京市商汤科技开发有限公司 Neural network training method and device and image processing method and device
CN110705555A (en) * 2019-09-17 2020-01-17 中山大学 Abdomen multi-organ nuclear magnetic resonance image segmentation method, system and medium based on FCN
CN110705626A (en) * 2019-09-26 2020-01-17 北京市商汤科技开发有限公司 Image processing method and device, electronic equipment and storage medium
CN110796656A (en) * 2019-11-01 2020-02-14 上海联影智能医疗科技有限公司 Image detection method, image detection device, computer equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Prenitha Lobo et al. Classification and Segmentation Techniques for Detection of Lung Cancer from CT Images. 2018 International Conference on Inventive Research in Computing Applications (ICIRCA), 2019, pp. 1014-1019. *
Luo Huilan et al. A Survey of Image Semantic Segmentation Based on Deep Networks. Acta Electronica Sinica, 2019, Vol. 47, No. 10, pp. 2211-2220. *

Also Published As

Publication number Publication date
TW202133787A (en) 2021-09-16
KR20220009451A (en) 2022-01-24
JP2022537974A (en) 2022-08-31
CN111368923A (en) 2020-07-03
TWI770754B (en) 2022-07-11
WO2021174739A1 (en) 2021-09-10

Similar Documents

Publication Publication Date Title
CN111368923B (en) Neural network training method and device, electronic equipment and storage medium
CN111310764B (en) Network training method, image processing device, electronic equipment and storage medium
US11449998B2 (en) Processing of histology images with a convolutional neural network to identify tumors
CN112785565B (en) Target detection method and device, electronic equipment and storage medium
CN112767329B (en) Image processing method and device and electronic equipment
CN109829920B (en) Image processing method and device, electronic equipment and storage medium
WO2020134866A1 (en) Key point detection method and apparatus, electronic device, and storage medium
CN111091576A (en) Image segmentation method, device, equipment and storage medium
CN114820584B (en) Lung focus positioner
CN112541928A (en) Network training method and device, image segmentation method and device and electronic equipment
CN113222038B (en) Breast lesion classification and positioning method and device based on nuclear magnetic image
WO2022156235A1 (en) Neural network training method and apparatus, image processing method and apparatus, and electronic device and storage medium
WO2023050691A1 (en) Image processing method and apparatus, and electronic device, storage medium and program
CN112508918A (en) Image processing method and device, electronic equipment and storage medium
CN111652107B (en) Object counting method and device, electronic equipment and storage medium
JP2022548453A (en) Image segmentation method and apparatus, electronic device and storage medium
CN114387436B (en) Wall coronary artery detection method and device, electronic device and storage medium
CN112200820A (en) Three-dimensional image processing method and device, electronic device and storage medium
CN111968106A (en) Image processing method and device, electronic equipment and storage medium
CN112686867A (en) Medical image recognition method and device, electronic equipment and storage medium
CN111369512A (en) Image processing method and device, electronic equipment and storage medium
CN115099293B (en) Model training method and device, electronic equipment and storage medium
CN113470828A (en) Classification method and device, electronic equipment and storage medium
CN113470827A (en) Classification method and device, electronic equipment and storage medium
CN113487537A (en) Information processing method, device and storage medium for breast cancer ultrasonic high-echo halo

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40023108

Country of ref document: HK

GR01 Patent grant
GR01 Patent grant