CN113628216A - Model training method, image segmentation method, device and related products


Publication number
CN113628216A
Authority
CN
China
Prior art keywords
dimensional
target
preset
convolutional neural
neural network
Legal status
Pending
Application number
CN202110918323.9A
Other languages
Chinese (zh)
Inventor
尚方信
王思其
杨叶辉
黄海峰
王磊
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110918323.9A
Publication of CN113628216A

Classifications

    • G06T 7/10 - Segmentation; Edge detection
    • G06N 3/045 - Combinations of networks
    • G06N 3/08 - Learning methods (neural networks)
    • G06T 9/00 - Image coding
    • G06T 2207/10004 - Still image; Photographic image
    • G06T 2207/10012 - Stereo images

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a model training method, an image segmentation method, an image segmentation apparatus, and related products, relating to the field of image processing and in particular to deep learning and image segmentation technology. The specific implementation scheme is as follows: acquire a pre-constructed initial convolutional neural network model, where the initial convolutional neural network model comprises an initial encoder and an initial decoder connected in sequence, the initial encoder comprises a preset two-dimensional coding network structure and a preset three-dimensional coding network structure connected in sequence, and the initial decoder comprises a preset three-dimensional decoding network structure and a preset two-dimensional decoding network structure connected in sequence; acquire training samples for training the initial convolutional neural network model; and train the initial convolutional neural network model with the training samples to obtain a convolutional neural network model trained to convergence, which is used to perform image segmentation processing on a target three-dimensional image to be segmented.

Description

Model training method, image segmentation method, device and related products
Technical Field
The present disclosure relates to deep learning and image segmentation techniques in the field of image processing, and in particular to a model training method, an image segmentation method, an image segmentation apparatus, and related products.
Background
With the continuous development of medical imaging and computer technology, medical image analysis has advanced as well. Deep learning models, especially convolutional neural network models, have been widely used in medical image analysis.
Depending on the image modality to be processed, convolutional neural network models can be divided into two-dimensional convolutional neural network models (2D CNN) and three-dimensional convolutional neural network models (3D CNN). When a three-dimensional image is segmented, a 3D CNN is generally used in order to integrate the spatial information in the three-dimensional image and obtain a better segmentation result.
Disclosure of Invention
The present disclosure provides a model training method for image segmentation, an image segmentation method, an image segmentation apparatus, and related products.
According to a first aspect of the present disclosure, there is provided a training method of a convolutional neural network model for image segmentation, comprising:
acquiring a pre-constructed initial convolutional neural network model, wherein the initial convolutional neural network model comprises an initial encoder and an initial decoder which are sequentially connected, the initial encoder comprises a preset two-dimensional coding network structure and a preset three-dimensional coding network structure which are sequentially connected, and the initial decoder comprises a preset three-dimensional decoding network structure and a preset two-dimensional decoding network structure which are sequentially connected;
acquiring training samples for training the initial convolutional neural network model, wherein the training samples are a plurality of two-dimensional slice image samples that are continuous along a preset spatial direction, together with corresponding two-dimensional segmentation annotation image samples;
and training the initial convolutional neural network model with the training samples to obtain a convolutional neural network model trained to convergence, wherein the convolutional neural network model trained to convergence is used to perform image segmentation processing on a target three-dimensional image to be segmented.
According to a second aspect of the present disclosure, there is provided an image segmentation method based on a convolutional neural network model, including:
acquiring a target three-dimensional image;
dividing the target three-dimensional image along a preset spatial direction to form a plurality of target two-dimensional slice images;
performing image segmentation processing on the plurality of target two-dimensional slice images by using a convolutional neural network model trained to convergence, wherein the convolutional neural network model trained to convergence comprises a target encoder and a target decoder connected in sequence, the target encoder comprises a target two-dimensional coding network structure and a target three-dimensional coding network structure connected in sequence, and the target decoder comprises a target three-dimensional decoding network structure and a target two-dimensional decoding network structure connected in sequence.
According to a third aspect of the present disclosure, there is provided a training apparatus for a convolutional neural network model for image segmentation, comprising:
the system comprises a model acquisition unit, a model selection unit and a model selection unit, wherein the model acquisition unit is used for acquiring a pre-constructed initial convolutional neural network model, the initial convolutional neural network model comprises an initial encoder and an initial decoder which are sequentially connected, the initial encoder comprises a preset two-dimensional coding network structure and a preset three-dimensional coding network structure which are sequentially connected, and the initial decoder comprises a preset three-dimensional decoding network structure and a preset two-dimensional decoding network structure which are sequentially connected;
a sample acquisition unit, configured to acquire training samples for training the initial convolutional neural network model, wherein the training samples are a plurality of two-dimensional slice image samples that are continuous along a preset spatial direction, together with corresponding two-dimensional segmentation annotation image samples;
and a model training unit, configured to train the initial convolutional neural network model with the training samples to obtain a convolutional neural network model trained to convergence, wherein the convolutional neural network model trained to convergence is used to perform image segmentation processing on a target three-dimensional image to be segmented.
According to a fourth aspect of the present disclosure, there is provided an image segmentation apparatus based on a convolutional neural network model, including:
an image acquisition unit for acquiring a target three-dimensional image;
an image dividing unit, configured to divide the target three-dimensional image along a preset spatial direction to form a plurality of target two-dimensional slice images;
an image segmentation unit, configured to perform image segmentation processing on the plurality of target two-dimensional slice images by using a convolutional neural network model trained to convergence, wherein the convolutional neural network model trained to convergence comprises a target encoder and a target decoder connected in sequence, the target encoder comprises a target two-dimensional coding network structure and a target three-dimensional coding network structure connected in sequence, and the target decoder comprises a target three-dimensional decoding network structure and a target two-dimensional decoding network structure connected in sequence.
According to a fifth aspect of the present disclosure, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first or second aspect.
According to a sixth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of the first or second aspect.
According to a seventh aspect of the present disclosure, there is provided a computer program product comprising: a computer program, stored in a readable storage medium, from which at least one processor of an electronic device can read the computer program, execution of the computer program by the at least one processor causing the electronic device to perform the method of the first or second aspect.
When the initial convolutional neural network model is trained according to the technology of the present disclosure, the convolutional neural network model trained to convergence is the optimal convolutional neural network model, so that when this optimal model is used to segment an image, the segmentation result is more accurate.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a network architecture diagram of a training method for a convolutional neural network model for image segmentation provided in accordance with the present disclosure;
FIG. 2 is a schematic diagram of a network architecture for a convolutional neural network model-based image segmentation method provided in accordance with the present disclosure;
FIG. 3 is a schematic flow chart diagram of a training method of a convolutional neural network model for image segmentation provided according to a first embodiment of the present disclosure;
FIG. 4 is a schematic flow chart diagram of a training method of a convolutional neural network model for image segmentation provided according to a second embodiment of the present disclosure;
FIG. 5 is a first schematic diagram of a convolutional neural network model for image segmentation provided in accordance with an embodiment of the present disclosure;
FIG. 6 is a second schematic diagram of a convolutional neural network model for image segmentation provided in accordance with an embodiment of the present disclosure;
FIG. 7 is a schematic flow chart diagram of a training method of a convolutional neural network model for image segmentation provided in accordance with a third embodiment of the present disclosure;
FIG. 8 is a schematic flow chart diagram of a training method of a convolutional neural network model for image segmentation provided in accordance with a fourth embodiment of the present disclosure;
FIG. 9 is a schematic flowchart of an image segmentation method based on a convolutional neural network model according to a fifth embodiment of the present disclosure;
FIG. 10 is a schematic flowchart of an image segmentation method based on a convolutional neural network model according to a sixth embodiment of the present disclosure;
FIG. 11 is a schematic flowchart of an image segmentation method based on a convolutional neural network model according to a seventh embodiment of the present disclosure;
FIG. 12 is a schematic diagram illustrating an image segmentation method based on a convolutional neural network model according to an embodiment of the present disclosure;
FIG. 13 is a schematic structural diagram of a training apparatus for a convolutional neural network model for image segmentation provided in accordance with an eighth embodiment of the present disclosure;
FIG. 14 is a schematic structural diagram of an image segmentation apparatus based on a convolutional neural network model according to a ninth embodiment of the present disclosure;
FIG. 15 is a block diagram of an electronic device for implementing a convolutional neural network model training method for image segmentation and an image segmentation method based on a convolutional neural network model according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the technical solution of the present disclosure, the acquisition, storage, and application of the images and image samples involved all comply with the relevant laws and regulations and do not violate public order and good customs.
For clear understanding of the technical solutions of the present disclosure, the technical solutions of the prior art will be described in detail first.
In the prior art, when a three-dimensional image is segmented, a 3D convolutional neural network model is generally used in order to integrate the spatial information in the three-dimensional image and obtain a better segmentation result; however, this increases the computation load of the model training process and the image segmentation process, so that both processes are inefficient.
Therefore, in order to retain the spatial information in a three-dimensional image while reducing the computation load of the model training and image segmentation processes, a 2.5D CNN combining a 2D CNN and a 3D CNN has been proposed. Specifically, in the process of training such a 2.5D CNN, the 2D CNN is first trained using two-dimensional slice image samples; after the 2D CNN has been trained to convergence, its segmentation results on the two-dimensional slice image samples are stacked into a three-dimensional segmentation result, which is then either input into the 3D CNN together with the three-dimensional image sample to train the 3D CNN, or fused with the three-dimensional image sample before being input into the 3D CNN. The current 2.5D CNN is thus essentially a two-stage combination of a 2D CNN and a 3D CNN, whose training process cannot adjust the training parameters of the 2D CNN and the 3D CNN synchronously. The quality of the 2D CNN training result therefore directly affects the 3D CNN training result, so the 2.5D CNN training result cannot be truly optimal; this entails a loss of training performance and leads to poor image segmentation results.
To solve these technical problems in the prior art, the inventors found through creative research that retaining the spatial information in a three-dimensional image while reducing the computation load of model training and image segmentation still requires fusing a 2D CNN and a 3D CNN, but not by simply splicing them one after the other with independent training processes. Instead, the coding and decoding functions of the 2D CNN and the 3D CNN are integrated to form a unified 2.5D CNN, which is then trained as a whole. When integrating the 2D CNN and the 3D CNN, note that the 2D CNN contains a two-dimensional encoder and a two-dimensional decoder, and the 3D CNN contains a three-dimensional encoder and a three-dimensional decoder. Therefore, so that the 2.5D CNN has two-dimensional coding and decoding functions as well as three-dimensional coding and decoding functions, the constructed 2.5D CNN comprises an initial encoder and an initial decoder, where the initial encoder comprises a preset two-dimensional coding network structure and a preset three-dimensional coding network structure connected in sequence, and the initial decoder comprises a preset three-dimensional decoding network structure and a preset two-dimensional decoding network structure connected in sequence. When training the 2.5D CNN, the pre-constructed 2.5D CNN, which can be called the initial convolutional neural network model, is obtained; training samples, namely a plurality of two-dimensional slice image samples that are continuous along a preset spatial direction together with corresponding two-dimensional segmentation annotation image samples, are obtained; and the initial convolutional neural network model is trained with these samples to obtain a convolutional neural network model trained to convergence, which is used to perform image segmentation processing on a target three-dimensional image to be segmented.
Because the initial convolutional neural network model has a preset two-dimensional coding network structure, a preset three-dimensional coding network structure, a preset three-dimensional decoding network structure, and a preset two-dimensional decoding network structure, training it retains the spatial information in the three-dimensional image while reducing the computation load of the model training process, and segmenting images with the convolutional neural network model trained to convergence effectively reduces the computation load of the image segmentation process. Moreover, since the initial convolutional neural network model is an integral convolutional model comprising an initial encoder and an initial decoder, it is trained as a whole and its training parameters are adjusted as a whole, so that the performance of the convolutional neural network model trained to convergence can be optimized; the optimal convolutional neural network model is thus obtained, and when it is used to segment images, the segmentation result is more accurate.
Based on the above creative discovery, the inventors propose the technical scheme of the present disclosure. The following describes the network architecture and application scenarios of the training method of a convolutional neural network model for image segmentation and the image segmentation method based on the convolutional neural network model provided in the embodiments of the present disclosure.
Fig. 1 is a schematic diagram of a network architecture of a training method of a convolutional neural network model for image segmentation provided according to the present disclosure, as shown in fig. 1, the network architecture includes: an image storage device 1 and an electronic device 2. The image storage device 1 stores training samples for training the initial convolutional neural network model. A pre-constructed initial convolutional neural network model is stored in the electronic device 2. The electronic device 2 obtains a training sample by communicating with the image storage device 1, and then trains the initial convolutional neural network model by using the method of the embodiment of the present disclosure to obtain a convolutional neural network model trained to be convergent.
Fig. 2 is a schematic diagram of a network architecture of an image segmentation method based on a convolutional neural network model according to the present disclosure. As shown in fig. 2, the network architecture includes: an image storage device 1, an electronic device 2, and an electronic device 3. The image storage device 1 also stores the target three-dimensional image to be segmented, and the electronic device 2 holds the convolutional neural network model trained to convergence. The electronic device 3 acquires the target three-dimensional image by communicating with the image storage device 1, and acquires the convolutional neural network model trained to convergence by communicating with the electronic device 2. The electronic device 3 then segments the target three-dimensional image by the image segmentation method based on the convolutional neural network model provided by the embodiment of the present disclosure.
It can be understood that, in the embodiment of the present disclosure, the electronic device 2 may also be used to acquire the target three-dimensional image from the image storage device 1, and the convolutional neural network model stored in the electronic device 2 and trained to converge is used to segment the target three-dimensional image, which is not limited in the embodiment of the present disclosure.
The training method of the convolutional neural network model for image segmentation and the image segmentation method based on the convolutional neural network model provided by the embodiments of the present disclosure can be applied to segmenting medical three-dimensional images, such as CT images or MRI images, in which the third dimension is a depth space dimension; in this scenario, organs, lesions, and the like are segmented in the medical three-dimensional image. The methods can also be applied to segmenting images in a video, or to segmenting a plurality of continuously captured images; in these scenarios, the third dimension of the video or of the plurality of continuously captured images can be a spatial dimension formed from the time axis. The segmentation can be performed for different objects in the image.
Embodiments of the present disclosure will be described below in detail with reference to the accompanying drawings.
Fig. 3 is a flowchart illustrating a training method of a convolutional neural network model for image segmentation according to a first embodiment of the present disclosure. As shown in fig. 3, the execution subject of the training method provided by this embodiment is a training apparatus for a convolutional neural network model for image segmentation, which is located in an electronic device. The training method provided by this embodiment includes the following steps:
step 301, obtaining a pre-constructed initial convolutional neural network model, where the initial convolutional neural network model includes an initial encoder and an initial decoder that are connected in sequence, the initial encoder includes a preset two-dimensional coding network structure and a preset three-dimensional coding network structure that are connected in sequence, and the initial decoder includes a preset three-dimensional decoding network structure and a preset two-dimensional decoding network structure that are connected in sequence.
The pre-constructed initial convolutional neural network model is constructed from a 2D CNN and a 3D CNN, and may also be referred to as the initial 2.5D CNN.
The initial convolutional neural network model includes an initial encoder and an initial decoder. Training samples input into the model are first encoded by the initial encoder and then decoded by the initial decoder.
In this embodiment, the initial encoder includes at least one layer of the preset two-dimensional coding network structure and at least one layer of the preset three-dimensional coding network structure, with the preset three-dimensional coding network structure connected after the preset two-dimensional coding network structure.
The preset two-dimensional coding network structure comprises at least one preset two-dimensional convolution operator, a preset two-dimensional normalization operator, a preset nonlinear transformation operator, and a down-sampling operator. The preset three-dimensional coding network structure comprises at least one preset three-dimensional convolution operator, a preset three-dimensional normalization operator, a preset nonlinear transformation operator, and a down-sampling operator.
In this embodiment, the initial decoder has at least one layer of the preset three-dimensional decoding network structure and at least one layer of the preset two-dimensional decoding network structure, with the preset two-dimensional decoding network structure connected after the preset three-dimensional decoding network structure. The first layer of the preset three-dimensional decoding network structure is connected with the last layer of the preset three-dimensional coding network structure.
The preset two-dimensional decoding network structure comprises at least one preset two-dimensional convolution operator, a preset two-dimensional normalization operator, a preset nonlinear transformation operator, and an up-sampling operator. The preset three-dimensional decoding network structure comprises at least one preset three-dimensional convolution operator, a preset three-dimensional normalization operator, a preset nonlinear transformation operator, and an up-sampling operator.
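As a minimal sketch of how such coding and decoding network structures might look in code, the following uses PyTorch; the framework is an assumption (the disclosure names none), and the choice of BatchNorm, ReLU, max pooling, and trilinear up-sampling for the normalization, nonlinear transformation, and sampling operators is purely illustrative:

```python
# Hypothetical building blocks for the initial encoder/decoder; all layer
# choices and class names are assumptions, not taken from the disclosure.
import torch.nn as nn

class PresetEnc2D(nn.Module):
    """Preset 2D coding structure: 2D conv + 2D norm + nonlinearity + downsample."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
            nn.BatchNorm2d(c_out),
            nn.ReLU(inplace=True),
        )
        self.down = nn.MaxPool2d(2)  # down-sampling operator

    def forward(self, x):
        feat = self.body(x)          # saved for the symmetric skip connection
        return feat, self.down(feat)

class PresetEnc3D(nn.Module):
    """Preset 3D coding structure: 3D conv + 3D norm + nonlinearity + downsample."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(c_in, c_out, kernel_size=3, padding=1),
            nn.BatchNorm3d(c_out),
            nn.ReLU(inplace=True),
        )
        self.down = nn.MaxPool3d(2)

    def forward(self, x):
        feat = self.body(x)
        return feat, self.down(feat)

class PresetDec3D(nn.Module):
    """Preset 3D decoding structure: 3D conv + 3D norm + nonlinearity + upsample."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode='trilinear', align_corners=False)
        self.body = nn.Sequential(
            nn.Conv3d(c_in, c_out, kernel_size=3, padding=1),
            nn.BatchNorm3d(c_out),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.body(self.up(x))  # up-sampling operator, then convolution

# The preset 2D decoding structure is analogous, with Conv2d/BatchNorm2d and
# bilinear up-sampling in place of their 3D counterparts.
```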
Step 302, acquiring training samples for training the initial convolutional neural network model, where the training samples are a plurality of two-dimensional slice image samples that are continuous along a preset spatial direction, together with corresponding two-dimensional segmentation annotation image samples.
In this embodiment, the electronic device interacts with the image storage device to obtain a three-dimensional image sample and a corresponding three-dimensional segmentation annotation image sample, and divides each of them along a preset spatial direction to form two-dimensional slice image samples and corresponding two-dimensional segmentation annotation image samples that are continuous along the preset spatial direction.
Illustratively, the three-dimensional image sample and the three-dimensional segmentation annotation image sample have the same shape, 1 × C_in × D_in × H_in × W_in, where C_in is the number of channels of the image sample, D_in its depth dimension, H_in its length dimension, and W_in its width dimension. If the three-dimensional image sample and the corresponding three-dimensional segmentation annotation image sample are each divided along the depth direction, the two-dimensional slice image samples and the corresponding two-dimensional segmentation annotation image samples have the shape C_in × H_in × W_in. The number of two-dimensional slice image samples (and of corresponding two-dimensional segmentation annotation image samples) used as training samples is N, where N may equal D_in or be smaller than D_in; in either case the two-dimensional slice image samples are kept in continuous order, as are the corresponding two-dimensional segmentation annotation image samples.
The three-dimensional image sample can be a medical three-dimensional image sample, and the preset spatial direction can be the depth (D_in) direction, the length (H_in) direction, or the width (W_in) direction of the three-dimensional image sample. Alternatively, the three-dimensional image sample can be a video or a plurality of continuously captured images, in which case the preset spatial direction can likewise be the D_in, H_in, or W_in direction, with the corresponding dimension formed from the time axis.
The three-dimensional segmentation annotation image sample is an image sample formed by annotating segmentation points on the three-dimensional image sample. Depending on the purpose of the image segmentation, the annotated segmentation points differ. For example, if the purpose of segmenting the image is to segment at least one organ, the annotated segmentation points are the pixel points of the at least one organ; if the purpose is to segment at least one lesion, the annotated segmentation points are the pixel points of the at least one lesion; and if the purpose is to segment at least one object, the annotated segmentation points are the pixel points of the at least one object.
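To make the sample preparation of step 302 concrete, the following is a minimal sketch of dividing a volume and its annotation volume along the depth direction; NumPy, the array layout, and all names here are assumptions, not taken from the disclosure:

```python
# Hypothetical slicing of a (C_in, D_in, H_in, W_in) volume into D_in
# continuous two-dimensional slice samples of shape (C_in, H_in, W_in).
import numpy as np

def slice_volume(volume: np.ndarray, annotation: np.ndarray, depth_axis: int = 1):
    assert volume.shape == annotation.shape
    img_slices = np.moveaxis(volume, depth_axis, 0)      # (D_in, C_in, H_in, W_in)
    ann_slices = np.moveaxis(annotation, depth_axis, 0)
    # the returned order follows the preset spatial direction, so the slices
    # (and their annotations) stay continuous along that direction
    return list(zip(img_slices, ann_slices))
```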
Step 303, training the initial convolutional neural network model with the training samples to obtain a convolutional neural network model trained to convergence, where the convolutional neural network model trained to convergence is used to perform image segmentation processing on a target three-dimensional image to be segmented.
In this embodiment, the training samples are input into the initial convolutional neural network model, and the training parameters in the initial convolutional neural network model are adjusted to train it. After each adjustment of the training parameters, it is determined whether a preset convergence condition is satisfied; if not, training of the initial convolutional neural network model continues, and if so, the convolutional neural network model satisfying the preset convergence condition is determined to be the convolutional neural network model trained to convergence.
When the initial convolutional neural network model is trained, the initial encoder and the initial decoder in the model are trained as a whole. Therefore, when the training parameters are adjusted, the training parameters in the initial encoder and the initial decoder may be adjusted simultaneously, or only those in the initial encoder or the initial decoder may be adjusted, depending on the convergence direction of the model.
The training method of the convolutional neural network model for image segmentation provided by this embodiment acquires a pre-constructed initial convolutional neural network model, where the initial convolutional neural network model comprises an initial encoder and an initial decoder connected in sequence, the initial encoder comprises a preset two-dimensional coding network structure and a preset three-dimensional coding network structure connected in sequence, and the initial decoder comprises a preset three-dimensional decoding network structure and a preset two-dimensional decoding network structure connected in sequence; acquires training samples for training the initial convolutional neural network model, namely a plurality of two-dimensional slice image samples that are continuous along a preset spatial direction together with corresponding two-dimensional segmentation annotation image samples; and trains the initial convolutional neural network model with the training samples to obtain a convolutional neural network model trained to convergence, which is used to perform image segmentation processing on a target three-dimensional image to be segmented. Because the initial convolutional neural network model has both two-dimensional and three-dimensional coding network structures and both three-dimensional and two-dimensional decoding network structures, training it retains the spatial information in the three-dimensional image while reducing the computation load of the model training process, and segmenting images with the convolutional neural network model trained to convergence effectively reduces the computation load of the image segmentation process. Moreover, since the initial convolutional neural network model is an integral convolutional model comprising an initial encoder and an initial decoder, it is trained as a whole and its training parameters are adjusted as a whole, so that the performance of the convolutional neural network model trained to convergence can be optimized; the optimal convolutional neural network model is thus obtained, and when it is used to segment images, the segmentation result is more accurate.
Fig. 4 is a schematic flowchart of a training method of a convolutional neural network model for image segmentation according to a second embodiment of the present disclosure. As shown in fig. 4, the training method provided in this embodiment further includes, before step 301, a step 300 of constructing the initial convolutional neural network model from a preset two-dimensional convolutional neural network model and a preset three-dimensional convolutional neural network model. Step 300 includes the following steps:
step 3001, obtaining a preset two-dimensional encoder and a preset two-dimensional decoder in a preset two-dimensional convolutional neural network model and a preset three-dimensional encoder and a preset three-dimensional decoder in a preset three-dimensional convolutional neural network model.
In this embodiment, a preset two-dimensional convolutional neural network model and a preset three-dimensional convolutional neural network model are stored in the electronic device in advance. Therefore, a preset two-dimensional encoder and a preset two-dimensional decoder in the preset two-dimensional convolutional neural network model and a preset three-dimensional encoder and a preset three-dimensional decoder in the preset three-dimensional convolutional neural network model are obtained from a preset storage area of the electronic device.
The preset two-dimensional encoder comprises multiple layers of preset two-dimensional coding network structures connected in series, and the preset two-dimensional decoder comprises multiple layers of preset two-dimensional decoding network structures connected in series. The last layer of the preset two-dimensional coding network structure is connected with the first layer of the preset two-dimensional decoding network structure.
The preset three-dimensional encoder comprises multiple layers of preset three-dimensional coding network structures connected in series, and the preset three-dimensional decoder comprises multiple layers of preset three-dimensional decoding network structures connected in series. The last layer of the preset three-dimensional coding network structure is connected with the first layer of the preset three-dimensional decoding network structure.
Step 3002, replacing at least one layer of the preset two-dimensional coding network structure at the end of the preset two-dimensional encoder with a preset three-dimensional coding network structure in the preset three-dimensional encoder.
In this embodiment, after at least one layer of the preset two-dimensional coding network structure at the end of the preset two-dimensional encoder is replaced with the preset three-dimensional coding network structure in the preset three-dimensional encoder, the last layer of the preset two-dimensional coding network structure existing in the preset two-dimensional encoder is connected with the first layer of the preset three-dimensional coding network structure. As shown in fig. 5, the original preset two-dimensional encoder includes four layers of preset two-dimensional encoding network structures, and after the last two layers of preset two-dimensional encoding network structures at the tail end of the preset two-dimensional encoder are replaced with corresponding two layers of preset three-dimensional encoding network structures, the second layer of preset two-dimensional encoding network structures is connected with the first layer of preset three-dimensional encoding network structures.
It should be noted that the last layer of the preset two-dimensional coding network structure or the first layer of the preset three-dimensional coding network structure may further include a dimension conversion operator, which converts the plurality of sample two-dimensional down-sampling feature maps output by the last layer of the two-dimensional coding network structure into a sample three-dimensional down-sampling feature map, so as to match the input shape that the preset three-dimensional coding network structure can process.
Step 3003, replacing at least one layer of the preset two-dimensional decoding network structure at the front end of the preset two-dimensional decoder with a preset three-dimensional decoding network structure in the preset three-dimensional decoder.
In this embodiment, after at least one layer of the preset two-dimensional decoding network structure at the front end of the preset two-dimensional decoder is replaced with the preset three-dimensional decoding network structure in the preset three-dimensional decoder, the first layer of the preset three-dimensional decoding network structure is connected with the last layer of the preset three-dimensional coding network structure, and the last layer of the preset three-dimensional decoding network structure is connected with the first layer of the preset two-dimensional decoding network structure. As shown in fig. 5, the first layer of the preset three-dimensional decoding network structure is connected to the second layer of the preset three-dimensional coding network structure, and the second layer of the preset three-dimensional decoding network structure is connected with the first layer of the preset two-dimensional decoding network structure.
Similarly, in this embodiment, the last layer of the preset three-dimensional decoding network structure may further include a dimension conversion operator, which converts the sample three-dimensional up-sampling feature map output by that layer into a plurality of sample two-dimensional up-sampling feature maps, so as to match the input shape that the preset two-dimensional decoding network structure can process.
In the training method of the convolutional neural network model for image segmentation provided by this embodiment, before the pre-constructed initial convolutional neural network model is obtained, the initial convolutional neural network model is constructed from a preset two-dimensional convolutional neural network model and a preset three-dimensional convolutional neural network model. Specifically, the preset two-dimensional encoder and preset two-dimensional decoder in the preset two-dimensional convolutional neural network model, and the preset three-dimensional encoder and preset three-dimensional decoder in the preset three-dimensional convolutional neural network model, are obtained; at least one layer of the preset two-dimensional coding network structure at the end of the preset two-dimensional encoder is replaced with the preset three-dimensional coding network structure in the preset three-dimensional encoder; and at least one layer of the preset two-dimensional decoding network structure at the front end of the preset two-dimensional decoder is replaced with the preset three-dimensional decoding network structure in the preset three-dimensional decoder. In this way a 2.5D initial convolutional neural network model can be constructed quickly, and since the result is an integral 2.5D convolutional neural network model, it meets the requirement of training the initial convolutional neural network model as a whole.
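One way steps 3002 and 3003 could look in code, assuming (purely for illustration) that each preset model exposes its per-level coding and decoding structures as lists named enc_blocks and dec_blocks:

```python
# Hypothetical layer surgery: keep the shallow 2D levels, swap in the deep 3D
# levels of the encoder, and mirror this at the front of the decoder.
def build_initial_model(net2d, net3d, k=2):
    enc_blocks = list(net2d.enc_blocks[:-k]) + list(net3d.enc_blocks[-k:])
    dec_blocks = list(net3d.dec_blocks[:k]) + list(net2d.dec_blocks[k:])
    return enc_blocks, dec_blocks  # to be wrapped into one trainable module
```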
As an alternative, in this embodiment, the initial convolutional neural network model is constructed as any U-shaped model, such as UNet, UNet++, or U2Net. Accordingly, the number of layers of the preset two-dimensional coding network structure equals that of the preset two-dimensional decoding network structure in the initial convolutional neural network model, and the number of layers of the preset three-dimensional coding network structure equals that of the preset three-dimensional decoding network structure. After the convolutional neural network model trained to convergence is used to segment the target three-dimensional image, the result images have the same size as the target two-dimensional slice images into which the target three-dimensional image was divided.
As an optional implementation manner, in this embodiment, the preset two-dimensional coding network structure of the symmetric hierarchy is connected to the preset two-dimensional decoding network structure, and the preset three-dimensional coding network structure of the symmetric hierarchy is connected to the preset three-dimensional decoding network structure.
Specifically, in this embodiment, in order to retain the detail information in the feature maps output by each layer of the preset coding network structure during segmentation and thus achieve a better segmentation effect, the preset coding network structure at the symmetric level is connected with the preset decoding network structure in both the preset two-dimensional and preset three-dimensional convolutional neural network models. The preset decoding network structure therefore decodes not only the feature map output by the previous decoding network structure, which retains coarse-grained information, but also the feature map from the preset coding network structure at the symmetric level, so that the decoded feature map retains both coarse-grained information and detail information.
As shown in fig. 5 and 6, the first layer of the preset two-dimensional coding network structure is symmetrical to the second layer of the preset two-dimensional decoding network structure, the second layer of the preset two-dimensional coding network structure is symmetrical to the first layer of the preset two-dimensional decoding network structure, the first layer of the preset three-dimensional coding network structure is symmetrical to the second layer of the preset three-dimensional decoding network structure, and the second layer of the preset three-dimensional coding network structure is symmetrical to the first layer of the preset three-dimensional decoding network structure.
Therefore, as shown in fig. 6, when constructing the initial convolutional neural network model from the preset two-dimensional convolutional neural network model and the preset three-dimensional convolutional neural network model, the preset two-dimensional coding network structure at the symmetric level is likewise connected with the preset two-dimensional decoding network structure, and the preset three-dimensional coding network structure at the symmetric level with the preset three-dimensional decoding network structure. Each preset decoding network structure decodes not only the feature map output by the previous decoding network structure, which retains coarse-grained information, but also the feature map from the coding network structure at the symmetric level. The feature maps decoded by the preset two-dimensional and three-dimensional decoding network structures therefore retain both coarse-grained information and detail information, making the segmentation result more accurate.
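A sketch of one such decoding step with a symmetric skip connection follows; as in the earlier sketch, it assumes a decoding block exposes its up-sampling operator and convolution body separately, and concatenation-based fusion in the style of U-shaped models is an assumption, not taken from the disclosure:

```python
import torch

def decode_step(dec_block, coarse_feat, skip_feat):
    up = dec_block.up(coarse_feat)             # up-sampling operator
    fused = torch.cat([up, skip_feat], dim=1)  # fuse with the symmetric-level
                                               # encoder feature map (detail info)
    # dec_block.body must be built to accept the concatenated channel count
    return dec_block.body(fused)               # conv + norm + nonlinearity
```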
In this embodiment, in order to give every layer of the preset two-dimensional coding, preset three-dimensional coding, preset three-dimensional decoding, and preset two-dimensional decoding network structures the same kind of function, and thus form a more reasonable initial convolutional neural network, the dimension conversion operators can also be separated from the network structures into standalone layers.
Specifically, as shown in fig. 6, a first feature map dimension conversion layer is added between a last layer of preset two-dimensional coding network structure and a first layer of preset three-dimensional coding network structure in the initial convolutional neural network model, and the first feature map dimension conversion layer is used for converting a plurality of sample two-dimensional down-sampling feature maps output by the last layer of preset two-dimensional coding network structure into sample three-dimensional down-sampling feature maps; and adding a second feature map dimension conversion layer between the last layer of preset three-dimensional decoding network structure and the first layer of preset two-dimensional decoding network structure in the initial convolutional neural network model, wherein the second feature map dimension conversion layer is used for converting the sample three-dimensional up-sampling feature map output by the last layer of preset three-dimensional decoding network structure into a plurality of sample two-dimensional up-sampling feature maps.
The first feature map dimension conversion layer comprises a first dimension conversion operator, which converts the plurality of sample two-dimensional down-sampling feature maps into a sample three-dimensional down-sampling feature map. Illustratively, each sample two-dimensional down-sampling feature map has the shape C × H × W, where H and W generally differ from the H_in and W_in of the two-dimensional slice image samples. If the number of two-dimensional slice image samples is N, the number of sample two-dimensional down-sampling feature maps is also N. After the N sample two-dimensional down-sampling feature maps are converted into a sample three-dimensional down-sampling feature map, its shape is 1 × C × D × H × W, where D = N. That is, the first dimension conversion operator in the first feature map dimension conversion layer converts the count dimension of the sample two-dimensional down-sampling feature maps into the depth space dimension and adds a leading dimension of size 1.
The second feature map dimension conversion layer comprises a second dimension conversion operator, which converts the sample three-dimensional up-sampling feature map into a plurality of sample two-dimensional conversion up-sampling feature maps. Illustratively, the sample three-dimensional up-sampling feature map has the size 1 × C × D × H × W; after conversion, each sample two-dimensional conversion up-sampling feature map has the size C × H × W, and D equals the number N of sample two-dimensional conversion up-sampling feature maps.
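Both conversion layers reduce to tensor reshapes; a minimal PyTorch sketch, where the framework and function names are assumptions:

```python
import torch

def first_dim_conversion(x2d: torch.Tensor) -> torch.Tensor:
    # (N, C, H, W): N sample 2D down-sampling feature maps become one
    # (1, C, D, H, W) map with D = N: the count dimension becomes the depth axis
    n, c, h, w = x2d.shape
    return x2d.permute(1, 0, 2, 3).reshape(1, c, n, h, w)

def second_dim_conversion(x3d: torch.Tensor) -> torch.Tensor:
    # (1, C, D, H, W) back to (D, C, H, W): D sample 2D up-sampling feature maps
    _, c, d, h, w = x3d.shape
    return x3d.reshape(c, d, h, w).permute(1, 0, 2, 3)
```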
Fig. 7 is a schematic flowchart of a training method of a convolutional neural network model for image segmentation according to a third embodiment of the present disclosure. As shown in fig. 7, the training method provided in this embodiment further refines step 302 on the basis of the method provided in any of the above embodiments, and includes the following steps:
Step 3021, acquiring a three-dimensional image sample and a corresponding three-dimensional segmentation annotation image sample.
In this embodiment, a three-dimensional image sample is acquired by an image acquisition device and provided to a user for segmentation annotation, forming a three-dimensional segmentation annotation image sample. The image storage device stores the three-dimensional image sample in association with the corresponding three-dimensional segmentation annotation image sample. The electronic device acquires both from the image storage device by sending it a sample acquisition request.
Step 3022, performing normalization processing and/or image scaling processing on the three-dimensional image sample and the corresponding three-dimensional segmentation annotation image sample.
In this embodiment, the three-dimensional image sample and the corresponding three-dimensional segmentation annotation image sample are preprocessed, because samples that are too large increase the computation load during model training, while samples that are too small cannot reflect enough of the detail information in the samples.
The preprocessing may include normalization processing and/or image scaling processing, among others.
Specifically, the normalization of the three-dimensional image sample and the corresponding three-dimensional segmentation annotation image sample can be performed with a three-dimensional normalization operator. Image reduction can be performed by cropping or down-sampling, and image enlargement by up-sampling.
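A sketch of this preprocessing under illustrative assumptions; z-score-style normalization, trilinear resampling, and the target shape are not specified by the disclosure:

```python
import torch
import torch.nn.functional as F

def preprocess(volume: torch.Tensor, target_dhw=(64, 256, 256)) -> torch.Tensor:
    """volume: tensor of shape (1, C, D, H, W)."""
    volume = (volume - volume.mean()) / (volume.std() + 1e-8)  # normalization
    # image scaling: trilinear interpolation down/up to the target shape
    return F.interpolate(volume, size=target_dhw, mode='trilinear',
                         align_corners=False)
```

For the three-dimensional segmentation annotation image sample, nearest-neighbor interpolation without intensity normalization would be the natural analogue, so that label values are not mixed.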
Step 3023, dividing the three-dimensional image sample and the three-dimensional segmentation annotation image sample along a preset spatial direction to obtain a plurality of two-dimensional slice image samples and corresponding two-dimensional segmentation annotation image samples.
In this embodiment, the three-dimensional image sample and the three-dimensional segmentation annotation image sample are divided along the same preset spatial direction. The three-dimensional image sample is divided into a plurality of two-dimensional slice image samples arranged along the preset spatial direction, so that a continuous series of two-dimensional slice image samples exists in that direction. The three-dimensional segmentation annotation image sample is likewise divided into the two-dimensional segmentation annotation image samples corresponding to each two-dimensional slice image sample, and these are likewise continuous in the preset spatial direction.
It can be understood that, if a relatively large number of two-dimensional slice image samples and corresponding two-dimensional segmentation annotation image samples are obtained by the division in the preset spatial direction, a preset number of them can be extracted from the divided samples, using the same extraction scheme for both, as long as the extracted two-dimensional slice image samples and two-dimensional segmentation annotation image samples remain continuous in the preset spatial direction.
The preset spatial direction may be the depth D direction, the length H direction, or the width W direction of the three-dimensional image sample. This is not limited in this embodiment.
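The division of step 3023 amounts to slicing two aligned volumes along one axis; a hedged sketch (the axis numbering is an assumption about array layout):

```python
import numpy as np

def split_volume(volume: np.ndarray, labels: np.ndarray, axis: int = 0):
    """Split a 3D image sample and its 3D annotation into aligned 2D slices.

    axis=0 is taken here as the depth D direction; 1 and 2 would be H and W.
    """
    slices = [np.take(volume, i, axis=axis) for i in range(volume.shape[axis])]
    label_slices = [np.take(labels, i, axis=axis) for i in range(labels.shape[axis])]
    return slices, label_slices  # both lists stay ordered along the chosen axis
```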
Fig. 8 is a schematic flowchart of a training method of a convolutional neural network model for image segmentation according to a fourth embodiment of the present disclosure. As shown in fig. 8, the training method provided in this embodiment further refines step 303 on the basis of the training method provided in any of the above embodiments. The training method of the convolutional neural network model for image segmentation provided in this embodiment includes the following steps:
Step 3031, grouping the training samples.
In this embodiment, the plurality of two-dimensional slice image samples that are continuous in the preset spatial direction and the corresponding two-dimensional segmentation annotation image samples are grouped. Within each group, the two-dimensional slice image samples and the corresponding two-dimensional segmentation annotation image samples remain continuous in the preset spatial direction, which ensures that the preset three-dimensional coding network structure can integrate information along that direction.
Step 3032, circularly inputting each group of training samples into the initial convolutional neural network model, and adjusting the training parameters in the initial convolutional neural network model after each input, so as to train the initial convolutional neural network model.
Step 3033, if the preset convergence condition is satisfied, stopping inputting the training sample, and determining the convolutional neural network model satisfying the preset convergence condition as the convolutional neural network model trained to be converged.
In this embodiment, to further reduce the computation required to train the initial convolutional neural network model, the training samples are not all input at once. Instead, the groups of training samples are input cyclically. After each group is input, the training parameters of the initial convolutional neural network model are adjusted as a whole in the direction of model convergence. After each adjustment, whether the preset convergence condition is satisfied is checked; if not, the next group of training samples is input and training continues. If the model has still not converged after every group has participated in training, the first group is input again, and the cycle continues until the preset convergence condition is satisfied. The convolutional neural network model satisfying the preset convergence condition is finally determined as the convolutional neural network model trained to convergence.
The preset convergence condition may be that the loss function of the convolutional neural network model reaches a minimum, or that the number of iterations reaches a preset number. This is not limited in this embodiment.
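For illustration, the grouped, cyclic training procedure above might look like the following sketch; the optimizer, loss, and convergence thresholds are assumptions and are not specified by the disclosure:

```python
import itertools
import torch

def train_to_convergence(model, groups, max_iters=10_000, loss_eps=1e-4):
    """Cycle over groups of consecutive slice samples until convergence."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    criterion = torch.nn.CrossEntropyLoss()
    for step, (slices, labels) in enumerate(itertools.cycle(groups)):
        optimizer.zero_grad()
        loss = criterion(model(slices), labels)
        loss.backward()
        optimizer.step()  # adjust all training parameters as a whole
        # Preset convergence condition: loss small enough or iteration budget hit.
        if loss.item() < loss_eps or step + 1 >= max_iters:
            break
    return model
```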
In the training method of the convolutional neural network model for image segmentation provided by this embodiment, when training an initial convolutional neural network model by using training samples to obtain a convolutional neural network model trained to be convergent, the training samples are grouped; circularly inputting each group of training samples into an initial convolutional neural network model, and after the training samples are input each time, adjusting training parameters in the initial convolutional neural network model to train the initial convolutional neural network model; and if the preset convergence condition is met, stopping inputting the training sample, and determining the convolutional neural network model meeting the preset convergence condition as the convolutional neural network model trained to be converged. After the training samples are grouped, the initial convolutional neural network model is trained, so that the operation amount of the initial convolutional neural network model can be further reduced, and the training efficiency of the initial convolutional neural network model is improved.
Fig. 9 is a schematic flowchart of an image segmentation method based on a convolutional neural network model according to a fifth embodiment of the present disclosure. As shown in fig. 9, the execution subject of the image segmentation method based on a convolutional neural network model provided in this embodiment is an image segmentation apparatus based on a convolutional neural network model, which may be located in an electronic device. This electronic device may differ from the electronic device that trains the convolutional neural network model. The image segmentation method based on the convolutional neural network model provided by this embodiment includes the following steps:
Step 901, acquiring a target three-dimensional image.
In this embodiment, the target three-dimensional image is the three-dimensional image to be segmented. It may be stored in an image storage device, where the training samples and the target three-dimensional images to be segmented are stored in separate categories; a target three-dimensional image may also be associated with its corresponding training samples. The electronic device obtains the target three-dimensional image by sending an image acquisition request to the image storage device.
Alternatively, the target three-dimensional image may be a medical three-dimensional image, a video in the form of a three-dimensional image, or a plurality of continuously shot images.
Step 902, dividing the target three-dimensional image along a preset spatial direction to form a plurality of target two-dimensional slice images.
It should be noted that the preset spatial direction for dividing the target three-dimensional image is the same as the preset spatial direction used for dividing the corresponding training samples; for example, both may be the depth D direction.
Specifically, in the present embodiment, the target three-dimensional image is divided along the preset spatial direction into a plurality of target two-dimensional slice images. These slice images are arranged along the preset spatial direction, forming a continuous sequence in that direction, so that the information of the target three-dimensional image along the preset spatial direction is preserved.
Step 903, performing image segmentation processing on the plurality of target two-dimensional slice images by adopting the convolutional neural network model trained to be convergent; the convolutional neural network model trained to be converged comprises a target encoder and a target decoder which are sequentially connected, the target encoder comprises a target two-dimensional coding network structure and a target three-dimensional coding network structure which are sequentially connected, and the target decoder comprises a target three-dimensional decoding network structure and a target two-dimensional decoding network structure which are sequentially connected.
Specifically, in this embodiment, each layer of the target two-dimensional coding network structure performs a two-dimensional coding operation on its input. For the first layer, the input is a target two-dimensional slice image; for the second through Nth layers, the input is the target two-dimensional down-sampling feature map output by the previous layer. The plurality of target two-dimensional down-sampling feature maps is then converted into a target three-dimensional transformed down-sampling feature map, which is input into the first layer of the target three-dimensional coding network structure for three-dimensional coding; each layer of the target three-dimensional coding network structure performs a three-dimensional coding operation on its input feature map. Finally, the target three-dimensional coding network structure outputs a target three-dimensional down-sampling feature map. It should be noted that the size of the output feature map shrinks after each coding layer.
Next, the target three-dimensional down-sampling feature map is input into the first layer of the target three-dimensional decoding network structure; each layer of the target three-dimensional decoding network structure performs a three-dimensional decoding operation on its input feature map. After the last three-dimensional decoding layer outputs the target three-dimensional up-sampling feature map, it is converted into a plurality of target two-dimensional transformed up-sampling feature maps, which are input into the target two-dimensional decoding network structure. Each two-dimensional decoding layer performs a two-dimensional decoding operation on its input feature map, yielding the segmentation result maps output by the target two-dimensional decoding network structure. It should be noted that the size of the output feature map grows after each decoding layer; the final segmentation result map has the same size as the target two-dimensional slice image.
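A rough end-to-end sketch of this 2D-encode, 3D-encode, 3D-decode, 2D-decode pipeline is given below; the layer counts, channel widths, and the PyTorch framework are illustrative assumptions, not the disclosed implementation:

```python
import torch
import torch.nn as nn

class Hybrid2D3DSegNet(nn.Module):
    """Target encoder: 2D then 3D stages; target decoder: 3D then 2D stages."""
    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.enc2d = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1),
                                   nn.BatchNorm2d(8), nn.ReLU(),
                                   nn.MaxPool2d(2))            # H, W shrink
        self.enc3d = nn.Sequential(nn.Conv3d(8, 16, 3, padding=1),
                                   nn.BatchNorm3d(16), nn.ReLU(),
                                   nn.MaxPool3d(2))            # D, H, W shrink
        self.dec3d = nn.Sequential(nn.Conv3d(16, 8, 3, padding=1),
                                   nn.BatchNorm3d(8), nn.ReLU(),
                                   nn.Upsample(scale_factor=2))
        self.dec2d = nn.Sequential(nn.Upsample(scale_factor=2),
                                   nn.Conv2d(8, n_classes, 3, padding=1))

    def forward(self, slices: torch.Tensor) -> torch.Tensor:
        f2d = self.enc2d(slices)                    # (N, 8, H/2, W/2)
        # First dimension conversion: stack the N 2D maps into a 3D volume.
        vol = f2d.permute(1, 0, 2, 3).unsqueeze(0)  # (1, 8, D=N, H/2, W/2)
        vol = self.dec3d(self.enc3d(vol))           # 3D encode, then 3D decode
        # Second dimension conversion: split the volume back into 2D maps.
        f2d = vol.squeeze(0).permute(1, 0, 2, 3)    # (N, 8, H/2, W/2)
        return self.dec2d(f2d)                      # (N, n_classes, H, W)
```

Here slices has shape (N, 1, H, W), with the N consecutive slices playing the role of the depth D; N, H, and W must be divisible by the pooling factors for the shapes to line up.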
In the image segmentation method based on the convolutional neural network model provided by this embodiment, a target three-dimensional image is obtained; the target three-dimensional image is divided along a preset spatial direction to form a plurality of target two-dimensional slice images; and image segmentation processing is performed on the plurality of target two-dimensional slice images by the convolutional neural network model trained to convergence. The convolutional neural network model trained to convergence comprises a target encoder and a target decoder which are connected in sequence, the target encoder comprises a target two-dimensional coding network structure and a target three-dimensional coding network structure which are connected in sequence, and the target decoder comprises a target three-dimensional decoding network structure and a target two-dimensional decoding network structure which are connected in sequence. Because the model combines two-dimensional and three-dimensional coding and decoding network structures, the model obtained at convergence offers the best performance, and segmenting an image with it yields a more accurate segmentation result.
Fig. 10 is a schematic flowchart of an image segmentation method based on a convolutional neural network model according to a sixth embodiment of the present disclosure. As shown in fig. 10, the image segmentation method based on a convolutional neural network model provided in this embodiment further refines step 903 on the basis of the image segmentation method provided in the embodiment shown in fig. 9. The image segmentation method based on the convolutional neural network model provided in this embodiment includes the following steps:
Step 9031, grouping the plurality of target two-dimensional slice images.
In this embodiment, the plurality of target two-dimensional slice images that are continuous in the preset spatial direction are grouped. Each group of target two-dimensional slice images is still continuous in the preset spatial direction, which ensures that the target three-dimensional coding network structure can integrate information along that direction.
Step 9032, sequentially inputting each group of target two-dimensional slice images into the convolutional neural network model trained to be convergent.
Step 9033, performing image segmentation processing on each group of target two-dimensional slice images through the convolutional neural network model trained to be convergent.
In this embodiment, after each group of target two-dimensional slice images are sequentially input into the convolutional neural network model trained to converge, at least one target two-dimensional coding network structure in the convolutional neural network model trained to converge performs two-dimensional coding operation on each target two-dimensional slice image in each group of target two-dimensional slice images to form a target two-dimensional down-sampling feature map. And converting each group of target two-dimensional down-sampling feature maps into a target three-dimensional conversion down-sampling feature map. And carrying out three-dimensional coding on the target three-dimensional transformation down-sampling feature map by at least one target three-dimensional coding network structure to form a target three-dimensional down-sampling feature map. And inputting the target three-dimensional down-sampling feature map into at least one target three-dimensional decoding network structure, and performing three-dimensional decoding operation on the target three-dimensional decoding network structure to form a target three-dimensional up-sampling feature map. And converting the target three-dimensional up-sampling feature map into a plurality of target two-dimensional transformation up-sampling feature maps. And performing two-dimensional decoding operation by at least one target two-dimensional decoding network structure, and outputting a segmentation result graph corresponding to each group of target two-dimensional slice images.
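A hedged sketch of this grouped inference loop (the group size and tensor layout are assumptions):

```python
import torch

@torch.no_grad()
def segment_volume(model, slices: torch.Tensor, group_size: int = 16):
    """Segment consecutive 2D slices group by group with a converged model."""
    model.eval()
    outputs = []
    for start in range(0, slices.shape[0], group_size):
        group = slices[start:start + group_size]  # stays consecutive in depth
        outputs.append(model(group))              # segmentation maps per group
    return torch.cat(outputs, dim=0)              # reassemble along the depth
```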
In the image segmentation method based on the convolutional neural network model provided by the embodiment, when the convolutional neural network model trained to be convergent is adopted to perform image segmentation processing on a plurality of target two-dimensional slice images, the plurality of target two-dimensional slice images are grouped; sequentially inputting each group of target two-dimensional slice images into a convolutional neural network model trained to be convergent; and carrying out image segmentation processing on each group of target two-dimensional slice images through a convolutional neural network model trained to be convergent. Because the plurality of target two-dimensional slice images are grouped and then are subjected to image segmentation processing by adopting the convolutional neural network model trained to be convergent, the operation amount of the image segmentation processing by adopting the convolutional neural network model trained to be convergent can be effectively reduced.
Fig. 11 is a schematic flowchart of an image segmentation method based on a convolutional neural network model according to a seventh embodiment of the present disclosure, and as shown in fig. 11, the image segmentation method based on a convolutional neural network model according to this embodiment further refines step 9033 on the basis of the embodiment shown in fig. 10, where the number of layers of a target two-dimensional coding network structure and a target two-dimensional decoding network structure in the convolutional neural network model trained to converge is the same, and the number of layers of the target three-dimensional coding network structure and the number of layers of the target three-dimensional decoding network structure are the same. The image segmentation method based on the convolutional neural network model provided by the embodiment includes the following steps:
Step 9033a, two-dimensionally coding each target two-dimensional slice image in each group of target two-dimensional slice images through at least one layer of the target two-dimensional coding network structure to obtain a plurality of target two-dimensional down-sampling feature maps corresponding to each group of target two-dimensional slice images.
Specifically, each layer of the target two-dimensional coding network structure comprises a plurality of two-dimensional convolution operators, a two-dimensional normalization operator, a nonlinear transformation operator and a down-sampling operator. As shown in fig. 12, each group of target two-dimensional slice images is first input into the first layer of the target two-dimensional coding network structure, which performs a two-dimensional coding operation on each target two-dimensional slice image: the two-dimensional convolution operators perform convolution on each target two-dimensional slice image, the two-dimensional normalization operator normalizes the convolved feature map, the nonlinear transformation operator applies a nonlinear transformation to the normalized feature map, and finally the down-sampling operator down-samples the transformed feature map. Each intermediate two-dimensional down-sampling feature map output by the first layer is then input into the second layer of the target two-dimensional coding network structure, which continues the two-dimensional coding operation. This repeats until the last layer of the target two-dimensional coding network structure performs the two-dimensional coding operation on the intermediate two-dimensional down-sampling feature maps output by the previous layer, yielding the plurality of target two-dimensional down-sampling feature maps corresponding to each group of target two-dimensional slice images.
Illustratively, as shown in fig. 12, the target three-dimensional image in fig. 12 is a grayscale image, so the number of channels is 1. If the shape of each target two-dimensional slice image in a group is 1 × H × W and the number of slice images per group is N, then after two-dimensional coding by the first layer of the target two-dimensional coding network structure, the first intermediate two-dimensional down-sampling feature maps have shape 1 × H1 × W1 and number N, where H1 < H and W1 < W. After two-dimensional coding by the second layer, the second intermediate two-dimensional down-sampling feature maps have shape 1 × H2 × W2 and number N, where H2 < H1 and W2 < W1.
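One such two-dimensional coding layer, with the operator order just described, might be sketched as follows (the operator counts and channel widths are illustrative assumptions):

```python
import torch.nn as nn

def make_2d_encoding_layer(in_ch: int, out_ch: int) -> nn.Sequential:
    """Convolutions -> 2D normalization -> nonlinearity -> down-sampling."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),  # convolution ops
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),       # two-dimensional normalization operator
        nn.ReLU(inplace=True),        # nonlinear transformation operator
        nn.MaxPool2d(kernel_size=2),  # down-sampling operator: halves H and W
    )
```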
Optionally, as shown in fig. 12, a first feature map dimension conversion layer is provided between the last layer of the target two-dimensional coding network structure and the first layer of the target three-dimensional coding network structure. Therefore, before step 9033b, the method further includes a step of converting, by the first feature map dimension conversion layer, the plurality of target two-dimensional down-sampling feature maps corresponding to each group of target two-dimensional slice images into a target three-dimensional transformed down-sampling feature map.
Specifically, in the present embodiment, the first feature map dimension conversion layer has a first dimension conversion operator, which converts the plurality of target two-dimensional down-sampling feature maps corresponding to each group of target two-dimensional slice images into a target three-dimensional transformed down-sampling feature map. The specific transformation method is introduced in the training method of the convolutional neural network model and is not described in detail here. As in fig. 12, the shape of the target three-dimensional transformed down-sampling feature map output by the first feature map dimension conversion layer is 1 × D × 1 × H2 × W2, where D = N.
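Both dimension conversions can be read as pure reshapes; a minimal sketch, assuming the 5D layout follows the 1 × D × C × H × W axis order quoted above (note that PyTorch's Conv3d would normally expect (batch, C, D, H, W), so an extra transpose may be needed before the 3D stage):

```python
import torch

def to_3d_volume(maps_2d: torch.Tensor) -> torch.Tensor:
    """First conversion: N two-dimensional feature maps (N, C, H, W) become
    one 3D feature map (1, D, C, H, W) with D = N; for C = 1 this matches
    the 1 x D x 1 x H2 x W2 shape quoted above."""
    return maps_2d.unsqueeze(0)  # the slice axis becomes the depth D

def to_2d_maps(volume_3d: torch.Tensor) -> torch.Tensor:
    """Second conversion (the inverse), used after the 3D decoding stage."""
    return volume_3d.squeeze(0)  # back to (N, C, H, W)
```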
Step 9033b, three-dimensionally coding the plurality of target two-dimensional down-sampling feature maps through at least one layer of the target three-dimensional coding network structure to obtain the target three-dimensional down-sampling feature map corresponding to each group of target two-dimensional slice images.
Specifically, each layer of the target three-dimensional coding network structure comprises a plurality of three-dimensional convolution operators, a three-dimensional normalization operator, a nonlinear transformation operator and a down-sampling operator. As shown in fig. 12, the plurality of target two-dimensional down-sampling feature maps has been converted into one target three-dimensional transformed down-sampling feature map. This feature map is input into the first layer of the target three-dimensional coding network structure, which performs a three-dimensional coding operation on it: the three-dimensional convolution operators perform convolution on the target three-dimensional transformed down-sampling feature map, the three-dimensional normalization operator normalizes the convolved feature map, the nonlinear transformation operator applies a nonlinear transformation to the normalized feature map, and finally the down-sampling operator down-samples the transformed feature map. The first intermediate three-dimensional down-sampling feature map output by the first layer is then input into the second layer of the target three-dimensional coding network structure, which continues the three-dimensional coding operation and outputs a second intermediate three-dimensional down-sampling feature map. This repeats until the last layer of the target three-dimensional coding network structure performs the three-dimensional coding operation on the intermediate three-dimensional down-sampling feature map output by the previous layer, yielding the target three-dimensional down-sampling feature map.
As shown in fig. 12, the shape of the first intermediate three-dimensional down-sampling feature map is 1 × D1 × 1 × H3 × W3, where D1 < D, H3 < H2, and W3 < W2. The shape of the second intermediate three-dimensional down-sampling feature map (i.e., the target three-dimensional down-sampling feature map) is 1 × D2 × 1 × H4 × W4, where D2 < D1, H4 < H3, and W4 < W3.
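By analogy with the two-dimensional case, a three-dimensional coding layer of this form might look like the following (again an illustrative sketch, not the disclosed implementation):

```python
import torch.nn as nn

def make_3d_encoding_layer(in_ch: int, out_ch: int) -> nn.Sequential:
    """3D convolutions -> 3D normalization -> nonlinearity -> down-sampling."""
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm3d(out_ch),       # three-dimensional normalization operator
        nn.ReLU(inplace=True),        # nonlinear transformation operator
        nn.MaxPool3d(kernel_size=2),  # down-sampling: halves D, H and W at once
    )
```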
Step 9033c, three-dimensionally decoding the target three-dimensional down-sampling feature map through at least one layer of the target three-dimensional decoding network structure to obtain the target three-dimensional up-sampling feature map corresponding to each group of target two-dimensional slice images.
In this embodiment, each layer of the target three-dimensional decoding network structure comprises a plurality of three-dimensional convolution operators, a three-dimensional normalization operator, a nonlinear transformation operator and an up-sampling operator. As shown in fig. 12, the target three-dimensional down-sampling feature map is first input into the first layer of the target three-dimensional decoding network structure, which performs a three-dimensional decoding operation on it: the three-dimensional convolution operators perform convolution on the target three-dimensional down-sampling feature map, the three-dimensional normalization operator normalizes the convolved feature map, the nonlinear transformation operator applies a nonlinear transformation to the normalized feature map, and finally the up-sampling operator up-samples the transformed feature map. The first intermediate three-dimensional up-sampling feature map output by the first layer is then input into the second layer of the target three-dimensional decoding network structure, which continues the three-dimensional decoding operation and outputs a second intermediate three-dimensional up-sampling feature map. This repeats until the last layer of the target three-dimensional decoding network structure performs the three-dimensional decoding operation on the intermediate three-dimensional up-sampling feature map output by the previous layer, yielding the target three-dimensional up-sampling feature map.
As shown in fig. 12, the shape of the first intermediate three-dimensional up-sampling feature map is 1 × D1 × 1 × H3 × W3, where D1 > D2, H3 > H4, and W3 > W4. The shape of the second intermediate three-dimensional up-sampling feature map (i.e., the target three-dimensional up-sampling feature map) is 1 × D × 1 × H2 × W2, where D > D1, H2 > H3, and W2 > W3.
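A decoding layer differs from its coding counterpart only in ending with an up-sampling operator instead of a down-sampling one; a hedged sketch:

```python
import torch.nn as nn

def make_3d_decoding_layer(in_ch: int, out_ch: int) -> nn.Sequential:
    """3D convolutions -> 3D normalization -> nonlinearity -> up-sampling."""
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm3d(out_ch),
        nn.ReLU(inplace=True),
        nn.Upsample(scale_factor=2, mode="trilinear", align_corners=False),
    )
```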
Optionally, as shown in fig. 12, a second feature map dimension conversion layer is provided between the last layer of the target three-dimensional decoding network structure and the first layer of the target two-dimensional decoding network structure. Before step 9033d, the method further includes a step of converting the target three-dimensional up-sampling feature map corresponding to each group of target two-dimensional slice images into a plurality of target two-dimensional transformed up-sampling feature maps through the second feature map dimension conversion layer.
Specifically, in this embodiment, the second feature map dimension conversion layer has a second dimension conversion operator, which converts the target three-dimensional up-sampling feature map corresponding to each group of target two-dimensional slice images into a plurality of target two-dimensional transformed up-sampling feature maps. The specific transformation method is introduced in the training method of the convolutional neural network model and is not described in detail here.
As shown in fig. 12, the second dimension conversion operator converts the target three-dimensional upsampled feature map with the shape of 1 × D × 1 × H2 × W2 into N target two-dimensional transformed upsampled feature maps with the shape of 1 × H2 × W2.
Step 9033d, two-dimensionally decoding the target three-dimensional up-sampling feature map through at least one layer of the target two-dimensional decoding network structure to obtain a plurality of segmentation result maps corresponding to each group of target two-dimensional slice images.
In this embodiment, each layer of the target two-dimensional decoding network structure comprises a plurality of two-dimensional convolution operators, a two-dimensional normalization operator, a nonlinear transformation operator and an up-sampling operator. As shown in fig. 12, the plurality of target two-dimensional transformed up-sampling feature maps is first input into the first layer of the target two-dimensional decoding network structure, which performs a two-dimensional decoding operation on each of them: the two-dimensional convolution operators perform convolution on each target two-dimensional transformed up-sampling feature map, the two-dimensional normalization operator normalizes the convolved feature map, the nonlinear transformation operator applies a nonlinear transformation to the normalized feature map, and finally the up-sampling operator up-samples the transformed feature map. Each first intermediate two-dimensional up-sampling feature map output by the first layer is then input into the second layer of the target two-dimensional decoding network structure, which continues the two-dimensional decoding operation and outputs the second intermediate two-dimensional up-sampling feature maps. This repeats until the last layer of the target two-dimensional decoding network structure performs the two-dimensional decoding operation on the intermediate two-dimensional up-sampling feature maps output by the previous layer, yielding the plurality of final segmentation result maps corresponding to each group of target two-dimensional slice images.
As shown in fig. 12, the first intermediate two-dimensional up-sampling feature maps have shape 1 × H1 × W1 and number N, where H1 > H2 and W1 > W2. The second intermediate two-dimensional up-sampling feature maps (i.e., the segmentation result maps) have shape 1 × H × W and number N, where H > H1 and W > W1.
In the image segmentation method based on the convolutional neural network model provided in this embodiment, the convolutional neural network model trained to convergence further includes a first feature map dimension conversion layer and a second feature map dimension conversion layer. Before the plurality of target two-dimensional down-sampling feature maps is three-dimensionally coded by the at least one layer of the target three-dimensional coding network structure, the method further includes converting the plurality of target two-dimensional down-sampling feature maps corresponding to each group of target two-dimensional slice images into a target three-dimensional transformed down-sampling feature map through the first feature map dimension conversion layer. Before the target three-dimensional up-sampling feature map is two-dimensionally decoded by the at least one layer of the target two-dimensional decoding network structure, the method further includes converting the target three-dimensional up-sampling feature map corresponding to each group of target two-dimensional slice images into a plurality of target two-dimensional transformed up-sampling feature maps through the second feature map dimension conversion layer. Because the feature dimension conversions are carried out independently by the first and second feature map dimension conversion layers, each layer of the two-dimensional coding, three-dimensional coding, three-dimensional decoding and two-dimensional decoding network structures keeps a single, uniform function.
As an alternative, in this embodiment, the convolutional neural network model trained to convergence may be any U-shaped model, such as UNet, UNet++, or U2-Net. In that case, the number of layers of the target two-dimensional coding network structure equals that of the target two-dimensional decoding network structure, and the number of layers of the target three-dimensional coding network structure equals that of the target three-dimensional decoding network structure. After the target three-dimensional image is segmented by the convolutional neural network model trained to convergence, each segmentation result map has the same size as the target two-dimensional slice image obtained by dividing the target three-dimensional image.
As an alternative implementation, in this embodiment, as shown in fig. 12, performing image segmentation processing on each group of target two-dimensional slice images by the convolutional neural network model trained to convergence further includes:
inputting the plurality of target two-dimensional down-sampling feature maps output by the at least one layer of the target two-dimensional coding network structure into the target two-dimensional decoding network structure at the symmetrical level; and inputting the target three-dimensional down-sampling feature maps output by the target three-dimensional coding network structures at levels other than the last layer into the target three-dimensional decoding network structures at the symmetrical levels.
Specifically, each target two-dimensional coding network structure is connected to the target two-dimensional decoding network structure at the symmetrical level. Since the last layer of the target three-dimensional coding network structure is already connected to the first layer of the target three-dimensional decoding network structure, the target three-dimensional coding network structures at the other levels are connected to the target three-dimensional decoding network structures at their symmetrical levels. A decoding network structure therefore decodes not only the feature map passed down from the previous decoding layer, which retains coarse-grained information, but also the feature map output by the coding network structure at the symmetrical level. The feature maps decoded by the target two-dimensional and target three-dimensional decoding network structures thus retain both coarse-grained information and detail information, making the segmentation result more accurate.
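These symmetric-level connections are the familiar U-shaped skip connections; a hedged sketch of how a decoder layer might fuse them (channel-wise concatenation is an assumption; the disclosure does not fix the fusion operation):

```python
import torch

def decode_with_skip(decoder_layer, up_features, skip_features):
    """Fuse the symmetric-level encoder output with the decoder stream."""
    # Concatenate along the channel axis, then decode; the decoder layer is
    # assumed to be built for the doubled channel count.
    return decoder_layer(torch.cat([up_features, skip_features], dim=1))
```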
As an optional implementation manner, in this embodiment, in the image segmentation method based on a convolutional neural network model provided in any of the above embodiments, before step 902, the method further includes: performing normalization processing and/or image scaling processing on the target three-dimensional image.
Specifically, when the normalization processing is performed on the target three-dimensional image, a three-dimensional normalization operator may be used for the processing. When the image reduction processing is performed on the target three-dimensional image, the image reduction processing may be performed by a cropping method or a downsampling method. When the target three-dimensional image is amplified, an up-sampling mode can be adopted for image amplification.
According to the image segmentation method based on the convolutional neural network model provided by this embodiment, performing normalization processing and/or image scaling processing on the target three-dimensional image avoids both the heavy computation caused during segmentation by an oversized target three-dimensional image and the loss of detail information caused by an undersized one.
Fig. 13 is a schematic structural diagram of a training apparatus for a convolutional neural network model for image segmentation according to an eighth embodiment of the present disclosure, and as shown in fig. 13, a training apparatus 1300 for a convolutional neural network model for image segmentation provided in this embodiment includes: a model obtaining unit 1301, a sample obtaining unit 1302, and a model training unit 1303.
The model obtaining unit 1301 is configured to obtain a pre-constructed initial convolutional neural network model, where the initial convolutional neural network model includes an initial encoder and an initial decoder that are connected in sequence, the initial encoder includes a preset two-dimensional coding network structure and a preset three-dimensional coding network structure that are connected in sequence, and the initial decoder includes a preset three-dimensional decoding network structure and a preset two-dimensional decoding network structure that are connected in sequence. The sample obtaining unit 1302 is configured to obtain a training sample for training the initial convolutional neural network model, where the training sample is a plurality of two-dimensional slice image samples that are continuous along a preset spatial direction and corresponding two-dimensional segmentation label image samples. And the model training unit 1303 is configured to train the initial convolutional neural network model by using the training samples to obtain a convolutional neural network model trained to be convergent, and the convolutional neural network model trained to be convergent is used for performing image segmentation on the target three-dimensional image to be segmented.
The training apparatus for the convolutional neural network model for image segmentation provided in this embodiment may implement the technical solution of the method embodiment shown in fig. 3, and the implementation principle and technical effect thereof are similar to those of the method embodiment shown in fig. 3, and are not described in detail here.
Optionally, the training apparatus for the convolutional neural network model for image segmentation further includes: and the model construction unit is used for constructing an initial convolutional neural network model by adopting a preset two-dimensional convolutional neural network model and a preset three-dimensional convolutional neural network model.
Optionally, the model building unit includes: the system comprises a model obtaining module, a first structure replacing module and a second structure replacing module.
The model obtaining module is used for obtaining a preset two-dimensional encoder and a preset two-dimensional decoder in a preset two-dimensional convolutional neural network model and a preset three-dimensional encoder and a preset three-dimensional decoder in the preset three-dimensional convolutional neural network model. And the first structure replacing module is used for replacing at least one layer of the preset two-dimensional coding network structure at the tail end in the preset two-dimensional encoder with the preset three-dimensional coding network structure in the preset three-dimensional encoder. And the second structure replacing module is used for replacing at least one layer of the preset two-dimensional decoding network structure at the front end in the preset two-dimensional decoder with the preset three-dimensional decoding network structure in the preset three-dimensional decoder.
Optionally, the number of layers of the preset two-dimensional coding network structure and the preset two-dimensional decoding network structure in the initial convolutional neural network model is the same, and the number of layers of the preset three-dimensional coding network structure and the preset three-dimensional decoding network structure is the same. The preset two-dimensional coding network structure of the symmetrical level is connected with the preset two-dimensional decoding network structure, and the preset three-dimensional coding network structure of the symmetrical level is connected with the preset three-dimensional decoding network structure.
Optionally, the model building unit further includes: the device comprises a first adding module and a second adding module.
The first adding module is used for adding a first feature map dimension conversion layer between the last layer of preset two-dimensional coding network structure and the first layer of preset three-dimensional coding network structure in the initial convolutional neural network model, and the first feature map dimension conversion layer is used for converting a plurality of sample two-dimensional down-sampling feature maps output by the last layer of preset two-dimensional coding network structure into sample three-dimensional down-sampling feature maps. And the second adding module is used for adding a second feature map dimension conversion layer between the last layer of preset three-dimensional decoding network structure and the first layer of preset two-dimensional decoding network structure in the initial convolutional neural network model, and the second feature map dimension conversion layer is used for converting the sample three-dimensional up-sampling feature map output by the last layer of preset three-dimensional decoding network structure into a plurality of sample two-dimensional up-sampling feature maps.
Optionally, the sample obtaining unit 1302 includes: a sample acquisition module and a sample division module.
The sample acquisition module is used for acquiring a three-dimensional image sample and a corresponding three-dimensional segmentation annotation image sample. And the sample dividing module is used for dividing the three-dimensional image sample and the three-dimensional segmentation annotation image sample along a preset space direction so as to obtain a plurality of two-dimensional slice image samples and corresponding two-dimensional segmentation annotation image samples.
Optionally, the sample obtaining unit 1302 further includes: a sample preprocessing module.
The sample preprocessing module is used for carrying out normalization processing and/or image scaling processing on the three-dimensional image sample and the corresponding three-dimensional segmentation labeling image sample.
Optionally, the model training unit 1303 includes: the device comprises a sample grouping module, a model training module and a model determining module.
The sample grouping module is used for grouping the training samples. And the model training module is used for circularly inputting each group of training samples into the initial convolutional neural network model, and adjusting the training parameters in the initial convolutional neural network model after the training samples are input each time so as to train the initial convolutional neural network model. And the model determining module is used for stopping inputting the training sample if the preset convergence condition is met, and determining the convolutional neural network model meeting the preset convergence condition as the convolutional neural network model trained to be converged.
The training apparatus for the convolutional neural network model for image segmentation provided in this embodiment may execute the technical solutions of the method embodiments shown in figs. 4, 7, and 8; the implementation principle and technical effects are similar and are not described in detail here.
Fig. 14 is a schematic structural diagram of an image segmentation apparatus based on a convolutional neural network model according to a ninth embodiment of the present disclosure. As shown in fig. 14, the image segmentation apparatus 1400 based on a convolutional neural network model provided in this embodiment includes: an image acquisition unit 1401, an image dividing unit 1402, and an image segmentation unit 1403.
An image acquiring unit 1401 for acquiring a target three-dimensional image. An image dividing unit 1402 for dividing the target three-dimensional image along a preset spatial direction to form a plurality of target two-dimensional slice images. An image segmentation unit 1403, configured to perform image segmentation processing on the multiple target two-dimensional slice images by using a convolutional neural network model trained to converge; the convolutional neural network model trained to be converged comprises a target encoder and a target decoder which are sequentially connected, the target encoder comprises a target two-dimensional coding network structure and a target three-dimensional coding network structure which are sequentially connected, and the target decoder comprises a target three-dimensional decoding network structure and a target two-dimensional decoding network structure which are sequentially connected.
The image segmentation apparatus based on the convolutional neural network model provided in this embodiment may implement the technical solution of the method embodiment shown in fig. 9, and the implementation principle and technical effect thereof are similar to those of the method embodiment shown in fig. 9, and are not described in detail herein.
Optionally, the image segmentation unit 1403 includes: the image segmentation device comprises an image grouping module, an image input module and an image segmentation module.
The image grouping module is used for grouping a plurality of target two-dimensional slice images. And the image input module is used for sequentially inputting each group of target two-dimensional slice images into the convolutional neural network model trained to be convergent. And the image segmentation module is used for carrying out image segmentation processing on each group of target two-dimensional slice images through a convolutional neural network model trained to be convergent.
Optionally, the image segmentation module comprises: a two-dimensional coding submodule, a three-dimensional coding submodule, a three-dimensional decoding submodule, and a two-dimensional decoding submodule.
The two-dimensional coding sub-module is used for carrying out two-dimensional coding on each target two-dimensional slice image in each group of target two-dimensional slice images through at least one layer of the target two-dimensional coding network structure so as to obtain a plurality of target two-dimensional down-sampling feature maps corresponding to each group of target two-dimensional slice images. And the three-dimensional coding sub-module is used for carrying out three-dimensional coding on the plurality of target two-dimensional down-sampling feature maps through at least one layer of the target three-dimensional coding network structure so as to obtain the target three-dimensional down-sampling feature maps corresponding to each group of target two-dimensional slice images. And the three-dimensional decoding submodule is used for carrying out three-dimensional decoding on the target three-dimensional down-sampling feature map through at least one layer of the target three-dimensional decoding network structure so as to obtain a target three-dimensional up-sampling feature map corresponding to each group of target two-dimensional slice images. And the two-dimensional decoding submodule is used for carrying out two-dimensional decoding on the target three-dimensional up-sampling feature map through at least one layer of the target two-dimensional decoding network structure so as to obtain a plurality of segmentation result maps corresponding to each group of target two-dimensional slice images.
Optionally, the number of layers of the target two-dimensional coding network structure and the target two-dimensional decoding network structure in the convolutional neural network model trained to be converged is the same, and the number of layers of the target three-dimensional coding network structure and the target three-dimensional decoding network structure are the same.
Optionally, the image segmentation module further includes: a first feature map input submodule and a second feature map input submodule.
The first feature map input submodule is used for inputting the plurality of target two-dimensional down-sampling feature maps output by the at least one layer of the target two-dimensional coding network structure into the target two-dimensional decoding network structure at the symmetrical level. The second feature map input submodule is used for inputting the target three-dimensional down-sampling feature maps output by the target three-dimensional coding network structures at levels other than the last layer into the target three-dimensional decoding network structures at the symmetrical levels.
Optionally, the convolutional neural network model trained to convergence further comprises: a first feature map dimension conversion layer and a second feature map dimension conversion layer.
The image segmentation module further comprises: a first conversion submodule and a second conversion submodule.
The first conversion submodule is used for converting the plurality of target two-dimensional down-sampling feature maps corresponding to each group of target two-dimensional slice images into a target three-dimensional transformed down-sampling feature map through the first feature map dimension conversion layer. The second conversion submodule is used for converting the target three-dimensional up-sampling feature map corresponding to each group of target two-dimensional slice images into a plurality of target two-dimensional transformed up-sampling feature maps through the second feature map dimension conversion layer.
Optionally, the image segmentation apparatus based on the convolutional neural network model provided in this embodiment further includes: and the image preprocessing unit is used for carrying out normalization processing and/or image scaling processing on the target three-dimensional image.
The image segmentation apparatus based on the convolutional neural network model provided in this embodiment may implement the technical solutions of the method embodiments shown in fig. 10 to 11, and the implementation principles and technical effects thereof are similar to those of the method embodiments shown in fig. 10 to 11, and are not described in detail herein.
The present disclosure also provides an electronic device and a readable storage medium according to an embodiment of the present disclosure.
According to an embodiment of the present disclosure, the present disclosure also provides a computer program product comprising: a computer program, stored in a readable storage medium, from which at least one processor of the electronic device can read the computer program, the at least one processor executing the computer program causing the electronic device to perform the solution provided by any of the embodiments described above.
FIG. 15 is a block diagram of an electronic device for implementing the training method of the convolutional neural network model for image segmentation and the image segmentation method based on the convolutional neural network model according to an embodiment of the present disclosure. As shown in FIG. 15, electronic device 1500 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 15, the electronic device 1500 includes a computing unit 1501, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 1502 or a computer program loaded from a storage unit 1508 into a random access memory (RAM) 1503. The RAM 1503 can also store various programs and data necessary for the operation of the device 1500. The computing unit 1501, the ROM 1502, and the RAM 1503 are connected to one another by a bus 1504. An input/output (I/O) interface 1505 is also connected to the bus 1504.
Various components in the device 1500 are connected to the I/O interface 1505, including: an input unit 1506 such as a keyboard or a mouse; an output unit 1507 such as various types of displays and speakers; a storage unit 1508 such as a magnetic disk or an optical disk; and a communication unit 1509 such as a network card, a modem, or a wireless communication transceiver. The communication unit 1509 allows the electronic device 1500 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 1501 may be any of various general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 1501 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 1501 executes the methods and processes described above, such as the training method of the convolutional neural network model for image segmentation and the image segmentation method based on the convolutional neural network model. For example, in some embodiments, these methods may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 1508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 1500 via the ROM 1502 and/or the communication unit 1509. When the computer program is loaded into the RAM 1503 and executed by the computing unit 1501, one or more steps of the methods described above may be performed. Alternatively, in other embodiments, the computing unit 1501 may be configured to perform the methods described above in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described here may be realized in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. Such program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server (also called a cloud computing server or cloud host), a host product in a cloud computing service system that remedies the drawbacks of traditional physical hosts and Virtual Private Server (VPS) services, namely high management difficulty and weak service scalability. The server may also be a server of a distributed system, or a server incorporating a blockchain.
Because the initial convolutional neural network model has a preset two-dimensional coding network structure and a preset three-dimensional coding network structure, together with a preset three-dimensional decoding network structure and a preset two-dimensional decoding network structure, the spatial information in a three-dimensional image is retained while the amount of computation in the model training process is reduced. For the same reason, when the convolutional neural network model trained to convergence is used to segment an image, the amount of computation in the image segmentation process is effectively reduced. Moreover, the initial convolutional neural network model is a single end-to-end convolutional model comprising an initial encoder and an initial decoder, so it is trained as a whole and its training parameters are adjusted jointly. This allows the model that converges during training to reach its best performance, and the segmentation results obtained with this optimal convolutional neural network model are accordingly more accurate.
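For concreteness only, the following is a minimal sketch of this pipeline in PyTorch. The framework, the class name Hybrid2D3DSegNet, and every layer count and channel width are illustrative assumptions rather than part of the disclosure; the sketch shows one way the two-dimensional coding, dimension conversion, three-dimensional coding and decoding, and two-dimensional decoding stages could be composed.

    import torch
    import torch.nn as nn

    class Hybrid2D3DSegNet(nn.Module):
        # Hypothetical skeleton of the hybrid encoder-decoder described
        # above; channel widths and layer counts are placeholders.
        def __init__(self, in_ch=1, base=16):
            super().__init__()
            self.enc2d = nn.Sequential(                  # per-slice 2D coding
                nn.Conv2d(in_ch, base, 3, padding=1), nn.ReLU(),
                nn.Conv2d(base, base, 3, stride=2, padding=1), nn.ReLU())
            self.enc3d = nn.Sequential(                  # volumetric 3D coding
                nn.Conv3d(base, base * 2, 3, stride=2, padding=1), nn.ReLU())
            self.dec3d = nn.Sequential(                  # volumetric 3D decoding
                nn.ConvTranspose3d(base * 2, base, 2, stride=2), nn.ReLU())
            self.dec2d = nn.Sequential(                  # per-slice 2D decoding
                nn.ConvTranspose2d(base, base, 2, stride=2), nn.ReLU(),
                nn.Conv2d(base, 1, 1))                   # per-pixel logits

        def forward(self, x):          # x: (B, D, C, H, W), D slices per group
            b, d = x.shape[:2]
            f = self.enc2d(x.flatten(0, 1))              # (B*D, base, H/2, W/2)
            # first dimension conversion: stack slice features into a volume
            f = f.unflatten(0, (b, d)).permute(0, 2, 1, 3, 4)
            f = self.dec3d(self.enc3d(f))                # (B, base, D, H/2, W/2)
            # second dimension conversion: unstack the volume into slices
            f = f.permute(0, 2, 1, 3, 4).flatten(0, 1)
            return self.dec2d(f).unflatten(0, (b, d))    # (B, D, 1, H, W)

A group of D consecutive slices enters as a single five-dimensional tensor, so only the middle stages pay the cost of three-dimensional convolution; the two reshapes in forward play the role of the feature map dimension conversion layers described in the claims below.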
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order; no limitation is imposed here, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (33)

1. A method of training a convolutional neural network model for image segmentation, comprising:
acquiring a pre-constructed initial convolutional neural network model, wherein the initial convolutional neural network model comprises an initial encoder and an initial decoder which are sequentially connected, the initial encoder comprises a preset two-dimensional coding network structure and a preset three-dimensional coding network structure which are sequentially connected, and the initial decoder comprises a preset three-dimensional decoding network structure and a preset two-dimensional decoding network structure which are sequentially connected;
acquiring training samples for training the initial convolutional neural network model, wherein the training samples are a plurality of two-dimensional slice image samples which are continuous along a preset spatial direction and corresponding two-dimensional segmentation annotation image samples;
and training the initial convolutional neural network model by adopting the training sample to obtain a convolutional neural network model trained to be convergent, wherein the convolutional neural network model trained to be convergent is used for carrying out image segmentation processing on a target three-dimensional image to be segmented.
2. The method of claim 1, wherein prior to obtaining the pre-constructed initial convolutional neural network model, further comprising:
and constructing an initial convolutional neural network model by adopting a preset two-dimensional convolutional neural network model and a preset three-dimensional convolutional neural network model.
3. The method of claim 2, wherein the constructing the initial convolutional neural network model using the preset two-dimensional convolutional neural network model and the preset three-dimensional convolutional neural network model comprises:
acquiring a preset two-dimensional encoder and a preset two-dimensional decoder in a preset two-dimensional convolutional neural network model and a preset three-dimensional encoder and a preset three-dimensional decoder in a preset three-dimensional convolutional neural network model;
replacing at least one layer of the preset two-dimensional coding network structure at the tail end of the preset two-dimensional encoder with the preset three-dimensional coding network structure in the preset three-dimensional encoder;
and replacing at least one layer of a preset two-dimensional decoding network structure at the front end in a preset two-dimensional decoder with a preset three-dimensional decoding network structure in the preset three-dimensional decoder.
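As an illustrative sketch of this construction step, assume (hypothetically) that the preset encoders and decoders are each available as an ordered list of stage modules; the splice could then read:

    import torch.nn as nn

    def build_initial_model(enc2d, dec2d, enc3d, dec3d, k=1):
        # Hypothetical splice: drop the last k stages of the preset 2D
        # encoder in favour of the deepest k stages of the preset 3D
        # encoder, and mirror this at the front end of the decoder.
        encoder = nn.Sequential(*enc2d[:-k], *enc3d[-k:])
        decoder = nn.Sequential(*dec3d[:k], *dec2d[k:])
        return nn.Sequential(encoder, decoder)

The sketch captures structure only; the feature map dimension conversion layers of claim 5 must still be inserted at the two 2D/3D boundaries before the spliced model can process data.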
4. The method according to claim 1, wherein the number of layers of the preset two-dimensional coding network structure and the preset two-dimensional decoding network structure in the initial convolutional neural network model is the same, and the number of layers of the preset three-dimensional coding network structure and the preset three-dimensional decoding network structure is the same;
the preset two-dimensional coding network structure of the symmetrical level is connected with the preset two-dimensional decoding network structure, and the preset three-dimensional coding network structure of the symmetrical level is connected with the preset three-dimensional decoding network structure.
5. The method of claim 3, wherein constructing the initial convolutional neural network model using the preset two-dimensional convolutional neural network model and the preset three-dimensional convolutional neural network model further comprises:
adding a first feature map dimension conversion layer between a last layer of preset two-dimensional coding network structure and a first layer of preset three-dimensional coding network structure in the initial convolutional neural network model, wherein the first feature map dimension conversion layer is used for converting a plurality of sample two-dimensional down-sampling feature maps output by the last layer of preset two-dimensional coding network structure into sample three-dimensional conversion down-sampling feature maps;
and adding a second feature map dimension conversion layer between the last layer of preset three-dimensional decoding network structure and the first layer of preset two-dimensional decoding network structure in the initial convolutional neural network model, wherein the second feature map dimension conversion layer is used for converting the sample three-dimensional up-sampling feature map output by the last layer of preset three-dimensional decoding network structure into a plurality of sample two-dimensional up-sampling feature maps.
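Both conversion layers are parameter-free reshapes. A sketch under the same assumed PyTorch setting, in which a group of D two-dimensional feature maps of shape (B*D, C, H, W) is stacked into one volume of shape (B, C, D, H, W) and later unstacked:

    import torch.nn as nn

    class SlicesToVolume(nn.Module):
        # Hypothetical first feature map dimension conversion layer:
        # a group of D slice feature maps becomes one feature volume.
        def __init__(self, depth):
            super().__init__()
            self.depth = depth               # slices per group, fixed up front

        def forward(self, f):                # f: (B*D, C, H, W)
            return f.unflatten(0, (-1, self.depth)).permute(0, 2, 1, 3, 4)

    class VolumeToSlices(nn.Module):
        # Hypothetical second feature map dimension conversion layer:
        # the inverse reshape, volume back to per-slice feature maps.
        def forward(self, v):                # v: (B, C, D, H, W)
            return v.permute(0, 2, 1, 3, 4).flatten(0, 1)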
6. The method of any one of claims 1-5, wherein said obtaining training samples for training the initial convolutional neural network model comprises:
acquiring a three-dimensional image sample and a corresponding three-dimensional segmentation annotation image sample;
and dividing the three-dimensional image sample and the three-dimensional segmentation annotation image sample along a preset space direction to obtain a plurality of two-dimensional slice image samples and corresponding two-dimensional segmentation annotation image samples.
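A sketch of this sample preparation, assuming the three-dimensional image sample and its annotation arrive as NumPy arrays and that the preset spatial direction is one array axis:

    import numpy as np

    def make_training_samples(volume, annotation, axis=0):
        # Cut the 3D image sample and its 3D segmentation annotation into
        # aligned, consecutive 2D slice pairs along one spatial direction.
        vol = np.moveaxis(volume, axis, 0)
        ann = np.moveaxis(annotation, axis, 0)
        return [(vol[i], ann[i]) for i in range(vol.shape[0])]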
7. The method of claim 6, wherein before said dividing the three-dimensional image samples and the three-dimensional segmentation annotation image samples along a preset spatial direction, further comprising:
and carrying out normalization processing and/or image scaling processing on the three-dimensional image sample and the corresponding three-dimensional segmentation annotation image sample.
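A sketch of this optional preprocessing, assuming NumPy and SciPy are available (scipy.ndimage.zoom is one resampling routine among many); nearest-neighbour interpolation keeps the annotation's label values discrete:

    import numpy as np
    from scipy.ndimage import zoom

    def preprocess(volume, annotation, target_hw=(256, 256)):
        # Intensity normalization to zero mean and unit variance, then
        # in-plane scaling applied identically to image and annotation.
        v = (volume - volume.mean()) / (volume.std() + 1e-8)
        factors = (1.0,
                   target_hw[0] / volume.shape[1],
                   target_hw[1] / volume.shape[2])
        return zoom(v, factors, order=1), zoom(annotation, factors, order=0)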
8. The method of any one of claims 1-5, wherein training the initial convolutional neural network model with the training samples to obtain a convolutional neural network model trained to converge comprises:
grouping the training samples;
circularly inputting each group of training samples into an initial convolutional neural network model, and after the training samples are input each time, adjusting training parameters in the initial convolutional neural network model to train the initial convolutional neural network model;
and if the preset convergence condition is met, stopping inputting the training sample, and determining the convolutional neural network model meeting the preset convergence condition as the convolutional neural network model trained to be converged.
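A sketch of this training procedure under the same assumed PyTorch setting; the Adam optimizer, the binary cross-entropy loss, the batch size, and the loss-plateau test below are stand-ins for the unspecified hyper-parameters and preset convergence condition:

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader

    def train_to_convergence(model, dataset, max_epochs=100, tol=1e-4):
        # dataset yields (slice group, annotation) pairs of matching shape.
        loader = DataLoader(dataset, batch_size=2, shuffle=True)
        optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
        loss_fn = nn.BCEWithLogitsLoss()
        previous = float("inf")
        for _ in range(max_epochs):              # cycle the grouped samples
            total = 0.0
            for slice_groups, masks in loader:
                loss = loss_fn(model(slice_groups), masks.float())
                optimizer.zero_grad()
                loss.backward()                  # adjust the training
                optimizer.step()                 # parameters after each input
                total += loss.item()
            if abs(previous - total) < tol:      # preset convergence condition
                break
            previous = total
        return model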
9. An image segmentation method based on a convolutional neural network model comprises the following steps:
acquiring a target three-dimensional image;
dividing the target three-dimensional image along a preset spatial direction to form a plurality of target two-dimensional slice images;
carrying out image segmentation processing on a plurality of target two-dimensional slice images by adopting a convolutional neural network model trained to be convergent; wherein the convolutional neural network model trained to be convergent comprises a target encoder and a target decoder which are connected in sequence, the target encoder comprises a target two-dimensional coding network structure and a target three-dimensional coding network structure which are connected in sequence, and the target decoder comprises a target three-dimensional decoding network structure and a target two-dimensional decoding network structure which are connected in sequence.
10. The method of claim 9, wherein the image segmentation processing of the plurality of target two-dimensional slice images using the convolutional neural network model trained to converge comprises:
grouping the plurality of target two-dimensional slice images;
sequentially inputting each group of target two-dimensional slice images into a convolutional neural network model trained to be convergent;
and carrying out image segmentation processing on each group of target two-dimensional slice images through the convolutional neural network model trained to be convergent.
11. The method of claim 10, wherein the performing image segmentation processing on each set of target two-dimensional slice images through the convolutional neural network model trained to converge comprises:
performing two-dimensional coding on each target two-dimensional slice image in each group of target two-dimensional slice images through at least one layer of the target two-dimensional coding network structure to obtain a plurality of target two-dimensional down-sampling feature maps corresponding to each group of target two-dimensional slice images;
three-dimensionally coding a plurality of target two-dimensional down-sampling feature maps through at least one layer of the target three-dimensional coding network structure to obtain target three-dimensional down-sampling feature maps corresponding to each group of target two-dimensional slice images;
three-dimensionally decoding the target three-dimensional down-sampling feature map through at least one layer of the target three-dimensional decoding network structure to obtain a target three-dimensional up-sampling feature map corresponding to each group of target two-dimensional slice images;
and performing two-dimensional decoding on the target three-dimensional up-sampling feature map through at least one layer of the target two-dimensional decoding network structure to obtain a plurality of segmentation result maps corresponding to each group of target two-dimensional slice images.
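Tying these four stages back to the sketch given after the description, a usage example at inference time (all shapes hypothetical):

    import torch

    # Reuses the illustrative Hybrid2D3DSegNet sketched earlier.
    model = Hybrid2D3DSegNet().eval()
    group = torch.randn(1, 8, 1, 64, 64)   # one group of 8 target slices
    with torch.no_grad():
        logits = model(group)              # 2D code -> 3D code -> 3D decode -> 2D decode
    masks = logits.sigmoid() > 0.5         # one segmentation map per slice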
12. The method of claim 10, wherein the number of layers of the target two-dimensional coding network structure and the target two-dimensional decoding network structure in the convolutional neural network model trained to be convergent is the same, and the number of layers of the target three-dimensional coding network structure and the target three-dimensional decoding network structure is the same.
13. The method of claim 12, wherein the image segmentation processing each set of target two-dimensional slice images by the convolutional neural network model trained to converge further comprises:
inputting a plurality of target two-dimensional down-sampling feature maps output by the at least one layer of target two-dimensional coding network structure into a target two-dimensional decoding network structure of a symmetrical layer;
and inputting the target three-dimensional down-sampling feature maps output by the target three-dimensional coding network structures at each level other than the last layer into the target three-dimensional decoding network structures of the symmetric levels.
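One common realisation of this symmetric-level wiring is U-Net-style channel concatenation; the claim only requires that encoder outputs be fed to the decoder structures of the symmetric levels, so the fusion below is an assumption:

    import torch

    def fuse_symmetric(decoder_feat, encoder_feat):
        # Concatenate decoder features with the matching encoder output of
        # the symmetric level (2D with 2D, 3D with 3D) along the channel
        # axis before that level's convolutions are applied.
        return torch.cat([decoder_feat, encoder_feat], dim=1)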
14. The method of claim 11, wherein the training to a converged convolutional neural network model further comprises: a first feature map dimension conversion layer and a second feature map dimension conversion layer;
before the three-dimensional coding is performed on the plurality of target two-dimensional down-sampling feature maps through at least one layer of the target three-dimensional coding network structure, the method further includes:
converting a plurality of target two-dimensional down-sampling feature maps corresponding to each group of target two-dimensional slice images into a target three-dimensional conversion down-sampling feature map through the first feature map dimension conversion layer;
before the two-dimensional decoding is performed on the target three-dimensional up-sampling feature map through at least one layer of the target two-dimensional decoding network structure, the method further includes:
and converting the target three-dimensional up-sampling feature map corresponding to each group of target two-dimensional slice images into a plurality of target two-dimensional conversion up-sampling feature maps through the second feature map dimension conversion layer.
15. The method according to any one of claims 9-14, wherein prior to said dividing said target three-dimensional image along a preset spatial direction to form a plurality of target two-dimensional slice images, further comprising:
and carrying out normalization processing and/or image scaling processing on the target three-dimensional image.
16. A training apparatus for a convolutional neural network model for image segmentation, comprising:
the system comprises a model acquisition unit, a model selection unit and a model selection unit, wherein the model acquisition unit is used for acquiring a pre-constructed initial convolutional neural network model, the initial convolutional neural network model comprises an initial encoder and an initial decoder which are sequentially connected, the initial encoder comprises a preset two-dimensional coding network structure and a preset three-dimensional coding network structure which are sequentially connected, and the initial decoder comprises a preset three-dimensional decoding network structure and a preset two-dimensional decoding network structure which are sequentially connected;
a sample obtaining unit, configured to obtain a training sample for training the initial convolutional neural network model, where the training sample is a plurality of two-dimensional slice image samples that are continuous along a preset spatial direction and corresponding two-dimensional segmentation annotation image samples;
and the model training unit is used for training the initial convolutional neural network model by adopting the training sample so as to obtain a convolutional neural network model trained to be convergent, and the convolutional neural network model trained to be convergent is used for carrying out image segmentation processing on a target three-dimensional image to be segmented.
17. The apparatus of claim 16, further comprising: and the model construction unit is used for constructing an initial convolutional neural network model by adopting a preset two-dimensional convolutional neural network model and a preset three-dimensional convolutional neural network model.
18. The apparatus of claim 17, wherein the model building unit comprises:
the model acquisition module is used for acquiring a preset two-dimensional encoder and a preset two-dimensional decoder in a preset two-dimensional convolutional neural network model and a preset three-dimensional encoder and a preset three-dimensional decoder in a preset three-dimensional convolutional neural network model;
the first structure replacing module is used for replacing at least one layer of preset two-dimensional coding network structure at the tail end in a preset two-dimensional encoder with a preset three-dimensional coding network structure in the preset three-dimensional encoder;
and the second structure replacing module is used for replacing at least one layer of the preset two-dimensional decoding network structure at the front end in the preset two-dimensional decoder with the preset three-dimensional decoding network structure in the preset three-dimensional decoder.
19. The apparatus according to claim 16, wherein the number of layers of the preset two-dimensional coding network structure and the preset two-dimensional decoding network structure in the initial convolutional neural network model is the same, and the number of layers of the preset three-dimensional coding network structure and the preset three-dimensional decoding network structure is the same;
the preset two-dimensional coding network structure of the symmetrical level is connected with the preset two-dimensional decoding network structure, and the preset three-dimensional coding network structure of the symmetrical level is connected with the preset three-dimensional decoding network structure.
20. The apparatus of claim 18, wherein the model building unit further comprises:
the first adding module is used for adding a first feature map dimension conversion layer between the last layer of preset two-dimensional coding network structure and the first layer of preset three-dimensional coding network structure in the initial convolutional neural network model, and the first feature map dimension conversion layer is used for converting a plurality of sample two-dimensional down-sampling feature maps output by the last layer of preset two-dimensional coding network structure into a sample three-dimensional conversion down-sampling feature map;
and the second adding module is used for adding a second feature map dimension conversion layer between the last layer of preset three-dimensional decoding network structure and the first layer of preset two-dimensional decoding network structure in the initial convolutional neural network model, and the second feature map dimension conversion layer is used for converting the sample three-dimensional up-sampling feature map output by the last layer of preset three-dimensional decoding network structure into a plurality of sample two-dimensional up-sampling feature maps.
21. The apparatus of any of claims 16-20, wherein the sample acquisition unit comprises:
the sample acquisition module is used for acquiring a three-dimensional image sample and a corresponding three-dimensional segmentation annotation image sample;
and the sample dividing module is used for dividing the three-dimensional image sample and the three-dimensional segmentation annotation image sample along a preset spatial direction so as to obtain a plurality of two-dimensional slice image samples and corresponding two-dimensional segmentation annotation image samples.
22. The apparatus of claim 21, wherein the sample acquisition unit further comprises:
and the sample preprocessing module is used for carrying out normalization processing and/or image scaling processing on the three-dimensional image sample and the corresponding three-dimensional segmentation annotation image sample.
23. The apparatus according to any one of claims 16-20, wherein the model training unit comprises:
the sample grouping module is used for grouping the training samples;
the model training module is used for circularly inputting each group of training samples into the initial convolutional neural network model, and adjusting training parameters in the initial convolutional neural network model after the training samples are input each time so as to train the initial convolutional neural network model;
and the model determining module is used for stopping inputting the training sample if the preset convergence condition is met, and determining the convolutional neural network model meeting the preset convergence condition as the convolutional neural network model trained to be converged.
24. An image segmentation apparatus based on a convolutional neural network model, comprising:
an image acquisition unit for acquiring a target three-dimensional image;
the image dividing unit is used for dividing the target three-dimensional image along a preset spatial direction to form a plurality of target two-dimensional slice images;
the image segmentation unit is used for carrying out image segmentation processing on the plurality of target two-dimensional slice images by adopting a convolutional neural network model trained to be convergent; wherein the convolutional neural network model trained to be convergent comprises a target encoder and a target decoder which are connected in sequence, the target encoder comprises a target two-dimensional coding network structure and a target three-dimensional coding network structure which are connected in sequence, and the target decoder comprises a target three-dimensional decoding network structure and a target two-dimensional decoding network structure which are connected in sequence.
25. The apparatus of claim 24, wherein the image segmentation unit comprises:
an image grouping module for grouping the plurality of target two-dimensional slice images;
the image input module is used for sequentially inputting each group of target two-dimensional slice images into a convolutional neural network model trained to be convergent;
and the image segmentation module is used for carrying out image segmentation processing on each group of target two-dimensional slice images through the convolutional neural network model trained to be convergent.
26. The apparatus of claim 25, wherein the image segmentation module comprises:
the two-dimensional coding sub-module is used for carrying out two-dimensional coding on each target two-dimensional slice image in each group of target two-dimensional slice images through at least one layer of the target two-dimensional coding network structure so as to obtain a plurality of target two-dimensional down-sampling feature maps corresponding to each group of target two-dimensional slice images;
the three-dimensional coding sub-module is used for carrying out three-dimensional coding on a plurality of target two-dimensional down-sampling feature maps through at least one layer of the target three-dimensional coding network structure so as to obtain target three-dimensional down-sampling feature maps corresponding to each group of target two-dimensional slice images;
the three-dimensional decoding submodule is used for carrying out three-dimensional decoding on the target three-dimensional down-sampling feature map through at least one layer of the target three-dimensional decoding network structure so as to obtain a target three-dimensional up-sampling feature map corresponding to each group of target two-dimensional slice images;
and the two-dimensional decoding submodule is used for carrying out two-dimensional decoding on the target three-dimensional up-sampling feature map through at least one layer of the target two-dimensional decoding network structure so as to obtain a plurality of segmentation result maps corresponding to each group of target two-dimensional slice images.
27. The apparatus of claim 24, wherein the number of layers of the target two-dimensional coding network structure and the target two-dimensional decoding network structure in the convolutional neural network model trained to be convergent is the same, and the number of layers of the target three-dimensional coding network structure and the target three-dimensional decoding network structure is the same.
28. The apparatus of claim 27, wherein the image segmentation module further comprises:
a first feature map input sub-module, configured to input a plurality of target two-dimensional down-sampling feature maps output by the at least one layer of target two-dimensional coding network structure into a target two-dimensional decoding network structure of a symmetric hierarchy;
and the second feature map input submodule is used for inputting the target three-dimensional down-sampling feature maps output by the target three-dimensional coding network structures at each level other than the last layer into the target three-dimensional decoding network structures of the symmetric levels.
29. The apparatus of claim 26, wherein the training to a converged convolutional neural network model further comprises: a first feature map dimension conversion layer and a second feature map dimension conversion layer;
the image segmentation module further comprises:
the first conversion submodule is used for converting a plurality of target two-dimensional down-sampling feature maps corresponding to each group of target two-dimensional slice images into a target three-dimensional conversion down-sampling feature map through the first feature map dimension conversion layer;
and the second conversion submodule is used for converting the target three-dimensional up-sampling feature maps corresponding to each group of target two-dimensional slice images into a plurality of target two-dimensional conversion up-sampling feature maps through the second feature map dimension conversion layer.
30. The apparatus of any one of claims 24-29, further comprising:
and the image preprocessing unit is used for carrying out normalization processing and/or image scaling processing on the target three-dimensional image.
31. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8 or 9-15.
32. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any of claims 1-8 or 9-15.
33. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-8 or 9-15.
CN202110918323.9A 2021-08-11 2021-08-11 Model training method, image segmentation method, device and related products Pending CN113628216A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110918323.9A CN113628216A (en) 2021-08-11 2021-08-11 Model training method, image segmentation method, device and related products

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110918323.9A CN113628216A (en) 2021-08-11 2021-08-11 Model training method, image segmentation method, device and related products

Publications (1)

Publication Number Publication Date
CN113628216A true CN113628216A (en) 2021-11-09

Family

ID=78384454

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110918323.9A Pending CN113628216A (en) 2021-08-11 2021-08-11 Model training method, image segmentation method, device and related products

Country Status (1)

Country Link
CN (1) CN113628216A (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190130562A1 (en) * 2017-11-02 2019-05-02 Siemens Healthcare Gmbh 3D Anisotropic Hybrid Network: Transferring Convolutional Features from 2D Images to 3D Anisotropic Volumes
CN109903292A (en) * 2019-01-24 2019-06-18 西安交通大学 A kind of three-dimensional image segmentation method and system based on full convolutional neural networks
US20200320659A1 (en) * 2019-04-05 2020-10-08 Baker Hughes Oilfield Operations Llc Segmentation and prediction of low-level temporal plume patterns
CN112700451A (en) * 2019-10-23 2021-04-23 通用电气精准医疗有限责任公司 Method, system and computer readable medium for automatic segmentation of 3D medical images
CN111127484A (en) * 2019-12-25 2020-05-08 北京小白世纪网络科技有限公司 Liver CT image segmentation method and device based on deep learning neural network
CN111062935A (en) * 2019-12-27 2020-04-24 深圳大学 Breast tumor detection method, storage medium and terminal device
CN111667458A (en) * 2020-04-30 2020-09-15 杭州深睿博联科技有限公司 Method and device for detecting early acute cerebral infarction in flat-scan CT
CN112085736A (en) * 2020-09-04 2020-12-15 厦门大学 Mixed-dimension convolution-based renal tumor segmentation method
CN112465754A (en) * 2020-11-17 2021-03-09 云润大数据服务有限公司 3D medical image segmentation method and device based on layered perception fusion and storage medium
CN112435341A (en) * 2020-11-23 2021-03-02 推想医疗科技股份有限公司 Training method and device for three-dimensional reconstruction network, and three-dimensional reconstruction method and device
CN113012164A (en) * 2021-03-12 2021-06-22 山东大学 U-Net kidney tumor image segmentation method and device based on inter-polymeric layer information and storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CHI ZHANG et al.: "Liver tumor segmentation using 2.5D UV-Net with multi-scale convolution", 《COMPUTERS IN BIOLOGY AND MEDICINE》 *
GUOTAI WANG et al.: "Automatic Segmentation of Vestibular Schwannoma from T2-Weighted MRI by Deep Spatial Attention with Hardness-Weighted Loss", 《ARXIV》 *
JIANPENG ZHANG et al.: "Light-Weight Hybrid Convolutional Network for Liver Tumor Segmentation", 《PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI-19)》 *
WEI XUE: "Segmentation of the *** region in MR images based on a fully convolutional U-Net", 《CHINA MASTER'S THESES FULL-TEXT DATABASE, MEDICINE & HEALTH SCIENCES》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024016464A1 (en) * 2022-07-21 2024-01-25 上海清鹤科技股份有限公司 Three-dimensional auto-encoder and training method therefor, electronic device, and storage medium
CN117152231A (en) * 2023-10-31 2023-12-01 中国农业大学 Three-dimensional shape estimation method and device for preset type target and electronic equipment
CN117152231B (en) * 2023-10-31 2024-01-26 中国农业大学 Three-dimensional shape estimation method and device for preset type target and electronic equipment

Similar Documents

Publication Publication Date Title
CN111050219B (en) Method and system for processing video content using a spatio-temporal memory network
CN111832570A (en) Image semantic segmentation model training method and system
EP4020376A1 (en) Method and apparatus for building image enhancement model and for image enhancement
CN112784897B (en) Image processing method, device, equipment and storage medium
CN113628216A (en) Model training method, image segmentation method, device and related products
CN113159056A (en) Image segmentation method, device, equipment and storage medium
CN113920314B (en) Semantic segmentation and model training method, device, equipment and storage medium
KR20220153667A (en) Feature extraction methods, devices, electronic devices, storage media and computer programs
CN114913325B (en) Semantic segmentation method, semantic segmentation device and computer program product
CN114723760A (en) Portrait segmentation model training method and device and portrait segmentation method and device
KR20210040873A (en) Image filling method and apparatus, device, storage medium, and program
CN115880435A (en) Image reconstruction method, model training method, device, electronic device and medium
CN110827341A (en) Picture depth estimation method and device and storage medium
CN114549904A (en) Visual processing and model training method, apparatus, storage medium, and program product
CN117745944A (en) Pre-training model determining method, device, equipment and storage medium
CN117911588A (en) Virtual object face driving and model training method, device, equipment and medium
CN116611491A (en) Training method and device of target detection model, electronic equipment and storage medium
CN116402914A (en) Method, device and product for determining stylized image generation model
CN113240780B (en) Method and device for generating animation
JP2023133274A (en) Training method for roi detection model, detection method, apparatus therefor, device therefor, and medium therefor
CN114187318B (en) Image segmentation method, device, electronic equipment and storage medium
CN115690238A (en) Image generation and model training method, device, equipment and storage medium
CN114842066A (en) Image depth recognition model training method, image depth recognition method and device
CN115578261A (en) Image processing method, deep learning model training method and device
CN114792370A (en) Whole lung image segmentation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20211109