CN116758093A - Image segmentation method, model training method, device, equipment and medium - Google Patents

Image segmentation method, model training method, device, equipment and medium Download PDF

Info

Publication number
CN116758093A
CN116758093A (application CN202310624615.0A)
Authority
CN
China
Prior art keywords
feature
segmentation
image
feature map
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310624615.0A
Other languages
Chinese (zh)
Other versions
CN116758093B (en)
Inventor
马永杰
刘宇
郭远昊
韩立强
吉喆
杨万欣
张鸿褀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xuanwu Hospital
Original Assignee
Xuanwu Hospital
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xuanwu Hospital filed Critical Xuanwu Hospital
Priority to CN202310624615.0A priority Critical patent/CN116758093B/en
Publication of CN116758093A publication Critical patent/CN116758093A/en
Application granted granted Critical
Publication of CN116758093B publication Critical patent/CN116758093B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G06T 7/11: Region-based segmentation
    • G06N 3/0455: Auto-encoder networks; Encoder-decoder networks
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G06N 3/048: Activation functions
    • G06T 7/0012: Biomedical image inspection
    • G06T 2207/20081: Training; Learning
    • G06T 2207/20084: Artificial neural networks [ANN]
    • G06T 2207/20221: Image fusion; Image merging
    • G06T 2207/30016: Biomedical image processing; Brain
    • G06T 2207/30101: Blood vessel; Artery; Vein; Vascular

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Quality & Reliability (AREA)
  • Radiology & Medical Imaging (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to an image segmentation method, a model training method, a device, equipment and a medium, belongs to the technical field of computer image processing, and addresses the problem that the computation time of existing optimization models grows with the scale of the problem. The technical scheme mainly comprises the following steps: acquiring an image to be segmented and performing feature extraction on it to obtain a first feature map; inputting the first feature map into a first segmentation network to generate a first segmentation result and a second feature map; generating a query vector from the second feature map; inputting the first feature map into a Transformer module for high-dimensional feature encoding to generate a third feature map, while the Transformer module acquires the query vector to guide the high-dimensional feature decoding; and inputting the third feature map into a second segmentation network to generate a second segmentation result.

Description

Image segmentation method, model training method, device, equipment and medium
Technical Field
The application belongs to the technical field of computer image processing, and particularly relates to an image segmentation method, a model training method, a device, equipment and a medium.
Background
Worldwide, cardiovascular and cerebrovascular diseases have become one of the main threats to human health, and coronary atherosclerosis is the main cause of cardiovascular and cerebrovascular diseases. At present, coronary stent intervention has become a mainstream treatment for coronary atherosclerosis owing to its minimal invasiveness and good outcomes. During treatment, stents are placed inside the coronary arteries to reduce the probability of restenosis and thrombosis.
After a cerebrovascular stent implantation procedure, the apposition (adherence) of each stent wire is an important reference index for surgical outcome and prognosis. To measure stent-wire apposition, the vessel wall and the stent must be segmented from medical images of the intravascular stent so that their respective spatial information can be obtained.
In the prior art, a multi-decoder, multi-label segmentation model such as a U-net network is generally adopted for such segmentation tasks. However, for a specific task the segmentation targets actually have their own spatial correlations, and since this factor is not considered in the prior art, the segmentation accuracy still needs to be improved.
Disclosure of Invention
In view of the above analysis, the embodiments of the present application are directed to an image segmentation method, a model training method, a device, equipment and a medium, which solve the problem that the existing image segmentation technology does not consider the spatial relationships inherent in the segmentation targets themselves, so that the segmentation accuracy is not high enough.
An embodiment of a first aspect of the present application provides an image segmentation method, including the steps of:
acquiring an image to be segmented, and carrying out feature extraction on the image to be segmented to acquire a first feature map; inputting the first feature map into a first segmentation network to generate a first segmentation result and a second feature map;
generating a query vector through the second feature map;
inputting the first feature map into a Transformer module for high-dimensional feature encoding to generate a third feature map, and simultaneously, acquiring the query vector by the Transformer module to guide the high-dimensional feature decoding;
the third feature map is input into a second segmentation network to generate a second segmentation result.
In some embodiments, the Transformer module comprises a feature mapping unit, a Transformer encoder, a Transformer decoder, and a multi-head attention module;
the inputting the first feature map into a Transformer module for high-dimensional feature encoding to generate a third feature map, and the Transformer module obtaining the query vector to guide the high-dimensional feature decoding, includes:
performing feature mapping on the first feature map to generate a feature vector, and performing position encoding processing on the feature vector;
the feature vector is acquired by the Transformer encoder to generate a first feature code;
the first feature code and the query vector are obtained by the Transformer decoder to generate a second feature code;
the first feature code and the second feature code are acquired by the multi-head attention module to generate a high-dimensional feature.
In some embodiments, the generating the high-dimensional feature further comprises:
and carrying out feature fusion on the high-dimensional features and the feature vectors to generate fusion features, and generating the third feature map through deconvolution according to the fusion features.
In some embodiments, the acquiring an image to be segmented, and extracting features of the image to be segmented to obtain a first feature map, includes:
inputting the image to be segmented into a CNN convolutional neural network, wherein the CNN convolutional neural network comprises an initialization module and M first residual modules, the initialization module is used for extracting initial features of the image to be segmented, each first residual module comprises a pooling layer and a first convolution layer, the pooling layer is used for downsampling the output of the initialization module or of the previous first residual module, and the first convolution layer is used for performing a convolution operation on the output of the pooling layer.
In some embodiments, the first and second segmentation networks have the same segmentation network structure, the segmentation network structure comprising:
M second residual modules and an output module;
the second residual module comprises an up-sampling layer, a fusion layer and a second convolution layer, wherein the up-sampling layer is used for up-sampling the acquired features, the fusion layer is used for performing feature fusion on the output of the up-sampling layer and the output of the corresponding first residual module, and the second convolution layer is used for performing a convolution operation on the output of the fusion layer;
the output module includes a third convolution layer and a Sigmoid function.
In some embodiments, the generating a query vector from the second feature map includes:
feature mapping is carried out through a convolution kernel with the size of 1x1;
acquiring characteristics in the channel dimension through global average pooling;
performing dimension expansion according to the number of dimensions required by the Transformer module;
random noise perturbations are added to each dimension separately to form the query vector.
An embodiment of the second aspect of the present application provides a training method for an image segmentation model, including the steps of:
constructing a training data set, wherein the training data set comprises a plurality of training images with a first segmentation target and a second segmentation target, the training images are provided with a first label and a second label, the first label is a labeling area of the first segmentation target, and the second label is a labeling area of the second segmentation target;
extracting features of the training image to obtain a first feature map;
inputting the first feature map into a first segmentation network to generate a first segmentation result and a second feature map;
generating a query vector through the second feature map;
inputting the first feature map into a Transformer module for high-dimensional feature encoding to generate a third feature map, and simultaneously, acquiring the query vector by the Transformer module to guide the high-dimensional feature decoding;
inputting the third feature map into a second segmentation network to generate a second segmentation result;
determining a first loss value according to the difference between the first segmentation result and the first label, determining a second loss value according to the difference between the second segmentation result and the second label, and performing joint training based on the first loss value and the second loss value to obtain the image segmentation model, wherein the image segmentation model is used for segmenting the first target and the second target in an image to be segmented.
An embodiment of a third aspect of the present application provides an image segmentation apparatus, including:
the acquisition module is used for acquiring an image to be segmented and extracting features of the image to be segmented to obtain a first feature map;
the first segmentation module inputs the first feature map into a first segmentation network to generate a first segmentation result and a second feature map;
the query vector generation module is used for generating a query vector through the second feature map;
the feature encoding module is used for inputting the first feature map into a Transformer module to perform high-dimensional feature encoding so as to generate a third feature map, while the Transformer module acquires the query vector to guide the high-dimensional feature decoding;
and the second segmentation module inputs the third feature map into a second segmentation network to generate a second segmentation result.
An embodiment of a fourth aspect of the present application provides an electronic device, including a memory and a processor, the memory storing a computer program which, when executed by the processor, implements the image segmentation method and/or the training method of the image segmentation model as described in any of the embodiments above.
An embodiment of a fifth aspect of the present application provides a computer-readable storage medium, on which a computer program is stored which, when executed by a processor, implements the image segmentation method and/or the training method of an image segmentation model as described in any of the embodiments above.
On the one hand, the embodiments of the present application take into account the spatial distribution of the segmentation targets, use a Transformer to perform high-dimensional feature encoding, and enhance the segmentation effect through the spatial correlation characteristics of the segmentation targets themselves. On the other hand, considering the specific spatial position relationship between different segmentation targets, a query vector is generated from the features of the first segmentation target and input into the Transformer module to guide the encoding of the second segmentation target, so as to achieve accurate segmentation and positioning of the second segmentation target.
Drawings
In order to more clearly illustrate the embodiments of the present description or the technical solutions in the prior art, the drawings required in the embodiments or the description of the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments described in the present description, and that other drawings may be derived from them by a person of ordinary skill in the art without inventive effort.
FIG. 1 is an application environment diagram of an image segmentation method according to an embodiment of the present application;
fig. 2 is a schematic flow chart of an image segmentation method according to an embodiment of the first aspect of the present application;
FIG. 3 is a schematic diagram of a segmentation model architecture of an image segmentation method according to an embodiment of the first aspect of the present application;
fig. 4 is a schematic structural diagram of a CNN convolutional neural network according to a first embodiment of the present application;
FIG. 5 is a schematic diagram of a segmentation network according to an embodiment of the first aspect of the present application;
FIG. 6 is a schematic diagram of a query vector generation module according to an embodiment of the first aspect of the present application;
FIG. 7 is a schematic diagram of a network architecture of a Transformer module according to an embodiment of the first aspect of the present application;
FIG. 8 is a flowchart of an image segmentation model training method according to a second embodiment of the present application;
FIG. 9 is a schematic illustration of a mask for labeling an image set according to an embodiment of the second aspect of the application;
fig. 10 is a schematic diagram of an image segmentation apparatus according to an embodiment of the third aspect of the present application;
fig. 11 is a schematic diagram of an electronic device architecture according to the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. It should be noted that embodiments and features of embodiments in the present disclosure may be combined, separated, interchanged, and/or rearranged with one another without conflict. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Furthermore, when the terms "comprises" and/or "comprising," and variations thereof, are used in the present specification, the presence of stated features, integers, steps, operations, elements, components, and/or groups thereof is described, but the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof is not precluded. It is also noted that, as used herein, the terms "substantially," "about," and other similar terms are used as approximation terms and not as degree terms, and as such, are used to explain the inherent deviations of measured, calculated, and/or provided values that would be recognized by one of ordinary skill in the art.
Artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and extend human intelligence, sense the environment, acquire knowledge and use the knowledge to obtain optimal results. In other words, artificial intelligence is an integrated technology of computer science that attempts to understand the essence of intelligence and to produce a new intelligent machine that can react in a similar way to human intelligence. Artificial intelligence, i.e. research on design principles and implementation methods of various intelligent machines, enables the machines to have functions of sensing, reasoning and decision.
The artificial intelligence technology is a comprehensive subject, and relates to the technology with wide fields, namely the technology with a hardware level and the technology with a software level. Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.
Computer Vision (CV) is a science that studies how to make machines "see"; more specifically, it replaces human eyes with cameras and computers to recognize and measure targets and perform further graphic processing, so that the computer produces images better suited to human observation or to transmission to instruments for detection. As a scientific discipline, computer vision research on related theory and technology attempts to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision techniques typically include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D techniques, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric recognition techniques such as face recognition and fingerprint recognition.
Image segmentation is a technique and process of dividing an image into several specific regions with unique properties and presenting objects of interest. It is a key step from image processing to image analysis. The existing image segmentation methods are mainly divided into the following categories: a threshold-based segmentation method, a region-based segmentation method, an edge-based segmentation method, a segmentation method based on a specific theory, and the like. From a mathematical perspective, image segmentation is the process of dividing a digital image into mutually disjoint regions. The process of image segmentation is also a labeling process, i.e. pixels belonging to the same region are given the same number.
To automate the image segmentation task, a multi-decoder, multi-label segmentation model such as a U-net network is typically employed for such tasks. However, for a specific task the segmentation targets actually exhibit spatial correlations; because the specific correlation between segmentation targets and the specific spatial distribution pattern of each target are not considered, there is room for improvement in the accuracy of segmenting and locating the targets.
An embodiment of the application provides an image segmentation method, which is based on an asymmetric multi-task Transformer segmentation model, realizes fully automatic, parallelized processing of multi-task segmentation targets when segmenting an image, and, when training the model, improves the precision of the segmentation tasks by exploiting the interrelationship between different segmentation targets and the spatial distribution characteristics of each individual segmentation target.
For ease of understanding, referring to fig. 1, fig. 1 is an application environment diagram of an image segmentation method according to an embodiment of the present application, and as shown in fig. 1, an image processing method according to an embodiment of the present application is applied to an image processing system. The image processing system includes: server and terminal equipment. The image processing apparatus may be disposed on a server or may be disposed on a terminal device, and the embodiment of the present application is described by taking the disposition on the server as an example, which should not be construed as limiting the present application. The server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a content distribution network (Content Delivery Network, CDN), basic cloud computing services such as big data and an artificial intelligent platform. Terminals include, but are not limited to, cell phones, computers, intelligent voice interaction devices, intelligent appliances, vehicle terminals, aircraft, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and embodiments of the present application are not limited herein. The embodiment of the application can be applied to various medical image segmentation scenes, including but not limited to segmentation of vascular stents, brain tissue segmentation, tumor segmentation or polyp segmentation, and the like.
Firstly, the server acquires an image to be segmented and performs feature extraction on it to obtain a first feature map; the server then inputs the first feature map into a first segmentation network to generate a first segmentation result and a second feature map; the server generates a query vector from the second feature map; meanwhile, the server inputs the first feature map into a Transformer module for high-dimensional feature encoding to generate a third feature map, while the Transformer module acquires the query vector to guide the high-dimensional feature decoding; the server then inputs the third feature map into a second segmentation network to generate a second segmentation result.
Intravascular optical coherence tomography (OCT) of the cardiovascular and cerebrovascular systems is a novel cardiovascular and cerebrovascular imaging technology. Taking a cerebral blood vessel as an example, the OCT imaging catheter moves forward along the vessel and performs fine near-infrared imaging of the vessel wall over a certain thickness to assist diagnosis and treatment of the cerebrovascular lumen (plaque, hemangioma, bleeding, and the like), so that an intracerebral OCT acquisition consists of a sequence of axial slices of the cerebrovascular lumen, with the middle area being a near-circular vessel cavity. In this embodiment, the image segmentation method is described by taking the task of segmenting an OCT image of a cerebrovascular stent as an example, and this embodiment may also be referred to as a blood vessel and stent segmentation method based on OCT images.
The image segmentation method provided by the embodiment of the first aspect of the present application will be described from the perspective of a server. Referring to fig. 2 and fig. 3, fig. 2 is a flow chart of an image segmentation method according to an embodiment of the first aspect of the present application, and fig. 3 is a schematic view of a segmentation model architecture of the image segmentation method according to the embodiment of the first aspect of the present application; an image segmentation method provided by an embodiment of a first aspect of the present application includes:
step one, obtaining an image to be segmented, and carrying out feature extraction on the image to be segmented to obtain a first feature map.
It should be understood that this embodiment describes the segmentation task for an OCT image of a stent-implanted cerebral blood vessel: the image to be segmented is one of a series of OCT images of the stent-implanted vessel, the first segmentation target is the vessel lumen or vessel wall in the image, the second segmentation target is the stent in the image, and a primary feature map of the image to be segmented is extracted, where the feature extraction is generally performed with a CNN convolutional neural network.
Preferably, in some embodiments, the acquiring an image to be segmented, and extracting features of the image to be segmented to obtain a first feature map, includes:
the image to be segmented is input into a CNN convolutional neural network, as shown in fig. 4, and fig. 4 is a schematic structural diagram of the CNN convolutional neural network according to the first embodiment of the present application. The CNN convolutional neural network comprises an initialization module and M first residual modules, wherein the initialization module is used for extracting initial characteristics of an image to be segmented, the first residual modules comprise a pooling layer and a first convolution layer, the pooling layer is used for downsampling the output of the initialization module or the previous first residual modules, and the first convolution layer is used for carrying out convolution operation on the output of the pooling layer. In this embodiment, the value of M is 4.
Specifically, the convolutional neural network used by the application serves as the basic network skeleton, namely the CNN backbone, to perform low-level feature extraction. First, low-order visual features are extracted by the initialization module: the original OCT image matrix is input into the initialization module, which comprises three convolution layers; each convolution layer performs convolution and batch normalization (Batch Normalization, BN) on the input matrix, applies a rectified linear unit (Rectified Linear Unit, ReLU) activation, and outputs a feature matrix that is fed into the next convolution layer. After the three convolution layers, extraction of the low-order visual features is complete and the initial features are obtained.
The initial features are then fed into the first residual modules following the initialization module for deeper feature extraction; each first residual module comprises a max-pooling layer, two first convolution layers, a residual path and a ReLU. After four rounds of downsampling and convolution, high-dimensional features are output. The input image matrix has size 1x512x512 (CxHxW); after the 4 residual convolution blocks, the output feature size is 512x32x32 (CxHxW), and this final image feature is provided to the subsequent modules for processing. It is noted that the output of each first residual module is, on the one hand, fed into the next first residual module, with the fourth first residual module outputting the first feature map of size 512x32x32 (CxHxW), and on the other hand is also passed onward via skip connections.
In this embodiment, the extraction of the first feature map is implemented by using a CNN convolutional neural network.
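By way of illustration only, a minimal PyTorch sketch of such a backbone is given below. The class names, channel widths and layer arrangement are assumptions chosen to reproduce the sizes quoted above (a 1x512x512 input yielding a 512x32x32 first feature map after four residual blocks); they are not asserted to be the patented implementation.

```python
import torch
import torch.nn as nn

def conv_bn_relu(in_ch, out_ch):
    # convolution + batch normalization + ReLU, as used in the initialization module
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class FirstResidualBlock(nn.Module):
    """Max-pooling downsampling followed by two convolutions with a residual path and ReLU."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.pool = nn.MaxPool2d(2)
        self.conv1 = conv_bn_relu(in_ch, out_ch)
        self.conv2 = nn.Sequential(
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.residual = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)  # residual channel
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.pool(x)
        return self.relu(self.conv2(self.conv1(x)) + self.residual(x))

class CNNBackbone(nn.Module):
    """Initialization module (three conv layers) followed by M = 4 first residual modules."""
    def __init__(self, in_ch=1, widths=(64, 64, 128, 256, 512)):  # widths are assumed, not given in the text
        super().__init__()
        self.init_block = nn.Sequential(
            conv_bn_relu(in_ch, widths[0]),
            conv_bn_relu(widths[0], widths[0]),
            conv_bn_relu(widths[0], widths[0]),
        )
        self.blocks = nn.ModuleList(
            FirstResidualBlock(widths[i], widths[i + 1]) for i in range(4)
        )

    def forward(self, x):
        x = self.init_block(x)
        skips = [x]                      # finer-scale features kept for the skip connections to the decoders
        for block in self.blocks:
            x = block(x)
            skips.append(x)
        return x, skips[:-1]             # first feature map plus the finer-scale features

# x, skips = CNNBackbone()(torch.randn(1, 1, 512, 512))  # x.shape == (1, 512, 32, 32)
```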
The first feature map is then input into the corresponding processing modules of step two and step four, respectively. It should be understood that the numbering of step two and step four merely distinguishes the steps and does not imply an order of execution.
Inputting the first feature map into a first segmentation network to generate a first segmentation result and a second feature map.
In this embodiment, the stent and the vessel lumen are segmented. The segmentation task for the vessel wall is relatively simple, so the vessel lumen is segmented directly by the first segmentation network according to the first feature map, and the generated first segmentation result, that is, the vessel lumen slice, is further processed in step three. Both the first segmentation network and the second segmentation network used later for stent segmentation are segmentation networks for a specific target.
Preferably, in some embodiments, the first segmentation network and the second segmentation network have the same segmentation network structure; the specific structure is shown in fig. 5, which is a schematic diagram of the segmentation network structure. The segmentation network structure includes: M second residual modules and an output module. Each second residual module comprises an up-sampling layer, a fusion layer and a second convolution layer; the up-sampling layer up-samples the acquired features, the fusion layer fuses the output of the up-sampling layer with the output of the corresponding first residual module, and the second convolution layer performs a convolution operation on the output of the fusion layer. The output module includes a third convolution layer and a Sigmoid function.
In particular, the first segmentation network for the vessel lumen is also referred to as the first segmentation head shown in fig. 3, and the second segmentation network for the stent is also referred to as the second segmentation head shown in fig. 3. The second residual module in effect decodes the first feature map and may therefore also be called a decoding convolution block. The second residual module first performs up-sampling by bilinear interpolation and then performs feature fusion with the features from the first residual module whose output feature map size matches its own; for example, the first second residual module, after up-sampling, is fused with the output of the third first residual module. The result is then processed by two second convolution layers, the fused feature is residually connected to the output of the two second convolution layers, and the result is activated by a ReLU.
After decoding by the four second residual modules, the channel dimension of the feature map is reduced and its spatial size is restored to that of the input image. The feature map is then fed to the third convolution layer of the output module, which comprises a convolution layer with a 3x3 kernel and a convolution layer with a 1x1 kernel; after convolution, a Sigmoid function outputs the classification probability of each pixel, and the segmentation result is obtained after thresholding.
In this embodiment, the precision of the segmentation result is further improved through multi-level decoding and encoding and feature fusion.
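A hedged PyTorch sketch of one such segmentation head follows. The feature widths, the choice of which encoder feature each decoding block fuses with, and the use of the last decoded feature as the second feature map are illustrative assumptions, not details fixed by the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SecondResidualBlock(nn.Module):
    """Bilinear up-sampling, fusion with the matching encoder feature, two convolutions, residual connection."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.fuse = nn.Conv2d(in_ch + skip_ch, out_ch, kernel_size=1, bias=False)  # fusion layer
        self.conv = nn.Sequential(
            nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False), nn.BatchNorm2d(out_ch),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x, skip):
        x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        fused = self.fuse(torch.cat([x, skip], dim=1))
        return self.relu(fused + self.conv(fused))       # residual connection around the two conv layers

class SegmentationHead(nn.Module):
    """M = 4 decoding blocks plus an output module (3x3 conv, 1x1 conv, Sigmoid)."""
    def __init__(self, widths=(512, 256, 128, 64, 64)):  # assumed channel widths, coarsest first
        super().__init__()
        self.blocks = nn.ModuleList(
            SecondResidualBlock(widths[i], widths[i + 1], widths[i + 1]) for i in range(4)
        )
        self.out = nn.Sequential(
            nn.Conv2d(widths[-1], widths[-1], kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(widths[-1], 1, kernel_size=1), nn.Sigmoid(),
        )

    def forward(self, x, skips):
        # skips: encoder features ordered fine to coarse (e.g. the list returned by CNNBackbone above)
        for block, skip in zip(self.blocks, reversed(skips)):
            x = block(x, skip)
        return self.out(x), x  # per-pixel probabilities and the decoded feature map (used here as the second feature map)
```

Thresholding the returned probability map (e.g. at 0.5, an assumed value) then yields the binary segmentation result described above.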
And thirdly, generating a query vector through the second feature map.
In some embodiments, preferably, the generating a query vector by the second feature map includes:
feature mapping is carried out through a convolution kernel with the size of 1x1;
features in the channel dimension are acquired through global average pooling;
dimension expansion is performed according to the number of dimensions required by the Transformer module;
random noise perturbations are added to each dimension separately to form the query vector.
Specifically, in order to make the model's segmentation of the stent more accurate, the query generated from the lumen segmentation is used as a query vector and input into the Transformer decoder to guide the model to locate and segment the region containing the stent more accurately. The processing of the query vector generation module is shown in fig. 6, which is a schematic architecture diagram of the query vector generation module. The output of the lumen segmentation head undergoes feature mapping through a two-dimensional convolution with a kernel size of 1, features in the channel dimension are obtained through global average pooling (Global Average Pooling, GAP), dimension expansion is performed according to the number of query vectors, noise perturbation is added, and the final query feature vector is input into the Transformer decoder to guide stent segmentation.
In this embodiment, by generating a query vector for guiding stent segmentation according to the feature map of the vessel lumen segmentation result, the accuracy of stent segmentation and positioning can be improved.
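A minimal sketch of this query-vector generation, written in PyTorch, is shown below; the embedding dimension, the number of query vectors and the noise scale are illustrative assumptions, since the text does not fix them.

```python
import torch
import torch.nn as nn

class QueryGenerator(nn.Module):
    """1x1 convolution -> global average pooling -> expansion to N query vectors -> random noise perturbation."""
    def __init__(self, in_ch=64, embed_dim=512, num_queries=16, noise_std=0.1):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, embed_dim, kernel_size=1)   # feature mapping with a 1x1 kernel
        self.gap = nn.AdaptiveAvgPool2d(1)                       # global average pooling over H x W
        self.num_queries = num_queries
        self.noise_std = noise_std

    def forward(self, second_feature_map):                       # (B, in_ch, H, W)
        pooled = self.gap(self.proj(second_feature_map)).flatten(1)        # (B, embed_dim) channel descriptor
        queries = pooled.unsqueeze(1).expand(-1, self.num_queries, -1)     # dimension expansion
        noise = torch.randn_like(queries) * self.noise_std                 # independent perturbation per query
        return queries + noise                                             # (B, num_queries, embed_dim)
```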
And step four, inputting the first feature map into a Transformer module for high-dimensional feature encoding so as to generate a third feature map, and simultaneously, acquiring the query vector by the Transformer module to guide the high-dimensional feature decoding.
Preferably, in some embodiments, as shown in fig. 7, which is a schematic diagram of the network architecture of the Transformer module, the Transformer module comprises a feature mapping unit, a Transformer encoder, a Transformer decoder and a multi-head attention module;
the inputting the first feature map into a Transformer module for high-dimensional feature encoding to generate a third feature map, and the Transformer module obtaining the query vector to guide the high-dimensional feature decoding, includes:
performing feature mapping on the first feature map to generate a feature vector, and performing position coding processing on the feature vector;
the feature vector is acquired by the Transformer encoder to generate a first feature code;
the first feature code and the query vector are obtained by the Transformer decoder to generate a second feature code;
the first feature code and the second feature code are acquired by the multi-headed attention module to generate a high-dimensional feature;
and generating the third characteristic diagram through deconvolution according to the high-dimensional characteristics.
Specifically, as shown in fig. 7, which is a schematic diagram of the architecture of the Transformer encoder-decoder module, the basic composition of the module is 3 Transformer encoder layers and 2 Transformer decoder layers. In the high-dimensional encoding process, the feature map extracted by the CNN convolutional neural network is first converted into image feature vectors through feature mapping; after feature position encoding is added, the vectors are input into the Transformer encoder for higher-dimensional encoding, and the encoded result is input into the Transformer decoder for decoding. At the same time, the Transformer decoder receives the query vectors generated from the lumen segmentation result as segmentation guidance. Finally, multi-head attention is applied to the Transformer decoder output and the Transformer encoder output to generate the high-dimensional feature code, i.e. the high-dimensional features. Multi-head attention computes several independent attention heads in parallel and acts as an ensemble, which helps prevent overfitting.
Preferably, in some embodiments, the generating the high-dimensional feature further includes:
and carrying out feature fusion on the high-dimensional features and the feature vectors to generate fusion features, and generating the third feature map through deconvolution according to the fusion features.
In this embodiment, the Transformer is used to perform high-dimensional feature encoding, and the segmentation effect is enhanced through the spatial correlation characteristics of the segmentation target itself. Meanwhile, the generated high-dimensional features are fused with the low-dimensional features before segmentation, which can further improve the segmentation accuracy.
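The following PyTorch sketch illustrates one plausible wiring of this module under stated assumptions: an embedding dimension of 512, 8 attention heads, learned position encodings, the encoder output used as the attention query with the decoder output as key and value, and a stride-1 transposed convolution as the final deconvolution so that the third feature map keeps the 32x32 layout expected by the second segmentation head. None of these choices is specified by the text.

```python
import torch
import torch.nn as nn

class TransformerCoding(nn.Module):
    """3 Transformer encoder layers, 2 decoder layers, a final multi-head attention between the
    encoder and decoder outputs, fusion with the low-level tokens, and a deconvolution."""
    def __init__(self, in_ch=512, embed_dim=512, num_heads=8, hw=32):
        super().__init__()
        self.to_tokens = nn.Conv2d(in_ch, embed_dim, kernel_size=1)        # feature mapping unit
        self.pos = nn.Parameter(torch.zeros(1, hw * hw, embed_dim))        # learned position encoding (assumed)
        enc_layer = nn.TransformerEncoderLayer(embed_dim, num_heads, batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(embed_dim, num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=3)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=2)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.fuse = nn.Linear(2 * embed_dim, embed_dim)                    # fusion of high- and low-dimensional features
        self.deconv = nn.ConvTranspose2d(embed_dim, in_ch, kernel_size=3, stride=1, padding=1)
        self.hw = hw

    def forward(self, first_feature_map, query_vectors):
        b = first_feature_map.size(0)
        tokens = self.to_tokens(first_feature_map).flatten(2).transpose(1, 2)  # (B, HW, C) feature vectors
        tokens = tokens + self.pos                                              # position coding
        enc = self.encoder(tokens)                                              # first feature code
        dec = self.decoder(query_vectors, enc)                                  # second feature code, guided by the queries
        high, _ = self.attn(enc, dec, dec)                                      # multi-head attention -> high-dimensional features
        fused = self.fuse(torch.cat([high, tokens], dim=-1))                    # feature fusion with the low-level tokens
        fmap = fused.transpose(1, 2).reshape(b, -1, self.hw, self.hw)
        return self.deconv(fmap)                                                # third feature map
```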
And fifthly, inputting the third feature map into a second segmentation network to generate a second segmentation result.
The structure of the second segmentation network is similar to that of the first segmentation network and is not described again here.
An embodiment of the second aspect of the present application provides an image segmentation model training method, as shown in fig. 8, including:
constructing a training data set, wherein the training data set comprises a plurality of training images with a first segmentation target and a second segmentation target, the training images are provided with a first label and a second label, the first label is a labeling area of the first segmentation target, and the second label is a labeling area of the second segmentation target;
extracting features of the training image to obtain a first feature map;
inputting the first feature map into a first segmentation network to generate a first segmentation result and a second feature map;
generating a query vector through the second feature map;
inputting the first feature map into a Transformer module for high-dimensional feature encoding to generate a third feature map, and simultaneously, acquiring the query vector by the Transformer module to guide the high-dimensional feature decoding;
inputting the third feature map into a second segmentation network to generate a second segmentation result;
determining a first loss value according to the difference between the first segmentation result and the first label, determining a second loss value according to the difference between the second segmentation result and the second label, and performing joint training based on the first loss value and the second loss value to obtain the image segmentation model, wherein the image segmentation model is used for segmenting the first target and the second target in an image to be segmented.
Specifically, the training dataset, that is, the OCT cerebrovascular dataset in this embodiment, is manually labeled; after labeling is completed and verified, it is made into a dataset for the deep learning algorithm. The open-source medical labeling software ITK-SNAP is used, and the cerebrovascular lumen and the stent are labeled with its painting tool (Paint Brush). A total of 15 cases were labeled, i.e., 15 original Digital Imaging and Communications in Medicine (DICOM) files were used. The labeled objects are the vessel lumen and the stent; to handle the overlap between the lumen and stent regions, the two targets are labeled independently. The manual labeling results for the cerebral vessel lumen and the stent are shown in fig. 9, where the areas indicated by arrows are the mask of the vessel lumen, namely the first label, and the mask of the stent, namely the second label.
In the present embodiment, when creating the image set, the original OCT images and labels are assembled into a dataset in units of DICOM samples. The dataset comprises 15 samples and a total of 2458 images. Eleven samples (1855 images, about 75%) were randomly selected for training; 2 samples (373 images, about 15%) were randomly selected for validation during training; and the remaining 2 samples (230 images, about 10%) were kept as an independent test set. During training, consecutive image slices from a single DICOM sample are fed into the network for training and testing.
For the structure of the model, see the relevant description of the embodiments of the first aspect, which is not repeated here.
Preferably, the loss of the network model, including the first loss value and the second loss value, is determined by combining the binary cross-entropy loss (Binary Cross Entropy Loss, BCELoss) and the Tversky loss, where BCELoss handles the per-pixel classification and the Tversky loss balances the positive and negative samples.
In particular, BCELoss = -(1/N) * Σᵢ [ yᵢ·log(pᵢ) + (1 - yᵢ)·log(1 - pᵢ) ], where yᵢ is the label, pᵢ is the predicted value of the model, and N is the total number of pixels.
In its standard form, TverskyLoss = 1 - Σᵢ pᵢyᵢ / ( Σᵢ pᵢyᵢ + α·Σᵢ pᵢ(1 - yᵢ) + β·Σᵢ (1 - pᵢ)yᵢ ), where yᵢ, pᵢ and N are as above, and α and β are the corresponding weights.
The first loss value or the second loss value may be expressed as:
Loss = λ·BCELoss + (1 - λ)·TverskyLoss, where λ is the weighting parameter.
In this embodiment, the first loss value, i.e. the loss of the vessel lumen segmentation, and the second loss value, i.e. the loss of the stent segmentation, are added and jointly optimized during training. The model optimizes the network parameters by stochastic gradient descent, and the training uses the faster-converging Adam optimizer for rapid optimization of the model parameters.
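For illustration, a sketch of the combined loss and a joint training step follows; the values of α, β and λ, the learning rate and the helper names are assumptions, and the model call is a hypothetical placeholder for the two-headed network described above.

```python
import torch

def bce_loss(pred, target, eps=1e-7):
    # binary cross entropy averaged over all N pixels
    pred = pred.clamp(eps, 1 - eps)
    return -(target * pred.log() + (1 - target) * (1 - pred).log()).mean()

def tversky_loss(pred, target, alpha=0.7, beta=0.3, eps=1e-7):
    # Tversky loss: alpha and beta weight the false-positive and false-negative terms
    tp = (pred * target).sum()
    fp = (pred * (1 - target)).sum()
    fn = ((1 - pred) * target).sum()
    return 1 - (tp + eps) / (tp + alpha * fp + beta * fn + eps)

def segmentation_loss(pred, target, lam=0.5):
    # Loss = lambda * BCELoss + (1 - lambda) * TverskyLoss
    return lam * bce_loss(pred, target) + (1 - lam) * tversky_loss(pred, target)

# One joint training step (hypothetical model and optimizer setup):
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# lumen_pred, stent_pred = model(image)                    # first and second segmentation results
# loss = segmentation_loss(lumen_pred, lumen_mask) + segmentation_loss(stent_pred, stent_mask)
# optimizer.zero_grad(); loss.backward(); optimizer.step()
```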
An image segmentation apparatus according to an embodiment of the third aspect of the present application, as shown in fig. 10, includes:
the acquisition module is used for acquiring an image to be segmented and extracting features of the image to be segmented to obtain a first feature map;
the first segmentation module inputs the first feature map into a first segmentation network to generate a first segmentation result and a second feature map;
the query vector generation module is used for generating a query vector through the second feature map;
the feature encoding module is used for inputting the first feature map into a Transformer module to perform high-dimensional feature encoding so as to generate a third feature map, while the Transformer module acquires the query vector to guide the high-dimensional feature decoding;
and the second segmentation module inputs the third feature map into a second segmentation network to generate a second segmentation result.
An embodiment of a fourth aspect of the present application provides an electronic device, including a memory and a processor, the memory storing a computer program which, when executed by the processor, implements the image segmentation method and/or the training method of the image segmentation model as described in any of the embodiments above.
An embodiment of a fifth aspect of the present application provides a computer-readable storage medium, on which a computer program is stored which, when executed by a processor, implements the image segmentation method and/or the training method of an image segmentation model as described in any of the embodiments above.
Computer-readable storage media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of function in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, in a software module executed by a processor, or in a combination of the two. The software modules may be disposed in Random Access Memory (RAM), memory, read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The foregoing description of the embodiments has been provided for the purpose of illustrating the general principles of the application and is not meant to limit the scope of the application to the particular embodiments; any modifications, equivalents, improvements, etc. that fall within the spirit and principles of the application are intended to be included within the scope of the application.

Claims (10)

1. An image segmentation method, characterized by comprising the steps of:
acquiring an image to be segmented, and carrying out feature extraction on the image to be segmented to acquire a first feature map;
inputting the first feature map into a first segmentation network to generate a first segmentation result and a second feature map;
generating a query vector through the second feature map;
inputting the first feature map into a Transformer module for high-dimensional feature encoding to generate a third feature map, and simultaneously, acquiring the query vector by the Transformer module to guide the high-dimensional feature decoding;
the third feature map is input into a second segmentation network to generate a second segmentation result.
2. The image segmentation method as set forth in claim 1, wherein: the Transformer module comprises a feature mapping unit, a Transformer encoder, a Transformer decoder and a multi-head attention module;
the inputting the first feature map into a Transformer module for high-dimensional feature encoding to generate a third feature map, and the Transformer module obtaining the query vector to guide the high-dimensional feature decoding, includes:
performing feature mapping on the first feature map to generate a feature vector, and performing position coding processing on the feature vector;
the feature vector is acquired by the Transformer encoder to generate a first feature code;
the first feature code and the query vector are obtained by the Transformer decoder to generate a second feature code;
the first feature code and the second feature code are acquired by the multi-head attention module to generate a high-dimensional feature.
3. The image segmentation method as set forth in claim 2, wherein: the generating the high-dimensional feature further comprises:
and carrying out feature fusion on the high-dimensional features and the feature vectors to generate fusion features, and generating the third feature map through deconvolution according to the fusion features.
4. The image segmentation method as set forth in claim 1, wherein: the obtaining the image to be segmented, and extracting the features of the image to be segmented to obtain a first feature map includes:
inputting the image to be segmented into a CNN convolutional neural network, wherein the CNN convolutional neural network comprises an initialization module and M first residual modules, the initialization module is used for extracting initial features of the image to be segmented, each first residual module comprises a pooling layer and a first convolution layer, the pooling layer is used for downsampling the output of the initialization module or of the previous first residual module, and the first convolution layer is used for performing a convolution operation on the output of the pooling layer.
5. The image segmentation method as set forth in claim 1 or 4, characterized in that: the first and second segmentation networks have the same segmentation network structure, the segmentation network structure comprising:
M second residual modules and an output module;
the second residual module comprises an up-sampling layer, a fusion layer and a second convolution layer, wherein the up-sampling layer is used for up-sampling the acquired features, the fusion layer is used for performing feature fusion on the output of the up-sampling layer and the output of the corresponding first residual module, and the second convolution layer is used for performing a convolution operation on the output of the fusion layer;
the output module includes a third convolution layer and a Sigmoid function.
6. The image segmentation method as set forth in claim 1, wherein: the generating the query vector through the second feature map includes:
feature mapping is carried out through a convolution kernel with the size of 1x1;
acquiring characteristics in the channel dimension through global average pooling;
performing dimension expansion according to the number of dimensions required by the Transformer module;
random noise perturbations are added to each dimension separately to form the query vector.
7. The training method of the image segmentation model is characterized by comprising the following steps of:
constructing a training data set, wherein the training data set comprises a plurality of training images with a first segmentation target and a second segmentation target, the training images are provided with a first label and a second label, the first label is a labeling area of the first segmentation target, and the second label is a labeling area of the second segmentation target;
extracting features of the training image to obtain a first feature map;
inputting the first feature map into a first segmentation network to generate a first segmentation result and a second feature map;
generating a query vector through the second feature map;
inputting the first feature map into a Transformer module for high-dimensional feature encoding to generate a third feature map, and simultaneously, acquiring the query vector by the Transformer module to guide the high-dimensional feature decoding;
inputting the third feature map into a second segmentation network to generate a second segmentation result;
determining a first loss value according to the difference between the first segmentation result and the first label, determining a second loss value according to the difference between the second segmentation result and the second label, and performing joint training based on the first loss value and the second loss value to obtain the image segmentation model, wherein the image segmentation model is used for segmenting the first target and the second target in an image to be segmented.
8. An image segmentation apparatus, comprising:
an acquisition module, configured to acquire an image to be segmented and extract features of the image to be segmented to obtain a first feature map;
a first segmentation module, configured to input the first feature map into a first segmentation network to generate a first segmentation result and a second feature map;
a query vector generation module, configured to generate a query vector from the second feature map;
a feature encoding module, configured to input the first feature map into a Transformer module for high-dimensional feature encoding to generate a third feature map, the Transformer module also acquiring the query vector to guide the high-dimensional feature decoding; and
a second segmentation module, configured to input the third feature map into a second segmentation network to generate a second segmentation result.
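To show how the Transformer module of claims 7 and 8 could take the first feature map and the query vector, the sketch below uses standard PyTorch Transformer layers, with the query vectors guiding decoding through cross-attention; this wiring, the omission of positional encoding and all layer sizes are assumptions, not details taken from the patent.

import torch
import torch.nn as nn

class GuidedTransformerModule(nn.Module):
    # High-dimensional feature encoding of the first feature map; the query vectors
    # guide decoding via cross-attention, producing the third feature map.
    def __init__(self, d_model=256, nhead=8, num_layers=2):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers)

    def forward(self, feat1, queries):
        # feat1: (B, C, H, W) with C == d_model; queries: (B, Q, d_model) from the second feature map
        b, c, h, w = feat1.shape
        tokens = feat1.flatten(2).transpose(1, 2)            # (B, H*W, C) token sequence
        memory = self.encoder(tokens)                         # high-dimensional feature encoding
        decoded = self.decoder(memory, queries)               # feature tokens cross-attend to the query vectors
        return decoded.transpose(1, 2).reshape(b, c, h, w)    # third feature map for the second segmentation network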
9. An electronic device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, implements the image segmentation method of any one of claims 1-6 and/or the training method of the image segmentation model of claim 7.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, implements the image segmentation method according to any one of claims 1-6 and/or the training method of the image segmentation model according to claim 7.
CN202310624615.0A 2023-05-30 2023-05-30 Image segmentation method, model training method, device, equipment and medium Active CN116758093B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310624615.0A CN116758093B (en) 2023-05-30 2023-05-30 Image segmentation method, model training method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310624615.0A CN116758093B (en) 2023-05-30 2023-05-30 Image segmentation method, model training method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN116758093A true CN116758093A (en) 2023-09-15
CN116758093B CN116758093B (en) 2024-05-07

Family

ID=87958087

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310624615.0A Active CN116758093B (en) 2023-05-30 2023-05-30 Image segmentation method, model training method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN116758093B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021179205A1 (en) * 2020-03-11 2021-09-16 深圳先进技术研究院 Medical image segmentation method, medical image segmentation apparatus and terminal device
US20220148188A1 (en) * 2020-11-06 2022-05-12 Tasty Tech Ltd. System and method for automated simulation of teeth transformation
US20220309674A1 (en) * 2021-03-26 2022-09-29 Nanjing University Of Posts And Telecommunications Medical image segmentation method based on u-net
CN113191285A (en) * 2021-05-08 2021-07-30 山东大学 River and lake remote sensing image segmentation method and system based on convolutional neural network and Transformer
CN113420827A (en) * 2021-07-08 2021-09-21 上海浦东发展银行股份有限公司 Semantic segmentation network training and image semantic segmentation method, device and equipment
CN115147598A (en) * 2022-06-02 2022-10-04 粤港澳大湾区数字经济研究院(福田) Target detection segmentation method and device, intelligent terminal and storage medium
CN115375698A (en) * 2022-08-22 2022-11-22 武汉理工大学重庆研究院 Medical image segmentation method and device, electronic equipment and storage medium
CN115761222A (en) * 2022-09-27 2023-03-07 阿里巴巴(中国)有限公司 Image segmentation method, remote sensing image segmentation method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SIXIAO ZHENG et al.: "Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers", arXiv:2012.15840v3, 25 July 2021 (2021-07-25), pages 1 - 12 *
ZHANG Xinliang; FU Pengfei; ZHAO Yunji; XIE Heng; WANG Wanru: "Point cloud data classification and segmentation model fusing graph convolution and differential pooling functions", Journal of Image and Graphics (中国图象图形学报), no. 06, 16 June 2020 (2020-06-16), pages 137 - 144 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117422848A (en) * 2023-10-27 2024-01-19 神力视界(深圳)文化科技有限公司 Method and device for segmenting three-dimensional model

Also Published As

Publication number Publication date
CN116758093B (en) 2024-05-07

Similar Documents

Publication Publication Date Title
CN109003267B (en) Computer-implemented method and system for automatically detecting target object from 3D image
CN111932529B (en) Image classification and segmentation method, device and system
CN113159056A (en) Image segmentation method, device, equipment and storage medium
CN113256592B (en) Training method, system and device of image feature extraction model
CN109447096B (en) Glance path prediction method and device based on machine learning
CN116758093B (en) Image segmentation method, model training method, device, equipment and medium
Naval Marimont et al. Implicit field learning for unsupervised anomaly detection in medical images
CN112330684B (en) Object segmentation method and device, computer equipment and storage medium
CN111091010A (en) Similarity determination method, similarity determination device, network training device, network searching device and storage medium
CN116129141A (en) Medical data processing method, apparatus, device, medium and computer program product
CN112560710A (en) Method for constructing finger vein recognition system and finger vein recognition system
CN115994558A (en) Pre-training method, device, equipment and storage medium of medical image coding network
CN116977663A (en) Image data processing method, device, equipment and medium
CN113850796A (en) Lung disease identification method and device based on CT data, medium and electronic equipment
Upadhyay et al. Big data analytics with deep learning based intracranial haemorrhage diagnosis and classification model
CN113935957A (en) Medical image comparison method and device, electronic equipment and storage medium
CN116385467B (en) Cerebrovascular segmentation method based on self-supervision learning and related equipment
Desiani et al. A Novelty Patching of Circular Random and Ordered Techniques on Retinal Image to Improve CNN U-Net Performance.
Mortazi et al. Weakly supervised segmentation by a deep geodesic prior
CN114820636A (en) Three-dimensional medical image segmentation model and training method and application thereof
CN113538493A (en) Automatic delineation method, delineation system, computing device and storage medium for brain functional region of MRI head image
CN113222100A (en) Training method and device of neural network model
Pallavi Suggestive GAN for supporting Dysgraphic drawing skills
CN116342608B (en) Medical image-based stent adherence measurement method, device, equipment and medium
CN116013475B (en) Method and device for sketching multi-mode medical image, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant