CN117876690A - Ultrasonic image multi-tissue segmentation method and system based on heterogeneous UNet - Google Patents

Ultrasonic image multi-tissue segmentation method and system based on heterogeneous UNet

Info

Publication number
CN117876690A
CN117876690A
Authority
CN
China
Prior art keywords
module
convolution
segmentation
layer
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410092660.0A
Other languages
Chinese (zh)
Inventor
李发琪
高凡童
李成海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Medical University
Original Assignee
Chongqing Medical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Medical University filed Critical Chongqing Medical University
Priority to CN202410092660.0A priority Critical patent/CN117876690A/en
Publication of CN117876690A publication Critical patent/CN117876690A/en
Pending legal-status Critical Current

Classifications

    • G06V 10/26: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06N 3/0455: Auto-encoder networks; encoder-decoder networks
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G06N 3/08: Learning methods
    • G06T 7/0012: Biomedical image inspection
    • G06V 10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/40: Extraction of image or video features
    • G06V 10/806: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
    • G06V 10/82: Image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • G06T 2207/10132: Ultrasound image
    • G06T 2207/20081: Training; learning
    • G06T 2207/20084: Artificial neural networks [ANN]
    • G06T 2207/30004: Biomedical image processing
    • G06V 2201/03: Recognition of patterns in medical or anatomical images

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Quality & Reliability (AREA)
  • Radiology & Medical Imaging (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an ultrasonic image multi-tissue segmentation method and system based on a heterogeneous UNet. The method uses an improved U-Net network: residual connections are integrated to increase the convergence rate and ease training; an improved spatial pyramid ASPP module enlarges the receptive field and better preserves image context information; and an added Attention Gate module helps the model learn, during training, to suppress interference from irrelevant regions or noise in the image and to focus on extracting valuable salient features in the target region, improving the overall accuracy of the segmentation model. The improved U-Net network is well suited to target segmentation tasks, achieves a better segmentation effect than existing segmentation networks, and has good market prospects. Compared with manual delineation and semi-automatic human-computer-interaction segmentation methods, the method greatly reduces the time required for segmentation, effectively improves the accuracy and efficiency of ultrasound image segmentation, and has important clinical and scientific-research application value.

Description

Ultrasonic image multi-tissue segmentation method and system based on heterogeneous UNet
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to an ultrasonic image multi-tissue segmentation method and system based on heterogeneous UNet.
Background
Ultrasound segmentation algorithms play a key role in medical image processing. Their main purpose is to segment the structures or tissues in an ultrasound image into different regions for further analysis, diagnosis or treatment planning, with important applications in accurate diagnosis, personalized treatment planning, surgical navigation, standardized analysis, and research and education. Manual segmentation of ultrasound images is a time-consuming and laborious task that suffers from strong subjective dependence, high error rates, high time cost and poor repeatability, and these shortcomings limit its use in clinical and scientific research to a certain extent. To address these problems, many automatic ultrasound image segmentation methods have been proposed in recent years to improve the accuracy and efficiency of ultrasound image segmentation. Current automatic methods fall mainly into traditional ultrasound image segmentation methods and deep-learning-based segmentation methods.
Traditional ultrasound image segmentation methods are based on classical image processing techniques such as threshold segmentation, region growing and edge detection. They are simple but depend on hand-crafted feature design, are sensitive to noise and artifacts, adapt poorly to the non-uniformity of ultrasound images, and easily produce over-segmentation or under-segmentation. Because they only consider relations between local pixels and lack the guidance of global context information, their segmentation results may be inconsistent and discontinuous with respect to the overall structure. In addition, many traditional algorithms rely on manually set parameters, and the choice of parameters may affect the accuracy and stability of the segmentation result, which requires expertise and experience from the user. Traditional methods therefore have clear limitations on ultrasound images with complex noise and many artifacts and have difficulty meeting requirements for real-time performance, high accuracy and stability. With the development of deep learning and related technologies, automatic ultrasound image segmentation based on deep learning is gradually becoming mainstream and can effectively overcome the shortcomings of traditional methods. Deep-learning medical image segmentation models are broadly divided into models based on convolutional neural networks (e.g. FCN, DeepLab), encoder-decoder structures (e.g. U-Net, SegNet), attention mechanisms (e.g. Attention U-Net, SENet), generative adversarial networks (e.g. pix2pix, CycleGAN), and cascade networks (e.g. DeepMedic, CascadeNet).
However, ultrasound images are often affected by noise and artifacts, which reduce image quality and blur structural information, leaving edges and texture features unclear. Owing to imaging conditions, tissue characteristics and other factors during acquisition, image brightness and contrast may be non-uniform, producing regions with light-dark differences and discontinuous gray levels. Tissue and organ structures in ultrasound medical images are often complex and diverse, comprising structures of different shapes, sizes and textures that may be interconnected or partially occluded. All of these factors increase the difficulty of ultrasound medical image segmentation. Moreover, existing ultrasound image segmentation methods focus on single-target segmentation, i.e. segmenting only a single target tissue such as the breast or prostate, and ignore the application value of the other tissue parts of the B-mode ultrasound image. In summary, accurate and rapid segmentation of multiple tissue parts in a B-mode ultrasound image remains a challenge.
Disclosure of Invention
In view of the above, the present invention is directed to a heterogeneous UNet-based method and system for multi-tissue segmentation of ultrasound images, which uses an improved U-Net network for target segmentation and is suitable for multi-tissue segmentation in ultrasound images.
In order to achieve the above purpose, the present invention provides the following technical solutions:
the invention provides an ultrasonic image multi-tissue segmentation method based on heterogeneous UNet, which comprises the following steps:
(1) Data acquisition: collecting B-mode ultrasound images for segmentation, the images containing each target tissue part to be segmented, and establishing an ultrasound image data set;
(2) Data labeling: setting a tissue label in turn for each image in the ultrasound image data set;
(3) Preprocessing the data set: first normalizing and standardizing all data in the data set;
(4) Constructing the segmentation network: the segmentation network adopts an improved U-Net network, which is used for recognizing each tissue and the target-region image; the improved U-Net network comprises an encoder, a decoder, an attention gating module and an improved ASPP module, wherein the encoder obtains deep feature information and shallow feature information, respectively, by downsampling through double convolution residual feature extraction modules; the decoder gradually increases the resolution through upsampling operations and segments the image using the deep and shallow feature information; the attention gating module is added before each decoding layer of the decoder and is used for enhancing the network's attention to specific features; the improved ASPP module is located at the end of the encoder and is used for capturing spatial information at different dilation rates so as to effectively handle targets of different scales;
(5) Training the network: training the segmentation network using a parameter initialization method to obtain a trained segmentation network;
(6) Recognition result: inputting the acquired ultrasound image data into the trained segmentation network for recognition to obtain the segmentation result of each tissue and the target region on the acoustic channel in the ultrasound image data.
Further, the improved U-Net network operates according to the following steps:
a) shallow feature information is extracted by the double convolution residual feature extraction modules at the front of the encoder, and deep feature information is extracted by the double convolution residual feature extraction modules at the rear of the encoder;
b) after the shallow and deep features are obtained, enhanced feature extraction is performed on the deep features through the spatial pyramid with different dilation rates in the decoder; the shallow features are convolved, the processed deep and shallow features are fused, the fused features are convolved again, and the result is input into the spatial pyramid ASPP;
c) the attention gating module outputs the element-wise product of the input feature map and the attention coefficient; context information is captured using the coarse-grained feature map of the image obtained by the convolutional neural network, and the attention gating module fuses shallow fine-grained features with deep coarse-grained features to obtain the category and position of the target region in the image;
d) upsampling is performed by bilinear interpolation, and the image features are predicted after convolution and pooling operations to obtain the pixel-level segmentation of each tissue on the acoustic channel.
Further, the double convolution residual feature extraction module performs the following steps:
the output features and the input features of the first convolution layer of the double convolution residual feature extraction module form a first residual connection;
the output features of the second convolution layer of the double convolution residual feature extraction module are added to the output features of the first convolution layer to form a second residual connection, and the result of the second residual connection is finally output.
Further, the strip pooling (SPM) module operates according to the following steps:
pooling the input feature map with horizontal strips and vertical strips, respectively;
after convolution and expansion, summing the results at corresponding positions to obtain an H×W feature map;
applying a 1×1 convolution and sigmoid, then multiplying element-wise with the corresponding pixels of the original input map to obtain the output result.
Further, the last layer of the encoder is connected with the corresponding layer of the decoder through the improved ASPP module, which convolves with atrous convolution kernels of different dilation rates and fuses in the strip pooling SPM module.
The invention provides an ultrasonic image multi-tissue segmentation system based on heterogeneous UNet, which comprises an input end, an encoder, an attention gating module, an improved ASPP module, a decoder and an output end;
the encoder comprises a plurality of double convolution residual error characteristic extraction modules;
the decoder comprises a plurality of up-sampling convolution layers;
the encoder performs downsampling through the double convolution residual feature extraction modules to obtain deep feature information and shallow feature information, respectively; the decoder gradually increases the resolution through upsampling operations and segments the image using the deep and shallow feature information;
the input end is used for inputting the B-mode ultrasound image into the double convolution residual feature extraction module;
the double convolution residual feature extraction module is used for extracting features through the residual connections of two convolution layers;
the attention gating module is arranged in the skip connection of the corresponding layer and is used for setting the weights of the features;
the improved ASPP module is arranged between the encoder and the decoder in the U-Net structure and is used for performing feature processing on the shallow and deep features;
the output end is used for outputting the data processed by the decoder in the U-Net structure.
Further, the double convolution residual feature extraction module comprises a first convolution layer and a second convolution layer; the output features and the input features of the first convolution layer form a first residual connection; the output characteristics of the second convolution layer are added to the output characteristics of the first convolution layer to form a second residual connection, and the final output is the result of the second residual connection.
Further, the improved ASPP module comprises a plurality of parallel convolution operation modules, a strip pooling SPM module, a connection operation module and an output module;
the parallel convolution operation modules perform convolution operations at different sampling rates, capturing the context information of the image at different scales, which helps the model better understand local details and global structures in the image and improves its perception of targets at different scales;
the strip pooling SPM module captures long-range context by attending to two paths along the horizontal and vertical spatial dimensions, respectively, so that each location in the output tensor establishes correspondences with various locations in the input tensor;
the connection operation module is used for combining feature maps of different dimensions and capturing context information at different spatial dimensions in the image;
the output module is used for outputting the data obtained by the connection operation module.
Further, the strip pooling SPM module comprises a vertical strip pooling layer, a horizontal strip pooling layer, a one-dimensional convolution, a fusion module and an activation function;
the vertical strip pooling layer is used for encoding long-range context along the vertical spatial dimension;
the horizontal strip pooling layer is used for encoding long-range context along the horizontal spatial dimension;
the one-dimensional convolution is used for dimension expansion and for modulating the current position and its adjacent features;
the fusion module is used for obtaining an output containing a more useful global prior;
the activation function is used for mapping the output of the network to between 0 and 1.
The invention has the beneficial effects that:
the invention discloses an ultrasonic image multi-organization segmentation method and system based on heterogeneous UNet, wherein the method utilizes an improved U-net network, and the convergence rate is increased when residual connection is integrated into the method to facilitate training; the improved space pyramid ASPP module is combined, the picture receptive field is increased, and the image context information is better reserved; the addition of the Attention Gate module is helpful for the model to learn to restrain the interference of irrelevant areas or noise in the image during training, pay Attention to extracting valuable remarkable characteristics in the target area, and improve the overall accuracy of the segmentation model. The improved U-net network can be well applied to target segmentation tasks, has better segmentation effect than the existing segmentation network, and has good market prospect. In the aspect of accurate diagnosis and treatment planning, the method can help to extract the region of interest in the image, so that lesion analysis can be performed more accurately, and doctors can be assisted to make more accurate diagnosis and treatment planning. In the aspect of clinical treatment surgery navigation, ultrasonic guidance is widely used for various types of surgeries, such as percutaneous puncture and biopsy, focused ultrasonic ablation surgery, nerve block, radio frequency ablation and other clinical application scenes, provides real-time, visual and accurate image guidance and monitoring, is necessary for rapidly and accurately dividing each tissue on an ultrasonic image, and can assist doctors to better perform the surgeries and ensure the success and safety of the surgeries. In a standardized analysis application scenario, the method is helpful for realizing the standardization of image analysis. Through automated processing, the consistency of analysis can be improved, human errors can be reduced, and results between different doctors or medical institutions can be more comparable. The method is also beneficial to realizing personalized medicine, and the automatic segmentation is beneficial to better understand individual differences of patients, so that support is provided for personalized medicine. Through the accurate segmentation of the ultrasound image, the anatomy structure and lesion distribution of the patient can be better understood, and data support is provided for the personalized treatment scheme. In the research and education fields, the method provides a powerful tool for scientific research, and is helpful for in-depth research of disease characteristics and development of treatment methods. In addition, the method can also be used for teaching of ultrasonic imaging subjects and helps to cultivate new generation ultrasonic imaging professional talents. In conclusion, the segmentation method has important scientific research and clinical application values.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objects and other advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out in the specification.
Drawings
In order to make the objects, technical solutions and advantageous effects of the present invention more clear, the present invention provides the following drawings for description:
FIG. 1 is a diagram of the overall network architecture;
FIG. 2 is a schematic block diagram of an improved U-Net network;
FIG. 3 is an overall block diagram of a dual convolution residual feature extraction module;
FIG. 4 is an overall block diagram of an attention gating module;
FIG. 5 is an overall block diagram of a striping pooling module;
FIG. 6 is an overall block diagram of a modified ASPP module;
fig. 7 is a schematic comparison of segmentation effects on abdominal B-mode ultrasound images.
Detailed Description
The present invention will be further described with reference to the accompanying drawings and specific examples, which are not intended to limit the invention, so that those skilled in the art may better understand the invention and practice it.
The heterogeneous UNet-based ultrasonic image multi-tissue segmentation method provided by the embodiment can realize automatic, rapid and accurate segmentation of each tissue on an acoustic channel in a clinical treatment scene, and comprises the following steps:
(1) Data acquisition: collecting B-mode ultrasound images for segmentation, the images covering each target tissue part to be segmented; unifying the image size to 512×512 by cropping and resampling, thereby establishing the ultrasound image data set;
(2) Data labeling: delineating the ROI on each image in the data set; assuming there are n classes of tissue on the acoustic channel from the skin to the treatment region, setting the tissue labels in turn to 1, 2, 3, …, n and the target-region label to n+1; after all delineation is completed, converting the labels corresponding to each original B-mode image into a mask image in png format, the mask being a single-channel image of size 512×512×1, consistent with the size of the original B-mode image;
(3) Preprocessing the data set: first normalizing all data in the data set to eliminate the adverse effects of singular sample data; then standardizing the data to accelerate model convergence. To address the small number of training samples, data augmentation such as horizontal flipping, contrast increase and brightness changes is applied to the data set; a minimal sketch is given below;
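By way of illustration, the normalization, standardization and augmentation of this step could be sketched as follows in NumPy; the augmentation parameter ranges are assumptions, as the embodiment does not specify them:

```python
import numpy as np

def preprocess(img: np.ndarray) -> np.ndarray:
    # Min-max normalization to [0, 1] suppresses singular sample values,
    # then z-score standardization speeds up model convergence.
    img = (img - img.min()) / (img.max() - img.min() + 1e-8)
    return (img - img.mean()) / (img.std() + 1e-8)

def augment(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    # Augmentations named above; ranges are assumed, not from the patent.
    if rng.random() < 0.5:
        img = np.fliplr(img).copy()       # horizontal flip (apply jointly to the mask)
    img = img * rng.uniform(0.8, 1.2)     # contrast change
    img = img + rng.uniform(-0.1, 0.1)    # brightness change
    return img
```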
(4) Constructing the segmentation network: the segmentation network adopts an improved U-Net network to perform semantic segmentation on the input image, labeling each pixel of the image with a category; the improved U-Net network comprises an input end, an encoder, a decoder, an attention gating module, an improved ASPP module and an output end;
the encoder comprises a plurality of double convolution residual feature extraction modules; it maps the input image into a high-level semantic feature space, extracts information from different levels of the image through its hierarchical structure, and reduces the image resolution through downsampling operations;
the decoder comprises a plurality of upsampling convolution layers; it gradually restores the resolution of the feature maps, fuses information from different levels, and organically combines the high-level semantic information and the low-level detail information extracted by the encoder to generate an accurate prediction result;
the encoder performs downsampling through the double convolution residual feature extraction modules to obtain deep feature information and shallow feature information, respectively; the decoder gradually increases the resolution through upsampling operations and segments the image using the deep and shallow feature information;
the input end is used for loading the training data set, converting the channel number, height and width information of the preprocessed images and labels into tensor format, and organizing them into batches to be input into the network for training;
the double convolution residual feature extraction module is used for effectively extracting and retaining important features from the input feature map; the module can accelerate convergence during training;
the attention gating module is used for dynamically adjusting the weights of different positions in the feature map; it improves the model's ability to learn, during training, to suppress interference from irrelevant regions or noise in the image, extracts valuable salient features in the target region, captures context information using the coarse-grained feature map obtained by the convolutional neural network, fuses shallow fine-grained features with deep coarse-grained features, highlights the category and position of the target region in the image, and improves the overall accuracy of the segmentation model;
In this embodiment, shallow image features are the features extracted by the shallow layers of the network, close to the input; they contain more pixel information, and this fine-grained information consists of the colour, texture, edge and corner information of the image. Deep image features are the features extracted by the deep layers of the network, closer to the output; this coarse-grained information comprises more abstract information, namely semantic information.
The improved ASPP module adopts a plurality of parallel atrous convolution branches, each with a different sampling rate, realizing feature extraction at different scales. Through multi-scale feature extraction and an increased receptive field, it improves the network's understanding of the overall image information, enhancing the segmentation model's perception and segmentation of targets so that it can better adapt to different scenes and complex image content; by introducing the SPM module into the original ASPP module, the ability to capture long-distance dependencies is improved and the detection of elongated objects is enhanced;
the decoder comprises a plurality of upsampling convolution layers, each skip-connected to the corresponding double convolution residual feature extraction module through an attention gating module; the decoder obtains image features through upsampling, gradually enlarging the low-resolution feature maps from the encoder to restore the resolution of the original input image and help gradually recover the details of the image;
the output end is used for outputting a predicted image with the same size as the input image, wherein each pixel value in the predicted image corresponds to the category label of the position.
As shown in fig. 1 and fig. 2, fig. 1 is the overall network structure diagram and fig. 2 is a schematic block diagram of the improved U-Net network. The present segmentation network integrates the double convolution residual feature extraction module, the Attention Gate module and the improved ASPP module into the U-Net structure, where the double convolution residual feature extraction module replaces the double convolution module of the traditional U-Net network;
the improved ASPP module is connected after the fourth-layer double convolution residual module;
an Attention Gate (AG) is added to each skip connection; the feature map of the decoding part and the feature map of the encoding part of the corresponding level are used as the inputs of the Attention Gate, and after passing through the Attention Gate, the result is concatenated with the upsampled feature map of the decoding part.
As shown in fig. 3, fig. 3 is the overall structure diagram of the double convolution residual feature extraction module. The module provided in this embodiment takes the features of the previous layer as input. It contains two convolution layers, each comprising a convolution operation, a batch normalization layer and a rectified linear unit (ReLU). First, a convolution layer with a 3×3 kernel convolves the input features and captures their local spatial information. After the convolution, a batch normalization operation normalizes the distribution of the output features, accelerating network convergence and improving generalization. The output is then passed to a ReLU activation function for a nonlinear transformation, introducing nonlinearity to increase the expressive capacity of the model.
A residual connection is formed between the output features and the input features of the first convolution layer, i.e. the original input features are added to the output features of the first convolution layer, forming the first residual connection.
The residual connection mode can help the network to better transfer gradient and learn residual errors, and helps to solve the problem of gradient disappearance in the network training process.
The second convolution layer is similar in structure to the first convolution layer and also contains standard convolution operations such as convolution kernels, activation functions, batch normalization, and the like.
The output characteristics of the second convolution layer are added to the output characteristics of the first convolution layer to form a second residual connection, and the final output is the result of the second residual connection.
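For illustration, a minimal PyTorch sketch of this double convolution residual block follows; the 1×1 projection used to match channel counts on the first residual connection is an assumption, since the embodiment does not state how a channel mismatch is handled:

```python
import torch
import torch.nn as nn

class DoubleConvResidual(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # Each convolution layer: 3x3 conv, batch normalization, ReLU.
        self.conv1 = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.conv2 = nn.Sequential(
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        # Assumed 1x1 projection so the input can be added to the first output.
        self.proj = (nn.Conv2d(in_ch, out_ch, kernel_size=1)
                     if in_ch != out_ch else nn.Identity())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out1 = self.conv1(x) + self.proj(x)  # first residual connection
        out2 = self.conv2(out1) + out1       # second residual connection
        return out2
```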
Shallow feature information is extracted by the first- and second-layer convolutions of the network; deep feature information is extracted after the first- through fourth-layer convolution operations.
as shown in FIG. 4, FIG. 4 is an overall structure diagram of the attention gating module, where g and x l The output of the skip connection and the output of the next layer, respectively. The output of the attention gating module is the multiplication of the input feature map and the elements of the attention coefficient, and the expression is shown in the formula (1):
(1)
Wherein,representing intermediate values in the calculation of the attention weighting;
representing an attention coefficient;
an output feature map representing a corresponding layer of the decoder;
gi represents the output profile of the corresponding layer of the encoder.
Wg, wx and ψ each represent a convolution operation;
σ 1 representing a ReLu activation function;
σ 2 representing a Sigmoid activation function;
bg and bψ are bias terms for the corresponding convolutions;
θ att representing a set of parameter sets.
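A minimal PyTorch sketch of the attention gate of formula (1) follows; taking W_g, W_x and ψ to be 1×1 convolutions and the choice of intermediate channel count are assumptions consistent with common attention-gate designs, and g and x are assumed to share the same spatial size (as on a skip connection after upsampling):

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    def __init__(self, g_ch: int, x_ch: int, inter_ch: int):
        super().__init__()
        self.W_g = nn.Conv2d(g_ch, inter_ch, kernel_size=1, bias=True)   # carries b_g
        self.W_x = nn.Conv2d(x_ch, inter_ch, kernel_size=1, bias=False)
        self.psi = nn.Conv2d(inter_ch, 1, kernel_size=1, bias=True)      # carries b_psi
        self.relu = nn.ReLU(inplace=True)   # sigma_1
        self.sigmoid = nn.Sigmoid()         # sigma_2

    def forward(self, g: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        q_att = self.psi(self.relu(self.W_g(g) + self.W_x(x)))  # intermediate value
        alpha = self.sigmoid(q_att)                             # attention coefficient
        return x * alpha  # element-wise product with the input feature map
```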
As shown in fig. 5, fig. 5 is the overall block diagram of the strip pooling (SPM) module. The input feature map is pooled with horizontal and vertical strips into maps of size H×1 and 1×W, respectively. A 1D convolution with kernel size 3 is then applied, the results are expanded, and corresponding positions are summed to obtain an H×W feature map. Finally, a 1×1 convolution and sigmoid are applied, and the result is multiplied with the corresponding pixels of the original input map to obtain the output.
The strip pooling SPM module comprises a vertical strip pooling layer, a horizontal strip pooling layer, a one-dimensional convolution, a fusion module and an activation function;
the vertical strip pooling layer is used for encoding long-range context along the vertical spatial dimension;
the horizontal strip pooling layer is used for encoding long-range context along the horizontal spatial dimension;
the one-dimensional convolution is used for dimension expansion and for modulating the current position and its adjacent features;
the fusion module is used for obtaining an output containing a more useful global prior;
the activation function is used for mapping the output of the network to between 0 and 1; a minimal sketch of this module follows.
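A minimal PyTorch sketch of the strip pooling module, following the steps of fig. 5, is given below; keeping the channel count unchanged through the module is an assumption:

```python
import torch
import torch.nn as nn

class StripPooling(nn.Module):
    def __init__(self, ch: int):
        super().__init__()
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # H x 1: pool along width
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # 1 x W: pool along height
        # Kernel-3 "1D" convolutions along the remaining spatial dimension.
        self.conv_h = nn.Conv2d(ch, ch, kernel_size=(3, 1), padding=(1, 0))
        self.conv_w = nn.Conv2d(ch, ch, kernel_size=(1, 3), padding=(0, 1))
        self.fuse = nn.Conv2d(ch, ch, kernel_size=1)   # 1x1 convolution
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, _, h, w = x.shape
        sh = self.conv_h(self.pool_h(x)).expand(-1, -1, -1, w)  # expand H x 1 -> H x W
        sw = self.conv_w(self.pool_w(x)).expand(-1, -1, h, -1)  # expand 1 x W -> H x W
        gate = self.sigmoid(self.fuse(sh + sw))  # sum, 1x1 conv, sigmoid
        return x * gate  # element-wise product with the original input
```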
as shown in fig. 6, fig. 6 is an overall structure diagram of an improved ASPP module, which includes a plurality of parallel convolution operation modules, a striped pooling SPM module, a connection operation module, and an output module;
the parallel convolution operation module is used for carrying out convolution operation under different sampling rates, so that context information of images is captured on different scales, the model is facilitated to better understand local details and global structures in the images, and the perceptibility of the model to targets of different scales is improved;
the striping SPM module captures remote context by focusing on two paths along two dimensions of horizontal and vertical space, respectively, such that each location in the output tensor establishes a correspondence with various locations in the input tensor;
the connecting operation module is used for combining the characteristic images with different dimensions and capturing context information under different spatial dimensions in the image;
and the output module is used for outputting and connecting the data obtained by the operation module.
The parallel convolution operation module in this embodiment consists of six parallel branches and convolves the input feature map with different dilation rates (6, 12 and 18, respectively) to enlarge the receptive field and capture feature information at different scales; the strip pooling SPM module is introduced into the original ASPP module as a new branch, improving the network's ability to capture long-range dependencies. The results of the branches are then fused together by a concat operation, expanding the number of channels; finally, a 1×1 convolution reduces the number of channels to the desired value, and the result is output. A minimal sketch follows.
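The following PyTorch sketch of the improved ASPP module reuses the StripPooling sketch above; the patent names six parallel branches and the dilation rates 6, 12 and 18, so the 1×1-convolution and image-pooling branches here are assumptions based on standard ASPP designs:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImprovedASPP(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        def atrous(rate: int) -> nn.Module:
            # Atrous 3x3 convolution branch at the given dilation rate.
            return nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=rate, dilation=rate),
                nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.b1 = nn.Sequential(nn.Conv2d(in_ch, out_ch, 1),
                                nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.b2, self.b3, self.b4 = atrous(6), atrous(12), atrous(18)
        self.pool = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(in_ch, out_ch, 1), nn.ReLU(inplace=True))
        self.spm = nn.Sequential(nn.Conv2d(in_ch, out_ch, 1), StripPooling(out_ch))
        self.project = nn.Conv2d(6 * out_ch, out_ch, 1)  # reduce channels after concat

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[2:]
        pooled = F.interpolate(self.pool(x), size=(h, w),
                               mode="bilinear", align_corners=False)
        feats = [self.b1(x), self.b2(x), self.b3(x), self.b4(x), pooled, self.spm(x)]
        return self.project(torch.cat(feats, dim=1))  # concat, then 1x1 conv
```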
Tissue regions such as skin, fat and muscle occupy only a small proportion of an abdominal ultrasound image, and the skin lies closely adjacent to the fat layer, as the fat does to the muscle layer. To address the class imbalance caused by excessive background pixels and the inaccurate segmentation of tissue contours on the acoustic channel, ResNet-style residual connections are integrated into the U-Net framework: gradients can flow directly through a shorter path, alleviating gradient vanishing during training, helping the network converge faster and improving training efficiency. Replacing the double convolution module of the U-Net network with the double residual convolution module allows identity mapping layers to be added behind a shallow network whose accuracy has saturated, increasing the depth of the network without increasing the error. This lets the number of layers of the neural network exceed previous constraints and improves accuracy.
The Strip Pooling Module (SPM) is integrated into the atrous spatial pyramid pooling (ASPP) module so that long-range dependencies can be captured while the receptive field is effectively enlarged, enhancing the detection of elongated objects. By deploying long strip pooling kernel shapes along the spatial dimensions, strip pooling in this embodiment enables the semantic segmentation network to aggregate global and local context information simultaneously.
Multi-scale feature extraction is performed on the output of the encoder to obtain multi-scale context information. The attention gating module is introduced into the skip connections between the encoder and decoder, assigning different weights to different regions of the feature map so that the model can focus on the parts of interest, improving its ability to recognize and localize each tissue and the target region. The specific flow is as follows:
(a) Constructing the U-shaped network model: the segmentation network model uses U-Net as the backbone, and the backbone consists of an encoder and a decoder. The network structures of the encoder and decoder are each divided into four layers from top to bottom, and each of the first three layers of the encoder is connected to the corresponding layer of the decoder by a skip connection.
(b) Introducing the double convolution residual feature extraction module into the U-Net backbone: the traditional double convolution module of the original U-Net network is replaced by the double convolution residual feature extraction module.
(c) Introducing the improved atrous spatial pyramid pooling module into the U-Net backbone: the fourth layer of the encoder is connected to the fourth layer of the decoder through the improved module. The improved ASPP convolves with atrous convolution kernels of different dilation rates, specifically rate=6, rate=12 and rate=18, and fuses in the strip pooling SPM module, which enlarges the receptive field and better preserves image context information.
(d) Introducing attention gating modules into the U-Net backbone: an attention gating module is added to the skip connection between each corresponding layer of the encoder and decoder.
(e) Finally, bilinear interpolation upsampling is adopted, and the image features are predicted after convolution, pooling and other operations, realizing pixel-level segmentation of each tissue on the acoustic channel.
(5) Training network: the segmentation network is trained according to the following steps:
initializing a network by using a parameter initialization method, updating the weight of the network by using an adam optimizer, adaptively adjusting the learning rate and momentum, accelerating the network convergence and reducing the training time.
And setting proper network super parameters such as learning rate, batch size, iteration turns, optimizer weight attenuation, momentum and the like.
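For illustration only, a training loop consistent with the above could look as follows; ImprovedUNet, train_loader and combined_loss are hypothetical names standing in for the network of fig. 2, the batched preprocessed data and the formula-(2) loss defined below, and all values beyond the use of the Adam optimizer are assumptions:

```python
import torch

model = ImprovedUNet(in_channels=1, num_classes=5)  # hypothetical; e.g. background + skin/fat/muscle + target
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-5)  # assumed values
class_weights = torch.ones(5)                       # per-class weights w_i, assumed uniform here

for epoch in range(100):                            # number of iteration rounds: assumed
    for images, masks in train_loader:              # masks one-hot encoded, shape (B, 5, H, W)
        optimizer.zero_grad()
        loss = combined_loss(model(images), masks, class_weights)  # formula (2)
        loss.backward()
        optimizer.step()
```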
The image segmentation task mainly judges whether each pixel is a target pixel. The various segmentation targets in an ultrasound image occupy a small overall proportion of the image, and the multi-scale variability of the image foreground is large.
If cross-entropy loss alone is used to determine the image foreground, the loss of the foreground pixels may be diluted by the loss of the background. Moreover, cross entropy only considers the loss of each pixel in isolation, treating every element equally and ignoring the structural nature of the image as a whole. Different pixels should carry different weights, and given the large multi-scale variability of image targets, attention should be paid to the variability loss of the whole segmentation target.
Therefore, a loss function for evaluating the boundary segmentation effect is designed by combining the weighted cross-entropy (Weighted Binary Cross Entropy, WBCE) loss and the structural similarity (Structural SIMilarity, SSIM) loss; the loss function $L$ is defined in formula (2):

$$L = \alpha L_{WBCE} + (1 - \alpha) L_{SSIM} \tag{2}$$

where $L$ denotes the loss function; $\alpha$ denotes an adjustable weight with a value between 0 and 1; $L_{WBCE}$ denotes the weighted cross-entropy loss; and $L_{SSIM}$ denotes the structural similarity loss. The hyperparameter $\alpha$ is a balance coefficient used to balance the influence of the detail loss and the contour loss on the final result.

The weighted cross-entropy loss $L_{WBCE}$ and the structural similarity loss $L_{SSIM}$ are defined in formulas (3) and (4), respectively. The weighted cross entropy evaluates the importance of pixels by assigning a weight to each pixel: important pixels are assigned larger weights and easy pixels smaller weights, and the influence factor of a pixel is computed from the difference between the central pixel and its surroundings:

$$L_{WBCE} = -\sum_{i=1}^{N} w_i\, y_i \log(p_i) \tag{3}$$

$$L_{SSIM} = 1 - \frac{(2\mu_x \mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)} \tag{4}$$

where $N$ denotes the total number of classes; $w_i$ denotes the weight of the $i$-th class; $y_i$ denotes the one-hot label vector; $p_i$ denotes the predicted value for the $i$-th class; $x$ and $y$ denote the two image windows used in the calculation; $\mu_x$ and $\mu_y$ denote the means of $x$ and $y$; $\sigma_x^2$ and $\sigma_y^2$ denote the variances of $x$ and $y$; $\sigma_{xy}$ denotes the covariance of $x$ and $y$; and $c_1$, $c_2$ are two constants that keep the computation stable.
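For illustration, a PyTorch sketch of this combined loss follows; the SSIM window size, the $c_1$/$c_2$ values and computing SSIM on the softmax probabilities are common defaults assumed here, and the per-pixel difficulty weighting described in the text is simplified to the per-class weights $w_i$ of formula (3):

```python
import torch
import torch.nn.functional as F

def wbce_loss(pred_logits, target_onehot, class_weights):
    # Formula (3): per-class weights w_i scale the log-probabilities
    # of the one-hot targets, summed over the class dimension.
    log_p = F.log_softmax(pred_logits, dim=1)            # (B, N, H, W)
    w = class_weights.view(1, -1, 1, 1)
    return -(w * target_onehot * log_p).sum(dim=1).mean()

def ssim_loss(x, y, c1=0.01 ** 2, c2=0.03 ** 2, win=11):
    # Formula (4), with local window statistics via average pooling;
    # win, c1 and c2 are assumed defaults, not taken from the patent.
    mu_x = F.avg_pool2d(x, win, 1, win // 2)
    mu_y = F.avg_pool2d(y, win, 1, win // 2)
    var_x = F.avg_pool2d(x * x, win, 1, win // 2) - mu_x ** 2
    var_y = F.avg_pool2d(y * y, win, 1, win // 2) - mu_y ** 2
    cov_xy = F.avg_pool2d(x * y, win, 1, win // 2) - mu_x * mu_y
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return 1.0 - ssim.mean()

def combined_loss(pred_logits, target_onehot, class_weights, alpha=0.5):
    # Formula (2): L = alpha * L_WBCE + (1 - alpha) * L_SSIM;
    # alpha = 0.5 is an assumed default for the balance coefficient.
    probs = torch.softmax(pred_logits, dim=1)
    return (alpha * wbce_loss(pred_logits, target_onehot, class_weights)
            + (1 - alpha) * ssim_loss(probs, target_onehot))
```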
(6) Recognition result: the acquired ultrasound image data are input into the trained segmentation network for recognition, obtaining the segmentation result of each tissue and the target region on the acoustic channel in the ultrasound image data. Taking an abdominal B-mode ultrasound image as an example, the targets segmented are skin, fat and muscle tissue, respectively.
As shown in FIG. 7, the figure compares segmentation results on the abdominal B-mode ultrasound data set, with each row corresponding to a different selected patient. The first column is the original image, the second column is the ground-truth label image, and the third column is the segmentation result of the network of this embodiment. In the white region below the middle, the tissues from bottom to top are skin, fat and muscle; after segmentation by the present method, the tissue boundaries on each acoustic channel are clear and accurate.
The above-described embodiments are merely preferred embodiments for fully explaining the present invention, and the scope of the present invention is not limited thereto. Equivalent substitutions and modifications will occur to those skilled in the art based on the present invention, and are intended to be within the scope of the present invention. The protection scope of the invention is subject to the claims.

Claims (9)

1. The ultrasonic image multi-tissue segmentation method based on heterogeneous UNet is characterized by comprising the following steps of: the method comprises the following steps:
(1) Data acquisition: collecting B-mode ultrasound images for segmentation, the images containing each target tissue part to be segmented, and establishing an ultrasound image data set;
(2) Data labeling: setting a tissue label in turn for each image in the ultrasound image data set;
(3) Preprocessing the data set: first normalizing and standardizing all data in the data set;
(4) Constructing the segmentation network: the segmentation network adopts an improved U-Net network, which is used for recognizing each tissue and the target-region image; the improved U-Net network comprises an encoder, a decoder, an attention gating module and an improved ASPP module, wherein the encoder obtains deep feature information and shallow feature information, respectively, by downsampling through double convolution residual feature extraction modules; the decoder gradually increases the resolution through upsampling operations and segments the image using the deep and shallow feature information; the attention gating module is added before each decoding layer of the decoder and is used for enhancing the network's attention to specific features; the improved ASPP module is located at the end of the encoder and is used for capturing spatial information at different dilation rates so as to effectively handle targets of different scales;
(5) Training the network: training the segmentation network using a parameter initialization method to obtain a trained segmentation network;
(6) Recognition result: inputting the acquired ultrasound image data into the trained segmentation network for recognition to obtain the segmentation result of each tissue and the target region on the acoustic channel in the ultrasound image data.
2. The heterogeneous UNet-based ultrasound image multi-tissue segmentation method as set forth in claim 1, wherein: the improved U-Net network operates according to the following steps:
a) shallow feature information is extracted by the double convolution residual feature extraction modules at the front of the encoder, and deep feature information is extracted by the double convolution residual feature extraction modules at the rear of the encoder;
b) after the shallow and deep features are obtained, enhanced feature extraction is performed on the deep features through the spatial pyramid with different dilation rates in the decoder; the shallow features are convolved, the processed deep and shallow features are fused, the fused features are convolved again, and the result is input into the spatial pyramid ASPP;
c) the attention gating module outputs the element-wise product of the input feature map and the attention coefficient; context information is captured using the coarse-grained feature map of the image obtained by the convolutional neural network, and the attention gating module fuses shallow fine-grained features with deep coarse-grained features to obtain the category and position of the target region in the image;
d) upsampling is performed by bilinear interpolation, and the image features are predicted after convolution and pooling operations to obtain the pixel-level segmentation of each tissue on the acoustic channel.
3. The heterogeneous UNet-based ultrasound image multi-tissue segmentation method as set forth in claim 1, wherein: the double convolution residual feature extraction module operates according to the following steps:
the output features and the input features of the first convolution layer of the double convolution residual feature extraction module form a first residual connection;
the output features of the second convolution layer of the double convolution residual feature extraction module are added to the output features of the first convolution layer to form a second residual connection, and the result of the second residual connection is finally output.
4. The heterogeneous UNet-based ultrasound image multi-tissue segmentation method as set forth in claim 1, wherein: the last layer of the encoder is connected with the corresponding layer of the decoder through the improved ASPP module, which convolves with atrous convolution kernels of different dilation rates and fuses in the strip pooling SPM module.
5. The heterogeneous UNet-based ultrasound image multi-tissue segmentation method as set forth in claim 4, wherein: the strip pooling SPM module operates according to the following steps:
pooling the input feature map with horizontal strips and vertical strips, respectively;
after convolution and expansion, summing the results at corresponding positions to obtain an H×W feature map;
applying a 1×1 convolution and sigmoid, then multiplying element-wise with the corresponding pixels of the original input map to obtain the output result.
6. An ultrasonic image multi-tissue segmentation system based on heterogeneous UNet, characterized in that: the system comprises an input end, an encoder, an attention gating module, an improved ASPP module, a decoder and an output end;
the encoder comprises a plurality of double convolution residual error characteristic extraction modules;
the decoder comprises a plurality of up-sampling convolution layers;
the encoder performs downsampling through the double convolution residual feature extraction modules to obtain deep feature information and shallow feature information, respectively; the decoder gradually increases the resolution through upsampling operations and segments the image using the deep and shallow feature information;
the input end is used for inputting the B-mode ultrasound image into the double convolution residual feature extraction module;
the double convolution residual feature extraction module is used for extracting features through the residual connections of two convolution layers;
the attention gating module is arranged in the skip connection of the corresponding layer and is used for setting the weights of the features;
the improved ASPP module is arranged between the encoder and the decoder in the U-Net structure and is used for performing feature processing on the shallow and deep features;
the output end is used for outputting the data processed by the decoder in the U-Net structure.
7. The heterogeneous UNet-based ultrasound image multi-tissue segmentation system according to claim 6, wherein: the double convolution residual feature extraction module comprises a first convolution layer and a second convolution layer; the output features and the input features of the first convolution layer form a first residual connection; the output characteristics of the second convolution layer are added to the output characteristics of the first convolution layer to form a second residual connection, and the final output is the result of the second residual connection.
8. The heterogeneous UNet-based ultrasound image multi-tissue segmentation system according to claim 6, wherein: the improved ASPP module comprises a plurality of parallel convolution operation modules, a strip pooling SPM module, a connection operation module and an output module;
the parallel convolution operation modules perform convolution operations at different sampling rates, capturing the context information of the image at different scales, which helps the model better understand local details and global structures in the image and improves its perception of targets at different scales;
the strip pooling SPM module captures long-range context by attending to two paths along the horizontal and vertical spatial dimensions, respectively, so that each location in the output tensor establishes correspondences with various locations in the input tensor;
the connection operation module is used for combining feature maps of different dimensions and capturing context information at different spatial dimensions in the image;
the output module is used for outputting the data obtained by the connection operation module.
9. The heterogeneous UNet-based ultrasound image multi-tissue segmentation system according to claim 8, wherein: the strip pooling SPM module comprises a vertical strip pooling layer, a horizontal strip pooling layer, a one-dimensional convolution, a fusion module and an activation function;
the vertical strip pooling layer is used for encoding long-range context along the vertical spatial dimension;
the horizontal strip pooling layer is used for encoding long-range context along the horizontal spatial dimension;
the one-dimensional convolution is used for dimension expansion and for modulating the current position and its adjacent features;
the fusion module is used for obtaining an output containing a more useful global prior;
the activation function is used for mapping the output of the network to between 0 and 1.
CN202410092660.0A 2024-01-23 2024-01-23 Ultrasonic image multi-tissue segmentation method and system based on heterogeneous UNet Pending CN117876690A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410092660.0A CN117876690A (en) 2024-01-23 2024-01-23 Ultrasonic image multi-tissue segmentation method and system based on heterogeneous UNet

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410092660.0A CN117876690A (en) 2024-01-23 2024-01-23 Ultrasonic image multi-tissue segmentation method and system based on heterogeneous UNet

Publications (1)

Publication Number Publication Date
CN117876690A true CN117876690A (en) 2024-04-12

Family

ID=90582719

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410092660.0A Pending CN117876690A (en) 2024-01-23 2024-01-23 Ultrasonic image multi-tissue segmentation method and system based on heterogeneous UNet

Country Status (1)

Country Link
CN (1) CN117876690A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118154626A (en) * 2024-05-09 2024-06-07 清泽医疗科技(广东)有限公司 Nerve block anesthesia ultrasonic guidance image processing method



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination