CN117671357A - Pyramid algorithm-based prostate cancer ultrasonic video classification method and system


Info

Publication number
CN117671357A
CN117671357A (application CN202311646253.1A)
Authority
CN
China
Prior art keywords
scale
attention
features
time
prostate cancer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311646253.1A
Other languages
Chinese (zh)
Other versions
CN117671357B (en)
Inventor
卢旭
梁坤
袁圆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Polytechnic Normal University
Original Assignee
Guangdong Polytechnic Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Polytechnic Normal University
Priority: CN202311646253.1A
Publication of CN117671357A
Application granted
Publication of CN117671357B
Legal status: Active

Landscapes

  • Ultrasonic Diagnosis Equipment (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a pyramid algorithm-based prostate cancer ultrasonic video classification method and system. The method comprises the following steps: extracting multi-scale features of the prostate cancer ultrasonic video based on a multi-scale pyramid network; modulating a multi-scale feature vector with a 3D channel-level attention mechanism based on the multi-scale features; processing the multi-scale features based on the multi-scale feature vector to construct spatio-temporal features; acquiring a spatio-temporal attention tensor based on the spatio-temporal features; modulating the multi-scale feature vector based on the spatio-temporal attention tensor to obtain a modulated multi-scale input feature tensor; and acquiring the final output features based on the modulated multi-scale input feature tensor, completing the prostate cancer ultrasonic video classification. The method allows the model to extract features from the prostate cancer ultrasonic video at different scales, enabling the algorithm to capture information at different levels of detail more comprehensively and thereby improving video classification performance.

Description

Pyramid algorithm-based prostate cancer ultrasonic video classification method and system
Technical Field
The invention belongs to the field of medical image processing and computer vision, and particularly relates to a pyramid algorithm-based prostate cancer ultrasonic video classification method and system.
Background
In recent years, deep learning-based prostate cancer classification algorithms have made remarkable progress on static ultrasound image data, but considerable room for improvement remains on dynamic ultrasound video data. Ultrasound video is a promising area of research and development because it contains more spatial and temporal information than static images. Since the introduction of I3D, 3D CNNs have dominated the field of video classification, but 3D convolutional neural networks struggle to balance computational complexity against classification performance and to exploit the temporal and spatial relationships in video efficiently.
Currently, for better integration of spatio-temporal information, Vision Transformers are favored for their excellent spatio-temporal information aggregation capability, but extensive experiments have shown that Transformer-based network models are poorly suited to the limited amount of prostate ultrasound video data. To better handle variations in visual tempo, researchers have introduced pyramid networks that process frames sampled at different rates, allowing fast-tempo and slow-tempo information to be captured at different depths. However, these networks generally fail to explicitly consider the relative importance of each channel feature, which may cause critical channel features to be ignored. Furthermore, they fail to fully consider the temporal and spatial relationships in video data, which limits their effectiveness in capturing dynamic information and spatial structure. Facing this problem, a triple-attention enhanced pyramid algorithm is a reliable solution: its triple attention mechanisms can extract a variety of key features to improve the accuracy of prostate cancer classification on dynamic ultrasound video.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a pyramid algorithm-based prostate cancer ultrasonic video classification method and system that adopt an innovative pyramid classification network framework. The framework consists of a triple attention mechanism over channel, space and time, and can learn relevant characteristics of lesions, such as shape, texture, temporal information and spatial structure, from the different attention layers. The method effectively fuses features of different levels by enhancing the output of the multi-scale pyramid features and ensures compatibility among successive features. Most importantly, the model improves its understanding of the different channel features and of the temporal and spatial dimensions, making the classification of prostate cancer more accurate and reliable.
In order to achieve the above purpose, the invention provides a pyramid algorithm-based ultrasonic video classification method for prostate cancer, which comprises the following steps:
extracting multi-scale features of the prostate cancer ultrasonic video based on a multi-scale pyramid network;
modulating a multi-scale feature vector with a 3D channel level attention mechanism based on the multi-scale feature;
processing the multi-scale features based on the multi-scale feature vectors to construct space-time features;
acquiring a space-time attention tensor based on the space-time features;
modulating the multi-scale feature vector based on the space-time attention tensor to obtain a modulated multi-scale input feature tensor;
and acquiring final output characteristics based on the modulated multi-scale input characteristic tensor, and completing the ultrasonic video classification of the prostate cancer.
Optionally, extracting the multi-scale features of the prostate cancer ultrasound video includes:
dividing input prostate cancer ultrasonic video frame data into a plurality of subframes, wherein each subframe represents different time scales;
and extracting the characteristics of each subframe by using the multi-scale pyramid network to obtain the multi-scale characteristics of different time scales.
Optionally, the multi-scale pyramid network adopts a dilation-based 3D MBF-Net structure;
feature extraction for each subframe using the multi-scale pyramid network includes:
the subframe is converted into an input feature map; after a 1×1 convolution and a 3×3 convolution, the channels are split: one branch applies a 3×3 depthwise convolution, a 3×3 depthwise dilated convolution with a dilation rate of 2, and 2×2 max pooling; the other branch applies a 3×3 depthwise convolution, a 3×3 depthwise dilated convolution with a dilation rate of 4, and 2×2 max pooling; the convolution results of the two branches are then concatenated and passed through a 1×1 convolution, added to the input feature map, and channel-shuffled to obtain the output feature map.
Optionally, modulating the multi-scale feature vector based on the 3D channel level attention mechanism comprises:
processing the multi-scale features by using a 3D channel attention mechanism to obtain multi-scale input feature tensors;
performing reduction operation on the multi-scale input feature tensor along the channel dimension by using self-adaptive average pooling to obtain preset features;
calculating the attention weight of the channel according to the preset characteristics through the multi-layer perceptron to obtain the attention weight of the channel;
and carrying out channel weighting on the multi-scale input feature tensor based on the channel attention weight to acquire the multi-scale feature vector.
Optionally, processing the multi-scale feature based on the multi-scale feature vector, constructing a spatio-temporal feature includes:
processing the multi-scale features based on the multi-scale feature vectors to obtain video frame data of a plurality of time steps and spatial information of each time step;
the spatio-temporal features are constructed based on the video frame data for a number of time steps and the spatial information for each time step.
Optionally, based on the spatiotemporal features, acquiring the spatiotemporal attention tensor includes:
calculating attention weights in time sequence and space dimension of the space-time characteristics based on three-dimensional convolution operation, and acquiring time sequence attention and space attention;
multiplying the time-series attention and the space attention to obtain the space-time attention tensor.
Optionally, based on the modulated multi-scale input feature tensor, obtaining the final output feature includes:
linear transformation is used for the modulated multi-scale input characteristic tensor of the current scale to generate a query vector;
using linear transformation to the modulated multi-scale input characteristic tensor of other scales except the current scale to generate a key vector;
calculating a dot product between the query vector and the key vector to obtain an original attention weight;
performing softmax operation on the original attention weight to obtain a final normalized attention weight;
and weighting and summing the features other_x_attj of the different scales based on the attention weights to obtain the final output features.
In order to achieve the above object, the present invention further provides a pyramid algorithm-based ultrasonic video classification system for prostate cancer, comprising: the system comprises a feature extraction module, a channel attention module, a semantic information dividing module, a space-time attention module, a multi-scale feature interaction fusion module and a classification module;
the feature extraction module is used for constructing a multi-scale pyramid network, dividing input prostate cancer ultrasonic video data into a plurality of subframes, wherein each subframe represents different time scales, and extracting features of each subframe by using the multi-scale pyramid network to obtain the multi-scale features of different time scales;
the channel attention module is used for modulating the multi-scale feature vector by using an attention mechanism of a 3D channel level;
the semantic information dividing module is used for processing the multi-scale features according to the multi-scale feature vectors and constructing space-time features;
the space-time attention module is used for calculating attention weights in time sequence and space dimension in the space-time characteristics through three-dimensional convolution operation, and acquiring space-time attention characteristics of different scales;
the multi-scale feature interaction fusion module is used for carrying out information interaction and fusion on the space-time attention features with different scales to obtain final output features;
and the classification module is used for classifying the input ultrasonic video data of the prostate cancer according to the final output characteristics.
Compared with the prior art, the invention has the following advantages and technical effects:
the invention provides a method based on a multi-scale pyramid network, which allows a model to perform feature extraction on prostate cancer ultrasonic videos on different scales. This enables the algorithm to more fully capture information of different levels of detail, thereby improving video classification performance.
Through a 3D channel attention enhancement mechanism, the model can adaptively pay attention to the characteristic information of different channels, so that the distinguishing property of the characteristics is improved. This helps to reduce redundant information and improve the accuracy of the ultrasound video classification of prostate cancer.
A spatiotemporal dual attention enhancement algorithm is used, allowing the model to capture important features in both the time and spatial domains. This improves the model's attention to key frames in the video sequence, enhancing the video classification performance.
A multi-scale feature interaction fusion module is provided, which allows features of different scales to interact and fuse. The method is favorable for improving understanding and comprehensive utilization of different scale information by the model, and further improves video classification performance.
The method is designed specifically for the ultrasonic video classification of the prostate cancer, and has good applicability and performance. The medical image data can be effectively processed, and a powerful ultrasonic video classification tool for prostate cancer is provided.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application, illustrate and explain the application and are not to be construed as limiting the application. In the drawings:
fig. 1 is a schematic diagram of a method for classifying prostate cancer ultrasound video based on a triple-attention pyramid algorithm according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of the dilation-based 3D MBF-Net backbone according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a multi-scale feature interaction fusion module according to an embodiment of the present invention;
fig. 4 is a flow diagram of a triple-attention pyramid algorithm-based prostate cancer ultrasound video classification system according to an embodiment of the present invention.
Detailed Description
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.
As shown in fig. 1, the present embodiment provides a method for classifying prostate cancer ultrasound video based on a triple-attention pyramid algorithm, which analyzes and learns the channel, spatial and temporal information in prostate ultrasound video from a triple-attention perspective and uses this information to reliably classify prostate cancer as benign or malignant. The method comprises the following steps:
s1, constructing a multi-scale pyramid network MFP-Net (Multiscale Feature Pyramid Net), allowing input data to be divided into different sub-components, representing information of different scales or layers, and facilitating simultaneous capture of detail and global information and extraction of multi-scale features;
s2, modulating the multi-scale feature vector by using a 3D channel-level attention mechanism, so that the model can learn information of different channel features better. Correcting the channel attention vector using a Softmax function to ensure that the weights of the individual components are valid;
s3, applying the corrected channel attention vector to the multi-scale feature map as final output, and improving the quality and model performance of the multi-scale features. Processing multi-scale feature data, including video frame data of a plurality of time steps and spatial information of each time step, and constructing space-time features;
s4, calculating the attention weight on the time sequence and the space dimension by using the three-dimensional convolution operation so as to determine which time steps and areas are more important to classification tasks, and ensuring effective modeling of the time sequence and the space information;
s5, multiplying the time sequence and the space attention to generate a final space-time attention tensor, and applying the final space-time attention tensor to the input characteristic tensor to improve the modeling capability of the model on the space-time structure.
S6, acquiring the enhanced multi-scale input features and ensuring that they have the same number of channels; using the multi-scale feature interaction fusion attention mechanism to promote information interaction among the different scale features and generate the final output features.
Specifically, in this embodiment, the specific procedure of step S1 is as follows:
S11, the input prostate cancer ultrasound video frame data is divided into a plurality of subframes, each representing a different time scale. Let the input video frame data be I; splitting yields the subframe set {I_1, I_2, ..., I_n}, where n is the number of subframes.
S12, for each subframe I_i, feature extraction is performed using the multi-scale pyramid network MFP-Net, with a dilation-based 3D MBF-Net (3D Multi-Branch Fusion Net) as the backbone, as shown in FIG. 2. Let F_1, F_2, ..., F_n denote the features extracted from the subframes I_1, I_2, ..., I_n respectively. This can be expressed by the following formula:
F_n = 3DMBF_Net(I_n)
S13, multi-scale feature extraction is performed on the features extracted from each subframe to obtain information at different time scales. Convolution kernels K_i of different sizes are applied to the feature map F_n, where K_i denotes the i-th convolution kernel, yielding feature representations at different scales. An average pooling operation is then applied to each scale's feature map F_si to reduce its spatial dimensions. This can be expressed by the following formula:
F_si = Avg_Pooling(Conv(F_n, K_i))
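As a concrete illustration, the following is a minimal PyTorch sketch of this S13 pattern (per-scale convolution followed by average pooling); the kernel sizes, channel widths and pooling window are illustrative assumptions, not values fixed by the patent:

import torch
import torch.nn as nn

class MultiScaleExtractor(nn.Module):
    # Sketch of S13: one convolution kernel K_i per scale, then average
    # pooling to shrink the spatial dimensions, i.e.
    # F_si = Avg_Pooling(Conv(F_n, K_i)). Kernel sizes are assumptions.
    def __init__(self, channels: int, kernel_sizes=(1, 3, 5)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv3d(channels, channels, k, padding=k // 2) for k in kernel_sizes
        )
        self.pool = nn.AvgPool3d(kernel_size=(1, 2, 2))  # halves H and W, keeps T

    def forward(self, f_n: torch.Tensor) -> list:
        # f_n: (batch, channels, T, H, W) feature map from the backbone.
        return [self.pool(conv(f_n)) for conv in self.convs]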
as shown in fig. 2, in this embodiment, the backbone used in the multi-scale pyramid network is the dilation-based 3D MBF-Net, and the specific procedure is as follows:
first, the input feature map has shape (T, H, W, C), where T is the number of time frames, H the height, W the width, and C the number of channels. After a 1×1 convolution and a 3×3 convolution, the channels are split: one branch applies a 3×3 depthwise convolution, a 3×3 depthwise dilated convolution with a dilation rate of 2, and 2×2 max pooling; the other branch applies a 3×3 depthwise convolution, a 3×3 depthwise dilated convolution with a dilation rate of 4, and 2×2 max pooling. The convolution results of the two branches are then concatenated and passed through a 1×1 convolution, added to the input feature map, and channel-shuffled to obtain an output feature map containing rich semantic features.
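The following is a minimal PyTorch sketch of one such block, mirroring the description above (1×1 and 3×3 convolutions, channel split, depthwise and depthwise-dilated branches with dilation rates 2 and 4, 2×2 max pooling, concatenation, 1×1 fusion, residual addition and channel shuffle). Because the branches downsample spatially, the shortcut is pooled the same way before the addition; that reconciliation is an assumption not spelled out in the text:

import torch
import torch.nn as nn

def channel_shuffle(x: torch.Tensor, groups: int = 2) -> torch.Tensor:
    # Interleave channels across the two branch halves.
    b, c, t, h, w = x.shape
    x = x.view(b, groups, c // groups, t, h, w).transpose(1, 2)
    return x.reshape(b, c, t, h, w)

class MBFBlock3D(nn.Module):
    # Sketch of one dilation-based 3D MBF-Net block; exact channel widths
    # and the pooled shortcut are assumptions.
    def __init__(self, channels: int):
        super().__init__()
        self.pre = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=1),
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
        )
        half = channels // 2
        def branch(dilation: int) -> nn.Sequential:
            return nn.Sequential(
                # 3x3 depthwise convolution.
                nn.Conv3d(half, half, 3, padding=1, groups=half),
                # 3x3 depthwise dilated convolution (dilation rate 2 or 4).
                nn.Conv3d(half, half, 3, padding=dilation, dilation=dilation, groups=half),
                nn.MaxPool3d(kernel_size=(1, 2, 2)),
            )
        self.branch_a = branch(2)
        self.branch_b = branch(4)
        self.fuse = nn.Conv3d(channels, channels, kernel_size=1)
        # Assumption: pool the shortcut so the residual addition is
        # shape-consistent with the pooled branches.
        self.shortcut_pool = nn.MaxPool3d(kernel_size=(1, 2, 2))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.pre(x)
        a, b = torch.chunk(y, 2, dim=1)            # channel segmentation
        y = torch.cat([self.branch_a(a), self.branch_b(b)], dim=1)
        y = self.fuse(y) + self.shortcut_pool(x)   # 1x1 fusion + residual
        return channel_shuffle(y, groups=2)        # channel shuffle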
Specifically, in the present embodiment, the channel features generally refer to the channels or feature maps in a convolutional neural network (CNN). For prostate cancer ultrasound video, each channel may correspond to different information, such as edges, textures, shapes and structures. These channel features can capture information at different levels and semantics.
The specific process of step S2 is as follows:
S21, to dynamically adjust the importance of each channel, this module uses a 3D channel attention mechanism. Given a multi-scale input feature tensor F_si of shape (b, c, t, h, w), b denotes the batch size, c the number of channels, and t, h and w the time, height and width dimensions respectively, while i indexes the scale.
The backbone adopted by the multi-scale pyramid network is the dilation-based 3D MBF-Net, which uses convolution kernels of different sizes, embodied in three scales: one is the original image scale without dilation, and the other two are the scale features with dilation rates of 2 and 4 respectively. Here the 3D channel attention mechanism processes the multi-scale features, yielding three input feature tensors of different scales; the network processes the different scale features in parallel.
S22, adaptive average pooling is applied to F_si, reducing the time and spatial dimensions of each channel to 1 and yielding y = Avg_Pooling(F_si). The channel attention weights are then computed by a multi-layer perceptron (MLP) containing linear transformation and activation function operations, with the formula:
Z = MLP(y) = Sigmoid(Linear(ReLU(Linear(y))))
Here, Z is the computed channel attention weight.
S23, finally, the channel attention weight Z is applied to the input feature tensor F_si to weight its channels. This can be achieved by element-wise multiplication:
X_c_atti = F_si ⊙ Z
where X_c_atti is the feature tensor of the i-th scale after 3D channel attention modulation. Through this process, a multi-scale feature representation enhanced by the channel-level attention mechanism is obtained, so that the model can adaptively attend to different channels of the input features, improving its ability to characterize channel features and further improving the classification accuracy of prostate cancer ultrasound video.
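The following is a minimal PyTorch sketch of steps S21–S23 under stated assumptions: the patent fixes the Linear–ReLU–Linear–Sigmoid structure of the MLP, while the reduction ratio between the two linear layers is an illustrative choice:

import torch
import torch.nn as nn

class ChannelAttention3D(nn.Module):
    # Sketch of S21-S23. The reduction ratio of the MLP is an assumption.
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(1)  # y = Avg_Pooling(F_si), one value per channel
        self.mlp = nn.Sequential(            # Z = Sigmoid(Linear(ReLU(Linear(y))))
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, f_si: torch.Tensor) -> torch.Tensor:
        # f_si: (b, c, t, h, w) multi-scale input feature tensor.
        b, c = f_si.shape[:2]
        y = self.pool(f_si).view(b, c)
        z = self.mlp(y).view(b, c, 1, 1, 1)
        return f_si * z  # X_c_atti = F_si ⊙ Z, broadcast over t, h, w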
In addition, an auxiliary classification head is added to receive a stronger supervision signal, thereby enhancing the semantics of the features. In this embodiment, the loss of the model backbone network is therefore computed as:
L = L_ce + Σ_i λ_i · L_aux_i
where L_ce is the original cross-entropy loss, L_aux_i is the loss of the i-th auxiliary classification head, and λ_i is a balancing coefficient. This effectively adjusts the spatial semantics of the features so that they have consistent shape and semantics in the spatial dimension.
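A minimal PyTorch sketch of this loss computation, assuming the summed form reconstructed above (the helper name backbone_loss and the argument layout are illustrative):

import torch.nn.functional as F

def backbone_loss(main_logits, aux_logits_list, targets, lambdas):
    # Assumed form: L = L_ce + sum_i lambda_i * L_aux_i, with cross entropy
    # on the main head and each auxiliary classification head.
    loss = F.cross_entropy(main_logits, targets)
    for lam, aux_logits in zip(lambdas, aux_logits_list):
        loss = loss + lam * F.cross_entropy(aux_logits, targets)
    return loss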
In particular, in this embodiment, the temporal information relates to the relationships between different frames in the prostate cancer ultrasound video. It may include short-term dynamic changes, such as pulsations of prostate tissue and changes in blood flow velocity, as well as long-term dynamic changes, such as tumor growth and the evolution of tissue architecture. The spatial information relates to the relationships between different locations in the prostate ultrasound image, and may include structural features such as the shape, texture and tissue structure of prostate cancer.
The specific process of step S4 is as follows:
S41, temporal attention mechanism: attention in the time dimension is computed by a three-dimensional convolution operation, expressed as a_ti = Sigmoid(Conv3d(X_c_atti)). The temporal attention layer generates the attention weights of the time dimension to determine which time steps are more important for the prostate cancer video classification task. The Sigmoid function ensures that the generated attention weights lie between 0 and 1.
S42, spatial attention mechanism: attention in the spatial dimension is computed by a three-dimensional convolution operation, expressed as a_si = Sigmoid(Conv3d(X_c_atti)). The spatial attention layer generates the attention weights of the spatial dimensions and determines which regions are more important for the prostate cancer video classification task. Likewise, the Sigmoid function ensures that the generated attention weights lie between 0 and 1.
As a preferred technical scheme, the specific process of step S5 is as follows:
S51, spatio-temporal attention fusion: the final spatio-temporal attention is the result of multiplying the temporal and spatial attention. This fusion mechanism helps the model better understand the temporal and spatial relationships of the input prostate cancer video data and ensures that relevant information is captured for video classification. The temporal attention a_ti and the spatial attention a_si are multiplied to obtain the final spatio-temporal attention tensor a_spatio-temporal_i, which is applied to the multi-scale input feature tensor X_c_atti as follows:
a_spatio-temporal_i = a_ti ⊙ a_si
x_atti = X_c_atti ⊙ a_spatio-temporal_i
where x_atti is the feature tensor after spatio-temporal attention modulation. The spatio-temporal attention module is applied to the feature map generated after each 3D channel attention modulation, so that the model can adaptively attend to features of different parts according to the temporal and spatial information of the input prostate cancer video data, improving its ability to model the temporal structure and further improving the classification accuracy of prostate cancer ultrasound video.
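The following PyTorch sketch realizes S41–S51. The patent states only that both attention maps come from a three-dimensional convolution followed by a Sigmoid; the choice of a time-only kernel for a_ti and a space-only kernel for a_si is an assumption made here for illustration:

import torch
import torch.nn as nn

class SpatioTemporalAttention(nn.Module):
    # Sketch of S41-S51; kernel shapes are assumptions.
    def __init__(self, channels: int):
        super().__init__()
        # a_ti: attention over time steps, from a temporal 3D convolution.
        self.temporal = nn.Conv3d(channels, 1, kernel_size=(3, 1, 1), padding=(1, 0, 0))
        # a_si: attention over spatial regions, from a spatial 3D convolution.
        self.spatial = nn.Conv3d(channels, 1, kernel_size=(1, 3, 3), padding=(0, 1, 1))

    def forward(self, x_c_atti: torch.Tensor) -> torch.Tensor:
        a_t = torch.sigmoid(self.temporal(x_c_atti))  # (b, 1, t, h, w), weights in (0, 1)
        a_s = torch.sigmoid(self.spatial(x_c_atti))   # (b, 1, t, h, w), weights in (0, 1)
        a_st = a_t * a_s                              # a_spatio-temporal_i = a_ti ⊙ a_si
        return x_c_atti * a_st                        # x_atti = X_c_atti ⊙ a_spatio-temporal_i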
Specifically, in this embodiment, the specific procedure of step S6 is as follows:
S61, a linear transformation is applied to the input feature x_atti of the current scale i to generate the query vector:
query_i = scales[i](x_atti)
S62, linear transformations are applied to the input features other_x_attj of the other scales j to generate the key vectors, followed by the corresponding dimension permutation, with the formula:
key_j = scales[j](other_x_attj)
Here i and j denote different scales: i is the scale currently being processed and j is another scale. X_att_list is a list of features from multiple scales, i is the index of the current scale, and X_atti is the feature of the current scale; the loop iterates over every scale, with i taking the values 0, 1, 2 and so on. In other words, the features extracted at the different scales are assembled into the list X_att_list, each element X_atti of which represents the feature of the i-th scale; the length of X_att_list equals the number of scales, and each loop iteration fetches the scale feature corresponding to the current scale index.
This is because, in multi-scale feature fusion, each scale's feature should interact and fuse with the features of the other scales. It is therefore necessary to traverse the scales so that the current scale's feature (X_atti) interacts with the features of the other scales (other_X_attj). The condition check i == j ensures that the current scale is skipped in the inner loop, avoiding attention computation and feature fusion of a feature with itself.
X_atti and other_X_attj are the feature vectors output after modulation by the pyramid algorithm and the triple attention mechanism; the pyramid is multi-scale, and the two represent feature vectors of different scales.
In this way, the module can effectively handle the feature relationships among different scales and, for each scale's feature, perform information interaction with the features of the other scales. This helps improve the effect of feature fusion, enabling features of different scales to interact better.
S63, the raw attention weight is obtained by computing the dot product between the query vector and the key vector:
attn_weight_ij = query_i ⊙ key_j
S64, a softmax operation is applied to the dot-product result to obtain the normalized attention weight:
attn_weight_ij = softmax(attn_weight_ij)
S65, the attention weights are used to weight and sum the features other_x_attj of the different scales; the scale_attn_i corresponding to each scale i contains the fusion results of the features from the other scales, generating the fused feature:
fused_feature = scale_attn_i + (attn_weight_ij ⊙ other_x_attj)
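The following is a simplified PyTorch sketch of S61–S65. It assumes at least two scales, that each scale's feature has already been reduced to a (batch, channels) vector, and that per-scale linear layers play the role of scales[i]; the softmax is taken over the other scales j, matching S64:

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFusion(nn.Module):
    # Sketch of S61-S65; the (batch, channels) feature layout is an assumption.
    def __init__(self, channels: int, num_scales: int):
        super().__init__()
        # One linear projection per scale, standing in for scales[i].
        self.scales = nn.ModuleList(
            nn.Linear(channels, channels) for _ in range(num_scales)
        )

    def forward(self, x_att_list: list) -> list:
        fused = []
        for i, x_atti in enumerate(x_att_list):
            query = self.scales[i](x_atti)              # query_i = scales[i](x_atti)
            logits, others = [], []
            for j, other_x in enumerate(x_att_list):
                if i == j:                              # skip self-interaction
                    continue
                key = self.scales[j](other_x)           # key_j = scales[j](other_x_attj)
                logits.append((query * key).sum(dim=-1))  # raw dot-product weight
                others.append(other_x)
            # Normalize the raw weights over the other scales j (S64).
            w = F.softmax(torch.stack(logits, dim=-1), dim=-1)
            out = x_atti + sum(
                w[:, k:k + 1] * others[k] for k in range(len(others))
            )                                           # fused_feature (S65)
            fused.append(out)
        return fused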
and after the output features are scale-adjusted by a Max-Pooling operation, the feature maps are concatenated and passed to a fully connected layer, and a Softmax function finally generates the final prediction result.
Max-Pooling helps reduce the spatial dimensions while preserving important features; concatenating features of different scales enriches the model's information representation; the fully connected layer learns the complex relationships among the features; and the Softmax function converts the network output into a probability distribution, so that the predicted probability of each class lies between 0 and 1, finally achieving the classification of the input data.
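A minimal PyTorch sketch of this final stage under stated assumptions (adaptive max pooling to a single spatio-temporal position and one fully connected layer; the patent does not fix these sizes, and benign/malignant classification implies two classes):

import torch
import torch.nn as nn

class ClassificationHead(nn.Module):
    # Sketch: per-scale max pooling, concatenation, fully connected layer,
    # Softmax. Channel counts and the pooled size are assumptions.
    def __init__(self, channels: int, num_scales: int, num_classes: int = 2):
        super().__init__()
        self.pool = nn.AdaptiveMaxPool3d(1)  # scale adjustment via max pooling
        self.fc = nn.Linear(channels * num_scales, num_classes)

    def forward(self, fused_features: list) -> torch.Tensor:
        # Pool each scale's (b, c, t, h, w) map to (b, c) and concatenate.
        pooled = [self.pool(f).flatten(1) for f in fused_features]
        logits = self.fc(torch.cat(pooled, dim=1))
        return torch.softmax(logits, dim=1)  # per-class probabilities in (0, 1)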
In this embodiment, multi-scale pyramid feature extraction and triple-attention enhancement are adopted. A prostate ultrasound video dataset from the ultrasound department of Shenzhen People's Hospital is used to validate the triple-attention pyramid algorithm-based prostate cancer ultrasound video classification network: the training set of the dataset is used for the training process of the classification network, the test set is then predicted, and Accuracy (ACC), Area Under the ROC Curve (AUC) and F1-score are compared. Finally, heat maps are used to visualize the lesion regions the model attends to, completing the verification of the lesion localization effect.
This embodiment is not only suitable for prostate cancer ultrasound video, but can also play an important role in the fields of computer vision and video analysis, improving the performance of image and video processing tasks.
Based on the triple-attention pyramid algorithm-based prostate cancer ultrasound video classification method of the above embodiment, this embodiment further provides a prostate cancer ultrasound video classification system to which the triple-attention pyramid classification algorithm is applied. For ease of illustration, the structural schematic of this system embodiment shows only the portions relevant to the embodiment; those skilled in the art will appreciate that the illustrated structure does not limit the apparatus, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
Referring to fig. 4, in another embodiment of the present application, a system 100 for classifying an ultrasonic video of prostate cancer based on a triple-attention pyramid algorithm is provided, which includes a multi-scale pyramid network module 101, a 3D channel attention module 102, a semantic information dividing module 103, a spatiotemporal attention module 104, a multi-scale feature interaction fusion module 105, and a cancer classification prediction module 106;
a multi-scale pyramid network module 101 is constructed that allows the input prostate cancer ultrasound video data to be separated into different sub-components, each representing a different scale or hierarchy of information. This helps to focus on multiple levels of data simultaneously, from microscopic to macroscopic, to capture more detail and global information.
The 3D channel attention module 102 modulates the multi-scale feature vectors using the 3D channel level attention mechanism to ensure that the model is better able to adapt to different scale information. This typically involves weighting the different channel characteristics to focus on the most important information for cancer classification;
the semantic information dividing module 103 is used for processing the multi-scale feature data, including video frame data of a plurality of time steps and space information of each time step, and constructing space-time features;
the spatiotemporal attention module 104, which is shown in fig. 3, calculates the attention weights in the temporal and spatial dimensions by a three-dimensional convolution operation to determine which temporal steps and spatial regions are more critical to the classification of prostate cancer.
The multi-scale feature interaction fusion module 105 is configured to facilitate information interaction and fusion between different scale features, and generate a final output feature by accumulating and fusing the multi-scale features. This helps the model more fully understand the complex features of prostate cancer, thereby improving the classification performance of the model.
The cancer classification prediction module 106 accepts the fused features as input and outputs a probability distribution of the cancer classification.
The foregoing is merely a preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions easily conceivable by those skilled in the art within the technical scope of the present application should be covered in the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (8)

1. The pyramid algorithm-based prostate cancer ultrasonic video classification method is characterized by comprising the following steps of:
extracting multi-scale features of the prostate cancer ultrasonic video based on a multi-scale pyramid network;
modulating a multi-scale feature vector with a 3D channel level attention mechanism based on the multi-scale feature;
processing the multi-scale features based on the multi-scale feature vectors to construct space-time features;
acquiring a space-time attention tensor based on the space-time features;
modulating the multi-scale feature vector based on the space-time attention tensor to obtain a modulated multi-scale input feature tensor;
and acquiring final output characteristics based on the modulated multi-scale input characteristic tensor, and completing the ultrasonic video classification of the prostate cancer.
2. The pyramid algorithm-based ultrasound video classification method for prostate cancer of claim 1, wherein extracting multi-scale features of the ultrasound video for prostate cancer comprises:
dividing input prostate cancer ultrasonic video frame data into a plurality of subframes, wherein each subframe represents different time scales;
and extracting the characteristics of each subframe by using the multi-scale pyramid network to obtain the multi-scale characteristics of different time scales.
3. The pyramid algorithm-based prostate cancer ultrasound video classification method according to claim 2, wherein the multi-scale pyramid network adopts a dilation-based 3D MBF-Net structure;
feature extraction for each subframe using the multi-scale pyramid network includes:
the subframe is converted into an input feature map; after a 1×1 convolution and a 3×3 convolution, the channels are split: one branch applies a 3×3 depthwise convolution, a 3×3 depthwise dilated convolution with a dilation rate of 2, and 2×2 max pooling; the other branch applies a 3×3 depthwise convolution, a 3×3 depthwise dilated convolution with a dilation rate of 4, and 2×2 max pooling; the convolution results of the two branches are then concatenated and passed through a 1×1 convolution, added to the input feature map, and channel-shuffled to obtain the output feature map.
4. The pyramid algorithm-based ultrasound video classification method for prostate cancer of claim 1, wherein modulating the multi-scale feature vector based on the 3D channel level attention mechanism comprises:
processing the multi-scale features by using a 3D channel attention mechanism to obtain multi-scale input feature tensors;
performing reduction operation on the multi-scale input feature tensor along the channel dimension by using self-adaptive average pooling to obtain preset features;
calculating the attention weight of the channel according to the preset characteristics through the multi-layer perceptron to obtain the attention weight of the channel;
and carrying out channel weighting on the multi-scale input feature tensor based on the channel attention weight to acquire the multi-scale feature vector.
5. The pyramid algorithm-based ultrasound video classification method of claim 1, wherein processing the multi-scale features based on the multi-scale feature vector, constructing spatio-temporal features comprises:
processing the multi-scale features based on the multi-scale feature vectors to obtain video frame data of a plurality of time steps and spatial information of each time step;
the spatio-temporal features are constructed based on the video frame data for a number of time steps and the spatial information for each time step.
6. The pyramid algorithm-based ultrasound video classification method for prostate cancer of claim 1, wherein obtaining a spatiotemporal attention tensor based on the spatiotemporal features comprises:
calculating attention weights in time sequence and space dimension of the space-time characteristics based on three-dimensional convolution operation, and acquiring time sequence attention and space attention;
multiplying the time-series attention and the space attention to obtain the space-time attention tensor.
7. The pyramid algorithm-based ultrasound video classification method of claim 1, wherein obtaining final output features based on the modulated multi-scale input feature tensor comprises:
linear transformation is used for the modulated multi-scale input characteristic tensor of the current scale to generate a query vector;
using linear transformation to the modulated multi-scale input characteristic tensor of other scales except the current scale to generate a key vector;
calculating a dot product between the query vector and the key vector to obtain an original attention weight;
performing softmax operation on the original attention weight to obtain a final normalized attention weight;
and weighting and summing the features other_x_attj of the different scales based on the attention weights to obtain the final output features.
8. A pyramid algorithm-based ultrasound video classification system for prostate cancer, for implementing the pyramid algorithm-based ultrasound video classification method of any one of claims 1-7, the system comprising: the system comprises a feature extraction module, a channel attention module, a semantic information dividing module, a space-time attention module, a multi-scale feature interaction fusion module and a classification module;
the feature extraction module is used for constructing a multi-scale pyramid network, dividing input prostate cancer ultrasonic video data into a plurality of subframes, wherein each subframe represents different time scales, and extracting features of each subframe by using the multi-scale pyramid network to obtain the multi-scale features of different time scales;
the channel attention module is used for modulating the multi-scale feature vector by using an attention mechanism of a 3D channel level;
the semantic information dividing module is used for processing the multi-scale features according to the multi-scale feature vectors and constructing space-time features;
the space-time attention module is used for calculating attention weights in time sequence and space dimension in the space-time characteristics through three-dimensional convolution operation, and acquiring space-time attention characteristics of different scales;
the multi-scale feature interaction fusion module is used for carrying out information interaction and fusion on the space-time attention features with different scales to obtain final output features;
and the classification module is used for classifying the input ultrasonic video data of the prostate cancer according to the final output characteristics.
Application CN202311646253.1A, priority date 2023-12-01, filed 2023-12-01: Pyramid algorithm-based prostate cancer ultrasonic video classification method and system. Status: Active; granted as CN117671357B.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311646253.1A CN117671357B (en) 2023-12-01 2023-12-01 Pyramid algorithm-based prostate cancer ultrasonic video classification method and system


Publications (2)

Publication Number Publication Date
CN117671357A 2024-03-08
CN117671357B 2024-07-05

Family

ID=90078357

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311646253.1A Active CN117671357B (en) 2023-12-01 2023-12-01 Pyramid algorithm-based prostate cancer ultrasonic video classification method and system

Country Status (1)

Country Link
CN (1) CN117671357B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20160124948A (en) * 2015-04-20 2016-10-31 전남대학교산학협력단 Tensor Divergence Feature Extraction System based on HoG and HOF for video obejct action classification
KR20190119261A (en) * 2018-04-12 2019-10-22 가천대학교 산학협력단 Apparatus and method for segmenting of semantic image using fully convolutional neural network based on multi scale image and multi scale dilated convolution
US20210287342A1 (en) * 2020-03-10 2021-09-16 Samsung Electronics Co., Ltd. Systems and methods for image denoising using deep convolutional networks
CN114913436A (en) * 2022-06-15 2022-08-16 中科弘云科技(北京)有限公司 Ground object classification method and device based on multi-scale attention mechanism, electronic equipment and medium
CN115131710A (en) * 2022-07-05 2022-09-30 福州大学 Real-time action detection method based on multi-scale feature fusion attention
CN115375716A (en) * 2022-07-22 2022-11-22 桂林电子科技大学 New coronary lesion segmentation method based on multi-scale feature fusion
CN115620118A (en) * 2022-09-15 2023-01-17 河北汉光重工有限责任公司 Saliency target detection method based on multi-scale expansion convolutional neural network
CN116386034A (en) * 2023-02-16 2023-07-04 武汉大学 Cervical cell classification method based on multiscale attention feature enhancement
CN116385382A (en) * 2023-03-23 2023-07-04 济南大学 Network model for visceral tumor segmentation in ultrasonic image
WO2023185243A1 (en) * 2022-03-29 2023-10-05 河南工业大学 Expression recognition method based on attention-modulated contextual spatial information
US20230343128A1 (en) * 2022-04-24 2023-10-26 Nanjing Agricultural University Juvenile fish limb identification method based on multi-scale cascaded perceptual convolutional neural network


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WEIXING WANG et al.: "Pyramid-dilated deep convolutional neural network for crowd counting", Applied Intelligence, vol. 52, 29 March 2021 (2021-03-29), page 1825 *
WANG Huitao; HU Yan: "Efficient video classification method based on global spatio-temporal receptive field" (基于全局时空感受野的高效视频分类方法), Journal of Chinese Computer Systems, no. 08, 31 August 2020 (2020-08-31), pages 202-209 *

Also Published As

Publication number Publication date
CN117671357B (en) 2024-07-05


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant