CN117671357A - Pyramid algorithm-based prostate cancer ultrasonic video classification method and system


Info

Publication number
CN117671357A
CN117671357A (application CN202311646253.1A)
Authority
CN
China
Prior art keywords
scale
attention
features
time
prostate cancer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311646253.1A
Other languages
Chinese (zh)
Other versions
CN117671357B (en)
Inventor
卢旭
梁坤
袁圆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Polytechnic Normal University
Original Assignee
Guangdong Polytechnic Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Polytechnic Normal University
Priority: CN202311646253.1A
Publication of CN117671357A
Application granted
Publication of CN117671357B
Legal status: Active

Landscapes

  • Ultrasonic Diagnosis Equipment (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a pyramid algorithm-based prostate cancer ultrasonic video classification method and system. The method comprises the following steps: extracting multi-scale features of the prostate cancer ultrasonic video based on a multi-scale pyramid network; modulating a multi-scale feature vector with a 3D channel-level attention mechanism based on the multi-scale features; processing the multi-scale features based on the multi-scale feature vector to construct spatio-temporal features; acquiring a spatio-temporal attention tensor based on the spatio-temporal features; modulating the multi-scale feature vector based on the spatio-temporal attention tensor to obtain a modulated multi-scale input feature tensor; and acquiring the final output features based on the modulated multi-scale input feature tensor, completing the prostate cancer ultrasonic video classification. The method allows the model to extract features from the prostate cancer ultrasonic video at different scales, enabling the algorithm to capture information at different levels of detail more comprehensively and thereby improving video classification performance.

Description

Pyramid algorithm-based prostate cancer ultrasonic video classification method and system
Technical Field
The invention belongs to the field of medical image processing and computer vision, and particularly relates to a pyramid algorithm-based prostate cancer ultrasonic video classification method and system.
Background
In recent years, deep learning-based prostate cancer classification algorithms have made remarkable progress on static ultrasound image data, but considerable room for improvement remains on dynamic ultrasound video data. Ultrasound video is a promising area of research and development because it contains more spatial and temporal information than static images. Since the introduction of I3D, 3D CNNs have dominated the field of video classification, but 3D convolutional neural networks struggle to balance computational complexity against classification performance and to exploit the temporal and spatial relationships in video efficiently.
Currently, for better integration of spatio-temporal information, Vision Transformers are favored for their excellent spatio-temporal information aggregation capability, but extensive experiments have shown that Transformer-based network models are poorly suited to the limited amount of prostate ultrasound video data. To better handle variations in visual tempo, researchers have introduced pyramid networks that process frames sampled at different rates, allowing fast-tempo and slow-tempo information to be captured at different depths. However, these networks generally fail to explicitly consider the relative importance of each channel feature, which may cause critical channel features to be ignored. Furthermore, they fail to fully consider the temporal and spatial relationships in video data, which limits their effectiveness in capturing dynamic information and spatial structure. Facing this problem, a triple-attention enhanced pyramid algorithm is a reliable solution: its triple attention mechanisms can extract a variety of key features to improve the accuracy of prostate cancer classification on dynamic ultrasound video.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a pyramid algorithm-based prostate cancer ultrasonic video classification method and system that adopt an innovative pyramid classification network framework. The framework consists of a triple attention mechanism over channel, space and time, and can learn relevant characteristics of lesions, such as shape, texture, temporal information and spatial structure, from the different attention layers. The method effectively fuses features of different levels by enhancing the output of the multi-scale pyramid features and ensures compatibility among successive features. Most importantly, the model improves its understanding of the different channel features and of the temporal and spatial dimensions, making the classification of prostate cancer more accurate and reliable.
In order to achieve the above purpose, the invention provides a pyramid algorithm-based ultrasonic video classification method for prostate cancer, which comprises the following steps:
extracting multi-scale features of the prostate cancer ultrasonic video based on a multi-scale pyramid network;
modulating a multi-scale feature vector with a 3D channel level attention mechanism based on the multi-scale feature;
processing the multi-scale features based on the multi-scale feature vectors to construct space-time features;
acquiring a space-time attention tensor based on the space-time features;
modulating the multi-scale feature vector based on the space-time attention tensor to obtain a modulated multi-scale input feature tensor;
and acquiring final output characteristics based on the modulated multi-scale input characteristic tensor, and completing the ultrasonic video classification of the prostate cancer.
Optionally, extracting the multi-scale features of the prostate cancer ultrasound video includes:
dividing input prostate cancer ultrasonic video frame data into a plurality of subframes, wherein each subframe represents different time scales;
and extracting the characteristics of each subframe by using the multi-scale pyramid network to obtain the multi-scale characteristics of different time scales.
Optionally, the multi-scale pyramid network adopts a dilation-based 3D MBF-Net structure;
feature extraction for each subframe using the multi-scale pyramid network includes:
the subframe is converted into an input feature map; after a 1×1 convolution and a 3×3 convolution, the channels are split: one branch applies a 3×3 depthwise convolution, a 3×3 depthwise dilated convolution with a dilation rate of 2, and 2×2 max pooling; the other branch applies a 3×3 depthwise convolution, a 3×3 depthwise dilated convolution with a dilation rate of 4, and 2×2 max pooling; the convolution results of the two branches are then concatenated and passed through a 1×1 convolution, added to the input feature map, and channel-shuffled to obtain the output feature map.
Optionally, modulating the multi-scale feature vector based on the 3D channel level attention mechanism comprises:
processing the multi-scale features by using a 3D channel attention mechanism to obtain multi-scale input feature tensors;
performing reduction operation on the multi-scale input feature tensor along the channel dimension by using self-adaptive average pooling to obtain preset features;
calculating the attention weight of the channel according to the preset characteristics through the multi-layer perceptron to obtain the attention weight of the channel;
and carrying out channel weighting on the multi-scale input feature tensor based on the channel attention weight to acquire the multi-scale feature vector.
Optionally, processing the multi-scale feature based on the multi-scale feature vector, constructing a spatio-temporal feature includes:
processing the multi-scale features based on the multi-scale feature vectors to obtain video frame data of a plurality of time steps and spatial information of each time step;
the spatio-temporal features are constructed based on the video frame data for a number of time steps and the spatial information for each time step.
Optionally, based on the spatiotemporal features, acquiring the spatiotemporal attention tensor includes:
calculating attention weights in time sequence and space dimension of the space-time characteristics based on three-dimensional convolution operation, and acquiring time sequence attention and space attention;
multiplying the time-series attention and the space attention to obtain the space-time attention tensor.
Optionally, based on the modulated multi-scale input feature tensor, obtaining the final output feature includes:
linear transformation is used for the modulated multi-scale input characteristic tensor of the current scale to generate a query vector;
using linear transformation to the modulated multi-scale input characteristic tensor of other scales except the current scale to generate a key vector;
calculating a dot product between the query vector and the key vector to obtain an original attention weight;
performing softmax operation on the original attention weight to obtain a final normalized attention weight;
and weighting and summing the features other_x_attj of the different scales based on the attention weights to obtain the final output features.
In order to achieve the above object, the present invention further provides a pyramid algorithm-based ultrasonic video classification system for prostate cancer, comprising: the system comprises a feature extraction module, a channel attention module, a semantic information dividing module, a space-time attention module, a multi-scale feature interaction fusion module and a classification module;
the feature extraction module is used for constructing a multi-scale pyramid network, dividing input prostate cancer ultrasonic video data into a plurality of subframes, wherein each subframe represents different time scales, and extracting features of each subframe by using the multi-scale pyramid network to obtain the multi-scale features of different time scales;
the channel attention module is used for modulating the multi-scale feature vector by using an attention mechanism of a 3D channel level;
the semantic information dividing module is used for processing the multi-scale features according to the multi-scale feature vectors and constructing space-time features;
the space-time attention module is used for calculating attention weights in time sequence and space dimension in the space-time characteristics through three-dimensional convolution operation, and acquiring space-time attention characteristics of different scales;
the multi-scale feature interaction fusion module is used for carrying out information interaction and fusion on the space-time attention features with different scales to obtain final output features;
and the classification module is used for classifying the input ultrasonic video data of the prostate cancer according to the final output characteristics.
Compared with the prior art, the invention has the following advantages and technical effects:
the invention provides a method based on a multi-scale pyramid network, which allows a model to perform feature extraction on prostate cancer ultrasonic videos on different scales. This enables the algorithm to more fully capture information of different levels of detail, thereby improving video classification performance.
Through a 3D channel attention enhancement mechanism, the model can adaptively pay attention to the characteristic information of different channels, so that the distinguishing property of the characteristics is improved. This helps to reduce redundant information and improve the accuracy of the ultrasound video classification of prostate cancer.
A spatiotemporal dual attention enhancement algorithm is used, allowing the model to capture important features in both the time and spatial domains. This improves the model's attention to key frames in the video sequence, enhancing the video classification performance.
A multi-scale feature interaction fusion module is provided, which allows features of different scales to interact and fuse. The method is favorable for improving understanding and comprehensive utilization of different scale information by the model, and further improves video classification performance.
The method is designed specifically for the ultrasonic video classification of the prostate cancer, and has good applicability and performance. The medical image data can be effectively processed, and a powerful ultrasonic video classification tool for prostate cancer is provided.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application, illustrate and explain the application and are not to be construed as limiting the application. In the drawings:
fig. 1 is a schematic diagram of a method for classifying prostate cancer ultrasound video based on a triple-attention pyramid algorithm according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of the dilation-based 3D MBF-Net backbone according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a multi-scale feature interaction fusion module according to an embodiment of the present invention;
fig. 4 is a flow diagram of a triple-attention pyramid algorithm-based prostate cancer ultrasound video classification system according to an embodiment of the present invention.
Detailed Description
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.
As shown in fig. 1, the present embodiment provides a method for classifying prostate cancer ultrasound video based on a triple-attention pyramid algorithm, which analyzes and learns the channel, spatial and temporal information in prostate ultrasound video from a triple-attention perspective and uses this information to reliably classify prostate cancer as benign or malignant. The method comprises the following steps:
s1, constructing a multi-scale pyramid network MFP-Net (Multiscale Feature Pyramid Net), allowing input data to be divided into different sub-components, representing information of different scales or layers, and facilitating simultaneous capture of detail and global information and extraction of multi-scale features;
s2, modulating the multi-scale feature vector by using a 3D channel-level attention mechanism, so that the model can learn information of different channel features better. Correcting the channel attention vector using a Softmax function to ensure that the weights of the individual components are valid;
s3, applying the corrected channel attention vector to the multi-scale feature map as final output, and improving the quality and model performance of the multi-scale features. Processing multi-scale feature data, including video frame data of a plurality of time steps and spatial information of each time step, and constructing space-time features;
s4, calculating the attention weight on the time sequence and the space dimension by using the three-dimensional convolution operation so as to determine which time steps and areas are more important to classification tasks, and ensuring effective modeling of the time sequence and the space information;
s5, multiplying the time sequence and the space attention to generate a final space-time attention tensor, and applying the final space-time attention tensor to the input characteristic tensor to improve the modeling capability of the model on the space-time structure.
S6, acquiring the enhanced multi-scale input features and ensuring that they have the same number of channels; using the multi-scale feature interaction fusion attention mechanism to promote information interaction among the different scale features and generate the final output features.
Specifically, in this embodiment, the specific procedure of step S1 is as follows:
S11, the input prostate cancer ultrasound video frame data is divided into a plurality of subframes, each representing a different time scale. Let the input video frame data be I; splitting yields the subframe set {I_1, I_2, ..., I_n}, where n is the number of subframes.
S12, for each subframe I_i, feature extraction is performed using the multi-scale pyramid network MFP-Net, with a dilation-based 3D MBF-Net (3D Multi-Branch Fusion Net) as the backbone, as shown in FIG. 2. Let F_1, F_2, ..., F_n denote the features extracted from the subframes I_1, I_2, ..., I_n respectively. This can be expressed by the following formula:
F_n = 3DMBF_Net(I_n)
S13, multi-scale feature extraction is performed on the features extracted from each subframe to obtain information at different time scales. Convolution kernels K_i of different sizes are applied to the feature map F_n, where K_i denotes the i-th convolution kernel, yielding feature representations at different scales. An average pooling operation is then applied to each scale's feature map F_si to reduce its spatial dimensions. This can be expressed by the following formula:
F_si = Avg_Pooling(Conv(F_n, K_i))
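As a concrete illustration, the following is a minimal PyTorch sketch of this S13 pattern (per-scale convolution followed by average pooling); the kernel sizes, channel widths and pooling window are illustrative assumptions, not values fixed by the patent:

import torch
import torch.nn as nn

class MultiScaleExtractor(nn.Module):
    # Sketch of S13: one convolution kernel K_i per scale, then average
    # pooling to shrink the spatial dimensions, i.e.
    # F_si = Avg_Pooling(Conv(F_n, K_i)). Kernel sizes are assumptions.
    def __init__(self, channels: int, kernel_sizes=(1, 3, 5)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv3d(channels, channels, k, padding=k // 2) for k in kernel_sizes
        )
        self.pool = nn.AvgPool3d(kernel_size=(1, 2, 2))  # halves H and W, keeps T

    def forward(self, f_n: torch.Tensor) -> list:
        # f_n: (batch, channels, T, H, W) feature map from the backbone.
        return [self.pool(conv(f_n)) for conv in self.convs]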
as shown in fig. 2, in this embodiment, the backbone used in the multi-scale pyramid network is the dilation-based 3D MBF-Net, and the specific procedure is as follows:
first, the input feature map has shape (T, H, W, C), where T is the number of time frames, H the height, W the width, and C the number of channels. After a 1×1 convolution and a 3×3 convolution, the channels are split: one branch applies a 3×3 depthwise convolution, a 3×3 depthwise dilated convolution with a dilation rate of 2, and 2×2 max pooling; the other branch applies a 3×3 depthwise convolution, a 3×3 depthwise dilated convolution with a dilation rate of 4, and 2×2 max pooling. The convolution results of the two branches are then concatenated and passed through a 1×1 convolution, added to the input feature map, and channel-shuffled to obtain an output feature map containing rich semantic features.
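The following is a minimal PyTorch sketch of one such block, mirroring the description above (1×1 and 3×3 convolutions, channel split, depthwise and depthwise-dilated branches with dilation rates 2 and 4, 2×2 max pooling, concatenation, 1×1 fusion, residual addition and channel shuffle). Because the branches downsample spatially, the shortcut is pooled the same way before the addition; that reconciliation is an assumption not spelled out in the text:

import torch
import torch.nn as nn

def channel_shuffle(x: torch.Tensor, groups: int = 2) -> torch.Tensor:
    # Interleave channels across the two branch halves.
    b, c, t, h, w = x.shape
    x = x.view(b, groups, c // groups, t, h, w).transpose(1, 2)
    return x.reshape(b, c, t, h, w)

class MBFBlock3D(nn.Module):
    # Sketch of one dilation-based 3D MBF-Net block; exact channel widths
    # and the pooled shortcut are assumptions.
    def __init__(self, channels: int):
        super().__init__()
        self.pre = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=1),
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
        )
        half = channels // 2
        def branch(dilation: int) -> nn.Sequential:
            return nn.Sequential(
                # 3x3 depthwise convolution.
                nn.Conv3d(half, half, 3, padding=1, groups=half),
                # 3x3 depthwise dilated convolution (dilation rate 2 or 4).
                nn.Conv3d(half, half, 3, padding=dilation, dilation=dilation, groups=half),
                nn.MaxPool3d(kernel_size=(1, 2, 2)),
            )
        self.branch_a = branch(2)
        self.branch_b = branch(4)
        self.fuse = nn.Conv3d(channels, channels, kernel_size=1)
        # Assumption: pool the shortcut so the residual addition is
        # shape-consistent with the pooled branches.
        self.shortcut_pool = nn.MaxPool3d(kernel_size=(1, 2, 2))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.pre(x)
        a, b = torch.chunk(y, 2, dim=1)            # channel segmentation
        y = torch.cat([self.branch_a(a), self.branch_b(b)], dim=1)
        y = self.fuse(y) + self.shortcut_pool(x)   # 1x1 fusion + residual
        return channel_shuffle(y, groups=2)        # channel shuffle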
Specifically, in the present embodiment, the channel features generally refer to the channels or feature maps in a convolutional neural network (CNN). For prostate cancer ultrasound video, each channel may correspond to different information, such as edges, textures, shapes and structures. These channel features can capture information at different levels and semantics.
The specific process of step S2 is as follows:
S21, to dynamically adjust the importance of each channel, this module uses a 3D channel attention mechanism. Given a multi-scale input feature tensor F_si of shape (b, c, t, h, w), b denotes the batch size, c the number of channels, and t, h and w the time, height and width dimensions respectively, while i indexes the scale.
The backbone adopted by the multi-scale pyramid network is the dilation-based 3D MBF-Net, which uses convolution kernels of different sizes, embodied in three scales: one is the original image scale without dilation, and the other two are the scale features with dilation rates of 2 and 4 respectively. Here the 3D channel attention mechanism processes the multi-scale features, yielding three input feature tensors of different scales; the network processes the different scale features in parallel.
S22, adaptive average pooling is applied to F_si, reducing the time and spatial dimensions of each channel to 1 and yielding y = Avg_Pooling(F_si). The channel attention weights are then computed by a multi-layer perceptron (MLP) containing linear transformation and activation function operations, with the formula:
Z = MLP(y) = Sigmoid(Linear(ReLU(Linear(y))))
Here, Z is the computed channel attention weight.
S23, finally, the channel attention weight Z is applied to the input feature tensor F_si to weight its channels. This can be achieved by element-wise multiplication:
X_c_atti = F_si ⊙ Z
where X_c_atti is the feature tensor of the i-th scale after 3D channel attention modulation. Through this process, a multi-scale feature representation enhanced by the channel-level attention mechanism is obtained, so that the model can adaptively attend to different channels of the input features, improving its ability to characterize channel features and further improving the classification accuracy of prostate cancer ultrasound video.
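The following is a minimal PyTorch sketch of steps S21–S23 under stated assumptions: the patent fixes the Linear–ReLU–Linear–Sigmoid structure of the MLP, while the reduction ratio between the two linear layers is an illustrative choice:

import torch
import torch.nn as nn

class ChannelAttention3D(nn.Module):
    # Sketch of S21-S23. The reduction ratio of the MLP is an assumption.
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(1)  # y = Avg_Pooling(F_si), one value per channel
        self.mlp = nn.Sequential(            # Z = Sigmoid(Linear(ReLU(Linear(y))))
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, f_si: torch.Tensor) -> torch.Tensor:
        # f_si: (b, c, t, h, w) multi-scale input feature tensor.
        b, c = f_si.shape[:2]
        y = self.pool(f_si).view(b, c)
        z = self.mlp(y).view(b, c, 1, 1, 1)
        return f_si * z  # X_c_atti = F_si ⊙ Z, broadcast over t, h, w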
In addition, an auxiliary classification head is added to receive a stronger supervision signal, thereby enhancing the semantics of the features. In this embodiment, the loss of the model backbone network is therefore computed as:
L = L_ce + Σ_i λ_i · L_aux_i
where L_ce is the original cross-entropy loss, L_aux_i is the loss of the i-th auxiliary classification head, and λ_i is a balancing coefficient. This effectively adjusts the spatial semantics of the features so that they have consistent shape and semantics in the spatial dimension.
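A minimal PyTorch sketch of this loss computation, assuming the summed form reconstructed above (the helper name backbone_loss and the argument layout are illustrative):

import torch.nn.functional as F

def backbone_loss(main_logits, aux_logits_list, targets, lambdas):
    # Assumed form: L = L_ce + sum_i lambda_i * L_aux_i, with cross entropy
    # on the main head and each auxiliary classification head.
    loss = F.cross_entropy(main_logits, targets)
    for lam, aux_logits in zip(lambdas, aux_logits_list):
        loss = loss + lam * F.cross_entropy(aux_logits, targets)
    return loss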
In particular, in this embodiment, the temporal information relates to the relationships between different frames in the prostate cancer ultrasound video. It may include short-term dynamic changes, such as pulsations of prostate tissue and changes in blood flow velocity, as well as long-term dynamic changes, such as tumor growth and the evolution of tissue architecture. The spatial information relates to the relationships between different locations in the prostate ultrasound image, and may include structural features such as the shape, texture and tissue structure of prostate cancer.
The specific process of step S4 is as follows:
S41, temporal attention mechanism: attention in the time dimension is computed by a three-dimensional convolution operation, expressed as a_ti = Sigmoid(Conv3d(X_c_atti)). The temporal attention layer generates the attention weights of the time dimension to determine which time steps are more important for the prostate cancer video classification task. The Sigmoid function ensures that the generated attention weights lie between 0 and 1.
S42, spatial attention mechanism: attention in the spatial dimension is computed by a three-dimensional convolution operation, expressed as a_si = Sigmoid(Conv3d(X_c_atti)). The spatial attention layer generates the attention weights of the spatial dimensions and determines which regions are more important for the prostate cancer video classification task. Likewise, the Sigmoid function ensures that the generated attention weights lie between 0 and 1.
As a preferred technical scheme, the specific process of step S5 is as follows:
S51, spatio-temporal attention fusion: the final spatio-temporal attention is the result of multiplying the temporal and spatial attention. This fusion mechanism helps the model better understand the temporal and spatial relationships of the input prostate cancer video data and ensures that relevant information is captured for video classification. The temporal attention a_ti and the spatial attention a_si are multiplied to obtain the final spatio-temporal attention tensor a_spatio-temporal_i, which is applied to the multi-scale input feature tensor X_c_atti as follows:
a_spatio-temporal_i = a_ti ⊙ a_si
x_atti = X_c_atti ⊙ a_spatio-temporal_i
where x_atti is the feature tensor after spatio-temporal attention modulation. The spatio-temporal attention module is applied to the feature map generated after each 3D channel attention modulation, so that the model can adaptively attend to features of different parts according to the temporal and spatial information of the input prostate cancer video data, improving its ability to model the temporal structure and further improving the classification accuracy of prostate cancer ultrasound video.
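The following PyTorch sketch realizes S41–S51. The patent states only that both attention maps come from a three-dimensional convolution followed by a Sigmoid; the choice of a time-only kernel for a_ti and a space-only kernel for a_si is an assumption made here for illustration:

import torch
import torch.nn as nn

class SpatioTemporalAttention(nn.Module):
    # Sketch of S41-S51; kernel shapes are assumptions.
    def __init__(self, channels: int):
        super().__init__()
        # a_ti: attention over time steps, from a temporal 3D convolution.
        self.temporal = nn.Conv3d(channels, 1, kernel_size=(3, 1, 1), padding=(1, 0, 0))
        # a_si: attention over spatial regions, from a spatial 3D convolution.
        self.spatial = nn.Conv3d(channels, 1, kernel_size=(1, 3, 3), padding=(0, 1, 1))

    def forward(self, x_c_atti: torch.Tensor) -> torch.Tensor:
        a_t = torch.sigmoid(self.temporal(x_c_atti))  # (b, 1, t, h, w), weights in (0, 1)
        a_s = torch.sigmoid(self.spatial(x_c_atti))   # (b, 1, t, h, w), weights in (0, 1)
        a_st = a_t * a_s                              # a_spatio-temporal_i = a_ti ⊙ a_si
        return x_c_atti * a_st                        # x_atti = X_c_atti ⊙ a_spatio-temporal_i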
Specifically, in this embodiment, the specific procedure of step S6 is as follows:
S61, a linear transformation is applied to the input feature x_atti of the current scale i to generate the query vector:
query_i = scales[i](x_atti)
S62, linear transformations are applied to the input features other_x_attj of the other scales j to generate the key vectors, followed by the corresponding dimension permutation, with the formula:
key_j = scales[j](other_x_attj)
Here i and j denote different scales: i is the scale currently being processed and j is another scale. X_att_list is a list of features from multiple scales, i is the index of the current scale, and X_atti is the feature of the current scale; the loop iterates over every scale, with i taking the values 0, 1, 2 and so on. In other words, the features extracted at the different scales are assembled into the list X_att_list, each element X_atti of which represents the feature of the i-th scale; the length of X_att_list equals the number of scales, and each loop iteration fetches the scale feature corresponding to the current scale index.
This is because, in multi-scale feature fusion, each scale's feature should interact and fuse with the features of the other scales. It is therefore necessary to traverse the scales so that the current scale's feature (X_atti) interacts with the features of the other scales (other_X_attj). The condition check i == j ensures that the current scale is skipped in the inner loop, avoiding attention computation and feature fusion of a feature with itself.
X_atti and other_X_attj are the feature vectors output after modulation by the pyramid algorithm and the triple attention mechanism; the pyramid is multi-scale, and the two represent feature vectors of different scales.
In this way, the module can effectively handle the feature relationships among different scales and, for each scale's feature, perform information interaction with the features of the other scales. This helps improve the effect of feature fusion, enabling features of different scales to interact better.
S63, the raw attention weight is obtained by computing the dot product between the query vector and the key vector:
attn_weight_ij = query_i ⊙ key_j
S64, a softmax operation is applied to the dot-product result to obtain the normalized attention weight:
attn_weight_ij = softmax(attn_weight_ij)
S65, the attention weights are used to weight and sum the features other_x_attj of the different scales; the scale_attn_i corresponding to each scale i contains the fusion results of the features from the other scales, generating the fused feature:
fused_feature = scale_attn_i + (attn_weight_ij ⊙ other_x_attj)
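The following is a simplified PyTorch sketch of S61–S65. It assumes at least two scales, that each scale's feature has already been reduced to a (batch, channels) vector, and that per-scale linear layers play the role of scales[i]; the softmax is taken over the other scales j, matching S64:

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFusion(nn.Module):
    # Sketch of S61-S65; the (batch, channels) feature layout is an assumption.
    def __init__(self, channels: int, num_scales: int):
        super().__init__()
        # One linear projection per scale, standing in for scales[i].
        self.scales = nn.ModuleList(
            nn.Linear(channels, channels) for _ in range(num_scales)
        )

    def forward(self, x_att_list: list) -> list:
        fused = []
        for i, x_atti in enumerate(x_att_list):
            query = self.scales[i](x_atti)              # query_i = scales[i](x_atti)
            logits, others = [], []
            for j, other_x in enumerate(x_att_list):
                if i == j:                              # skip self-interaction
                    continue
                key = self.scales[j](other_x)           # key_j = scales[j](other_x_attj)
                logits.append((query * key).sum(dim=-1))  # raw dot-product weight
                others.append(other_x)
            # Normalize the raw weights over the other scales j (S64).
            w = F.softmax(torch.stack(logits, dim=-1), dim=-1)
            out = x_atti + sum(
                w[:, k:k + 1] * others[k] for k in range(len(others))
            )                                           # fused_feature (S65)
            fused.append(out)
        return fused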
and after the output features are scale-adjusted by a Max-Pooling operation, the feature maps are concatenated and passed to a fully connected layer, and a Softmax function finally generates the final prediction result.
Max-Pooling helps reduce the spatial dimensions while preserving important features; concatenating features of different scales enriches the model's information representation; the fully connected layer learns the complex relationships among the features; and the Softmax function converts the network output into a probability distribution, so that the predicted probability of each class lies between 0 and 1, finally achieving the classification of the input data.
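A minimal PyTorch sketch of this final stage under stated assumptions (adaptive max pooling to a single spatio-temporal position and one fully connected layer; the patent does not fix these sizes, and benign/malignant classification implies two classes):

import torch
import torch.nn as nn

class ClassificationHead(nn.Module):
    # Sketch: per-scale max pooling, concatenation, fully connected layer,
    # Softmax. Channel counts and the pooled size are assumptions.
    def __init__(self, channels: int, num_scales: int, num_classes: int = 2):
        super().__init__()
        self.pool = nn.AdaptiveMaxPool3d(1)  # scale adjustment via max pooling
        self.fc = nn.Linear(channels * num_scales, num_classes)

    def forward(self, fused_features: list) -> torch.Tensor:
        # Pool each scale's (b, c, t, h, w) map to (b, c) and concatenate.
        pooled = [self.pool(f).flatten(1) for f in fused_features]
        logits = self.fc(torch.cat(pooled, dim=1))
        return torch.softmax(logits, dim=1)  # per-class probabilities in (0, 1)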
In this embodiment, multi-scale pyramid feature extraction and triple-attention enhancement are adopted. A prostate ultrasound video dataset from the ultrasound department of Shenzhen People's Hospital is used to validate the triple-attention pyramid algorithm-based prostate cancer ultrasound video classification network: the training set of the dataset is used for the training process of the classification network, the test set is then predicted, and Accuracy (ACC), Area Under the ROC Curve (AUC) and F1-score are compared. Finally, heat maps are used to visualize the lesion regions the model attends to, completing the verification of the lesion localization effect.
This embodiment is not only suitable for prostate cancer ultrasound video, but can also play an important role in the fields of computer vision and video analysis, improving the performance of image and video processing tasks.
Based on the triple-attention pyramid algorithm-based prostate cancer ultrasound video classification method of the above embodiment, this embodiment further provides a prostate cancer ultrasound video classification system to which the triple-attention pyramid classification algorithm is applied. For ease of illustration, the structural schematic of this system embodiment shows only the portions relevant to the embodiment; those skilled in the art will appreciate that the illustrated structure does not limit the apparatus, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
Referring to fig. 4, in another embodiment of the present application, a system 100 for classifying an ultrasonic video of prostate cancer based on a triple-attention pyramid algorithm is provided, which includes a multi-scale pyramid network module 101, a 3D channel attention module 102, a semantic information dividing module 103, a spatiotemporal attention module 104, a multi-scale feature interaction fusion module 105, and a cancer classification prediction module 106;
a multi-scale pyramid network module 101 is constructed that allows the input prostate cancer ultrasound video data to be separated into different sub-components, each representing a different scale or hierarchy of information. This helps to focus on multiple levels of data simultaneously, from microscopic to macroscopic, to capture more detail and global information.
The 3D channel attention module 102 modulates the multi-scale feature vectors using the 3D channel level attention mechanism to ensure that the model is better able to adapt to different scale information. This typically involves weighting the different channel characteristics to focus on the most important information for cancer classification;
the semantic information dividing module 103 is used for processing the multi-scale feature data, including video frame data of a plurality of time steps and space information of each time step, and constructing space-time features;
the spatiotemporal attention module 104, which is shown in fig. 3, calculates the attention weights in the temporal and spatial dimensions by a three-dimensional convolution operation to determine which temporal steps and spatial regions are more critical to the classification of prostate cancer.
The multi-scale feature interaction fusion module 105 is configured to facilitate information interaction and fusion between different scale features, and generate a final output feature by accumulating and fusing the multi-scale features. This helps the model more fully understand the complex features of prostate cancer, thereby improving the classification performance of the model.
The cancer classification prediction module 106 accepts the fused features as input and outputs a probability distribution of the cancer classification.
The foregoing is merely a preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions easily conceivable by those skilled in the art within the technical scope of the present application should be covered in the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (8)

1. The pyramid algorithm-based prostate cancer ultrasonic video classification method is characterized by comprising the following steps of:
extracting multi-scale features of the prostate cancer ultrasonic video based on a multi-scale pyramid network;
modulating a multi-scale feature vector with a 3D channel level attention mechanism based on the multi-scale feature;
processing the multi-scale features based on the multi-scale feature vectors to construct space-time features;
acquiring a space-time attention tensor based on the space-time features;
modulating the multi-scale feature vector based on the space-time attention tensor to obtain a modulated multi-scale input feature tensor;
and acquiring final output characteristics based on the modulated multi-scale input characteristic tensor, and completing the ultrasonic video classification of the prostate cancer.
2. The pyramid algorithm-based ultrasound video classification method for prostate cancer of claim 1, wherein extracting multi-scale features of the ultrasound video for prostate cancer comprises:
dividing input prostate cancer ultrasonic video frame data into a plurality of subframes, wherein each subframe represents different time scales;
and extracting the characteristics of each subframe by using the multi-scale pyramid network to obtain the multi-scale characteristics of different time scales.
3. The pyramid algorithm-based prostate cancer ultrasound video classification method according to claim 2, wherein the multi-scale pyramid network adopts a dilation-based 3D MBF-Net structure;
feature extraction for each subframe using the multi-scale pyramid network includes:
the subframe is converted into an input feature map; after a 1×1 convolution and a 3×3 convolution, the channels are split: one branch applies a 3×3 depthwise convolution, a 3×3 depthwise dilated convolution with a dilation rate of 2, and 2×2 max pooling; the other branch applies a 3×3 depthwise convolution, a 3×3 depthwise dilated convolution with a dilation rate of 4, and 2×2 max pooling; the convolution results of the two branches are then concatenated and passed through a 1×1 convolution, added to the input feature map, and channel-shuffled to obtain the output feature map.
4. The pyramid algorithm-based ultrasound video classification method for prostate cancer of claim 1, wherein modulating the multi-scale feature vector based on the 3D channel level attention mechanism comprises:
processing the multi-scale features by using a 3D channel attention mechanism to obtain multi-scale input feature tensors;
performing reduction operation on the multi-scale input feature tensor along the channel dimension by using self-adaptive average pooling to obtain preset features;
calculating the attention weight of the channel according to the preset characteristics through the multi-layer perceptron to obtain the attention weight of the channel;
and carrying out channel weighting on the multi-scale input feature tensor based on the channel attention weight to acquire the multi-scale feature vector.
5. The pyramid algorithm-based ultrasound video classification method of claim 1, wherein processing the multi-scale features based on the multi-scale feature vector, constructing spatio-temporal features comprises:
processing the multi-scale features based on the multi-scale feature vectors to obtain video frame data of a plurality of time steps and spatial information of each time step;
the spatio-temporal features are constructed based on the video frame data for a number of time steps and the spatial information for each time step.
6. The pyramid algorithm-based ultrasound video classification method for prostate cancer of claim 1, wherein obtaining a spatiotemporal attention tensor based on the spatiotemporal features comprises:
calculating attention weights in time sequence and space dimension of the space-time characteristics based on three-dimensional convolution operation, and acquiring time sequence attention and space attention;
multiplying the time-series attention and the space attention to obtain the space-time attention tensor.
7. The pyramid algorithm-based ultrasound video classification method of claim 1, wherein obtaining final output features based on the modulated multi-scale input feature tensor comprises:
linear transformation is used for the modulated multi-scale input characteristic tensor of the current scale to generate a query vector;
using linear transformation to the modulated multi-scale input characteristic tensor of other scales except the current scale to generate a key vector;
calculating a dot product between the query vector and the key vector to obtain an original attention weight;
performing softmax operation on the original attention weight to obtain a final normalized attention weight;
and weighting and summing the features other_x_attj of the different scales based on the attention weights to obtain the final output features.
8. A pyramid algorithm-based ultrasound video classification system for prostate cancer, for implementing the pyramid algorithm-based ultrasound video classification method of any one of claims 1-7, the system comprising: the system comprises a feature extraction module, a channel attention module, a semantic information dividing module, a space-time attention module, a multi-scale feature interaction fusion module and a classification module;
the feature extraction module is used for constructing a multi-scale pyramid network, dividing input prostate cancer ultrasonic video data into a plurality of subframes, wherein each subframe represents different time scales, and extracting features of each subframe by using the multi-scale pyramid network to obtain the multi-scale features of different time scales;
the channel attention module is used for modulating the multi-scale feature vector by using an attention mechanism of a 3D channel level;
the semantic information dividing module is used for processing the multi-scale features according to the multi-scale feature vectors and constructing space-time features;
the space-time attention module is used for calculating attention weights in time sequence and space dimension in the space-time characteristics through three-dimensional convolution operation, and acquiring space-time attention characteristics of different scales;
the multi-scale feature interaction fusion module is used for carrying out information interaction and fusion on the space-time attention features with different scales to obtain final output features;
and the classification module is used for classifying the input ultrasonic video data of the prostate cancer according to the final output characteristics.
Application CN202311646253.1A, priority date 2023-12-01, filed 2023-12-01: Pyramid algorithm-based prostate cancer ultrasonic video classification method and system. Status: Active; granted as CN117671357B.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311646253.1A CN117671357B (en) 2023-12-01 2023-12-01 Pyramid algorithm-based prostate cancer ultrasonic video classification method and system


Publications (2)

Publication Number Publication Date
CN117671357A 2024-03-08
CN117671357B 2024-07-05

Family

ID=90078357

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311646253.1A Active CN117671357B (en) 2023-12-01 2023-12-01 Pyramid algorithm-based prostate cancer ultrasonic video classification method and system

Country Status (1)

Country Link
CN (1) CN117671357B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20160124948A (en) * 2015-04-20 2016-10-31 전남대학교산학협력단 Tensor Divergence Feature Extraction System based on HoG and HOF for video obejct action classification
KR20190119261A (en) * 2018-04-12 2019-10-22 가천대학교 산학협력단 Apparatus and method for segmenting of semantic image using fully convolutional neural network based on multi scale image and multi scale dilated convolution
US20210287342A1 (en) * 2020-03-10 2021-09-16 Samsung Electronics Co., Ltd. Systems and methods for image denoising using deep convolutional networks
CN114913436A (en) * 2022-06-15 2022-08-16 中科弘云科技(北京)有限公司 Ground object classification method and device based on multi-scale attention mechanism, electronic equipment and medium
CN115131710A (en) * 2022-07-05 2022-09-30 福州大学 Real-time action detection method based on multi-scale feature fusion attention
CN115375716A (en) * 2022-07-22 2022-11-22 桂林电子科技大学 New coronary lesion segmentation method based on multi-scale feature fusion
CN115620118A (en) * 2022-09-15 2023-01-17 河北汉光重工有限责任公司 Saliency target detection method based on multi-scale expansion convolutional neural network
CN116386034A (en) * 2023-02-16 2023-07-04 武汉大学 Cervical cell classification method based on multiscale attention feature enhancement
CN116385382A (en) * 2023-03-23 2023-07-04 济南大学 Network model for visceral tumor segmentation in ultrasonic image
WO2023185243A1 (en) * 2022-03-29 2023-10-05 河南工业大学 Expression recognition method based on attention-modulated contextual spatial information
US20230343128A1 (en) * 2022-04-24 2023-10-26 Nanjing Agricultural University Juvenile fish limb identification method based on multi-scale cascaded perceptual convolutional neural network


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WEIXING WANG et al.: "Pyramid-dilated deep convolutional neural network for crowd counting", Applied Intelligence, vol. 52, 29 March 2021 (2021-03-29), page 1825 *
WANG Huitao; HU Yan: "Efficient video classification method based on global spatio-temporal receptive field" (基于全局时空感受野的高效视频分类方法), Journal of Chinese Computer Systems, no. 08, 31 August 2020 (2020-08-31), pages 202-209 *

Also Published As

Publication number Publication date
CN117671357B (en) 2024-07-05


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant