CN115272250B - Method, apparatus, computer device and storage medium for determining lesion position - Google Patents

Method, apparatus, computer device and storage medium for determining lesion position

Info

Publication number
CN115272250B
CN115272250B (application CN202210915636.3A)
Authority
CN
China
Prior art keywords
feature
extraction module
network
size
pyramid
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210915636.3A
Other languages
Chinese (zh)
Other versions
CN115272250A (en)
Inventor
梅立锋
吕孟叶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Technology University
Original Assignee
Shenzhen Technology University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Technology University
Priority to CN202210915636.3A
Publication of CN115272250A
Application granted
Publication of CN115272250B
Legal status: Active
Anticipated expiration

Classifications

    • G06T 7/0012 Biomedical image inspection (under G06T 7/00 Image analysis; G06T 7/0002 Inspection of images, e.g. flaw detection)
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06V 10/764 Image or video recognition or understanding using pattern recognition or machine learning: classification, e.g. of video objects
    • G06V 10/766 Image or video recognition or understanding using pattern recognition or machine learning: regression, e.g. by projecting features on hyperplanes
    • G06V 10/7715 Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; mappings, e.g. subspace methods
    • G06V 10/806 Fusion, i.e. combining data from various sources at the sensor, preprocessing, feature extraction or classification level, of extracted features
    • G06V 10/82 Image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • G06T 2207/10088 Magnetic resonance imaging [MRI] (image acquisition modality: tomographic images)
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/30004 Biomedical image processing
    • Y02A 90/30 Assessment of water resources (technologies having an indirect contribution to adaptation to climate change)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Radiology & Medical Imaging (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Image Processing (AREA)
  • Magnetic Resonance Imaging Apparatus (AREA)

Abstract

The present application relates to a method, an apparatus, a computer device, and a storage medium for determining a lesion position. The method comprises the following steps: inputting a magnetic resonance image into a detection model, the detection model comprising a backbone network, a feature fusion network, and a decoupled head based on an anchor-free detection framework; performing feature extraction on the magnetic resonance image through each feature extraction module in the backbone network to obtain feature maps of different sizes; performing semantic enhancement and position enhancement on the feature maps of corresponding sizes through the feature fusion network to obtain enhanced feature maps of each size; performing classification and regression on the enhanced feature maps of each size through the decoupled head to obtain a processing result corresponding to each enhanced feature map; and decoding each processing result and determining the lesion position in the magnetic resonance image based on the decoding results. With this method, the lesion position can be determined accurately.

Description

Method, apparatus, computer device and storage medium for determining lesion position
Technical Field
The present application relates to the field of image processing, and in particular to a method, an apparatus, a computer device, and a storage medium for determining a lesion position.
Background
With the rapid development of modern medicine, magnetic resonance imaging (MRI) is widely used as a non-invasive medical image analysis technique. This imaging modality is critical for a wide range of clinical diagnostic tasks, including stroke, cancer, surgical planning, and acute injury, and the magnetic resonance images it produces can be used to determine lesion positions.
Traditionally, lesion positions are determined by medical professionals who judge the magnetic resonance image based on accumulated experience. In a clinical environment, a radiologist's diagnostic accuracy reaches only 64-70%, and a doctor's performance degrades as workload increases. In addition, lesions in a magnetic resonance image differ in size, have uneven gray levels, or have gray levels close to those of surrounding tissue, so some lesions are easily overlooked or misjudged during diagnosis, and the determined lesion positions are inevitably inaccurate.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a method, an apparatus, a computer device, and a computer-readable storage medium for determining a lesion position that can improve the accuracy of the determined position.
In a first aspect, the present application provides a method of determining a lesion position. The method comprises the following steps:
inputting a magnetic resonance image into a detection model, the detection model comprising a backbone network, a feature fusion network, and a decoupled head based on an anchor-free detection framework;
performing feature extraction on the magnetic resonance image through each feature extraction module in the backbone network to obtain feature maps of different sizes;
performing semantic enhancement and position enhancement on the feature maps of corresponding sizes through the feature fusion network to obtain enhanced feature maps of each size;
performing classification and regression on the enhanced feature maps of each size through the decoupled head to obtain a processing result corresponding to each enhanced feature map;
and decoding each processing result and determining the lesion position in the magnetic resonance image based on the decoding results.
In one embodiment, the feature extraction modules in the backbone network include a first feature extraction module, a second feature extraction module, a third feature extraction module, and a fourth feature extraction module;
and performing feature extraction on the magnetic resonance image through each feature extraction module in the backbone network to obtain feature maps of different sizes comprises:
extracting a feature map of a first size from the magnetic resonance image through the first feature extraction module;
after image fusion is performed on the feature map of the first size by the first feature extraction module, extracting a feature map of a second size from the fused feature map through the second feature extraction module;
after image fusion is performed on the feature map of the second size by the second feature extraction module, extracting a feature map of a third size from the fused feature map through the third feature extraction module;
and after image fusion is performed on the feature map of the third size by the third feature extraction module, extracting a feature map of a fourth size from the fused feature map through the fourth feature extraction module.
In one embodiment, before extracting the feature map of the first size from the magnetic resonance image through the first feature extraction module, the method further comprises:
performing convolution embedding on the magnetic resonance image through the backbone network to obtain an embedded feature map;
and extracting the feature map of the first size from the magnetic resonance image through the first feature extraction module comprises:
performing dynamic position embedding on the embedded feature map through a dynamic position embedding layer in the first feature extraction module to obtain a first dynamic feature map;
performing local aggregation on the first dynamic feature map through a multi-head correlation aggregation layer in the first feature extraction module to obtain a first aggregated feature map;
and inputting the first aggregated feature map into a feed-forward network layer in the first feature extraction module for image-block characterization to obtain the feature map of the first size.
In one embodiment, after extracting the feature map of the first size from the magnetic resonance image through the first feature extraction module, the method further comprises:
reducing the dimension of the feature map of the first size to obtain a first dimension-reduced feature map;
normalizing the first dimension-reduced feature map to obtain a first normalized feature map;
and extracting the feature map of the second size from the fused feature map through the second feature extraction module comprises:
extracting the feature map of the second size from the first normalized feature map through the second feature extraction module.
In one embodiment, the feature fusion network includes a feature pyramid network and a path aggregation network, and performing semantic enhancement and position enhancement on the feature maps of corresponding sizes through the feature fusion network to obtain enhanced feature maps of each size comprises:
upsampling the feature maps of corresponding sizes through the feature pyramid network to obtain intermediate enhancement maps of each size;
and downsampling the intermediate enhancement maps of corresponding sizes through the path aggregation network to obtain the enhanced feature maps of each size.
In one embodiment, the intermediate enhancement maps of each size include a first pyramid feature, a second pyramid feature, a third pyramid feature, and a fourth pyramid feature;
and upsampling the feature maps of corresponding sizes through the feature pyramid network to obtain intermediate enhancement maps of each size comprises:
upsampling the feature map of the fourth size through a first pyramid layer of the feature pyramid network to obtain the first pyramid feature;
upsampling the fused feature between the feature map of the third size and the first pyramid feature through a second pyramid layer of the feature pyramid network to obtain the second pyramid feature;
upsampling the fused feature between the feature map of the second size and the second pyramid feature through a third pyramid layer of the feature pyramid network to obtain the third pyramid feature;
and upsampling the fused feature between the feature map of the first size and the third pyramid feature through a fourth pyramid layer of the feature pyramid network to obtain the fourth pyramid feature.
In one embodiment, the enhanced feature maps of each size include a first path aggregation feature, a second path aggregation feature, a third path aggregation feature, and a fourth path aggregation feature; and downsampling the intermediate enhancement maps of corresponding sizes through the path aggregation network to obtain the enhanced feature maps of each size comprises:
downsampling the fourth pyramid feature through a first path aggregation layer of the path aggregation network to obtain the first path aggregation feature;
downsampling the fused feature between the third pyramid feature and the first path aggregation feature through a second path aggregation layer of the path aggregation network to obtain the second path aggregation feature;
downsampling the fused feature between the second pyramid feature and the second path aggregation feature through a third path aggregation layer of the path aggregation network to obtain the third path aggregation feature;
and downsampling the fused feature between the first pyramid feature and the third path aggregation feature through a fourth path aggregation layer of the path aggregation network to obtain the fourth path aggregation feature.
In a second aspect, the present application also provides an apparatus for determining a lesion position. The apparatus comprises:
an input module, configured to input a magnetic resonance image into a detection model, the detection model comprising a backbone network, a feature fusion network, and a decoupled head based on an anchor-free detection framework;
a feature extraction module, configured to perform feature extraction on the magnetic resonance image through each feature extraction module in the backbone network to obtain feature maps of different sizes;
an enhancement module, configured to perform semantic enhancement and position enhancement on the feature maps of corresponding sizes through the feature fusion network to obtain enhanced feature maps of each size;
a classification and regression module, configured to perform classification and regression on the enhanced feature maps of each size through the decoupled head to obtain a processing result corresponding to each enhanced feature map;
and a decoding module, configured to decode each processing result and determine the lesion position in the magnetic resonance image based on the decoding results.
In one embodiment, the feature extraction modules in the backbone network include a first feature extraction module, a second feature extraction module, a third feature extraction module, and a fourth feature extraction module;
and the feature extraction module is further configured to: extract a feature map of a first size from the magnetic resonance image through the first feature extraction module; after image fusion is performed on the feature map of the first size by the first feature extraction module, extract a feature map of a second size from the fused feature map through the second feature extraction module; after image fusion is performed on the feature map of the second size by the second feature extraction module, extract a feature map of a third size from the fused feature map through the third feature extraction module; and after image fusion is performed on the feature map of the third size by the third feature extraction module, extract a feature map of a fourth size from the fused feature map through the fourth feature extraction module.
In one embodiment, the feature extraction module is further configured to: perform convolution embedding on the magnetic resonance image through the backbone network to obtain an embedded feature map; perform dynamic position embedding on the embedded feature map through a dynamic position embedding layer in the first feature extraction module to obtain a first dynamic feature map; perform local aggregation on the first dynamic feature map through a multi-head correlation aggregation layer in the first feature extraction module to obtain a first aggregated feature map; and input the first aggregated feature map into a feed-forward network layer in the first feature extraction module for image-block characterization to obtain the feature map of the first size.
In one embodiment, the feature extraction module is further configured to: reduce the dimension of the feature map of the first size to obtain a first dimension-reduced feature map; normalize the first dimension-reduced feature map to obtain a first normalized feature map; and extract the feature map of the second size from the first normalized feature map through the second feature extraction module.
In one embodiment, the enhancement module is further configured to: upsample the feature maps of corresponding sizes through the feature pyramid network to obtain intermediate enhancement maps of each size; and downsample the intermediate enhancement maps of corresponding sizes through the path aggregation network to obtain the enhanced feature maps of each size.
In one embodiment, the intermediate enhancement maps of each size include a first pyramid feature, a second pyramid feature, a third pyramid feature, and a fourth pyramid feature;
and the enhancement module is further configured to: upsample the feature map of the fourth size through a first pyramid layer of the feature pyramid network to obtain the first pyramid feature; upsample the fused feature between the feature map of the third size and the first pyramid feature through a second pyramid layer of the feature pyramid network to obtain the second pyramid feature; upsample the fused feature between the feature map of the second size and the second pyramid feature through a third pyramid layer of the feature pyramid network to obtain the third pyramid feature; and upsample the fused feature between the feature map of the first size and the third pyramid feature through a fourth pyramid layer of the feature pyramid network to obtain the fourth pyramid feature.
In one embodiment, the enhanced feature maps of each size include a first path aggregation feature, a second path aggregation feature, a third path aggregation feature, and a fourth path aggregation feature; and the enhancement module is further configured to: downsample the fourth pyramid feature through a first path aggregation layer of the path aggregation network to obtain the first path aggregation feature; downsample the fused feature between the third pyramid feature and the first path aggregation feature through a second path aggregation layer of the path aggregation network to obtain the second path aggregation feature; downsample the fused feature between the second pyramid feature and the second path aggregation feature through a third path aggregation layer of the path aggregation network to obtain the third path aggregation feature; and downsample the fused feature between the first pyramid feature and the third path aggregation feature through a fourth path aggregation layer of the path aggregation network to obtain the fourth path aggregation feature.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor that implements the steps of the above method when executing the computer program.
In a fourth aspect, the present application also provides a computer-readable storage medium. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the above method.
In a fifth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of the above method.
With the above method, apparatus, computer device, and storage medium for determining a lesion position, a magnetic resonance image is input into a detection model; feature extraction is performed on the magnetic resonance image through each feature extraction module in the backbone network to obtain feature maps of different sizes; semantic enhancement and position enhancement are performed on the feature maps of corresponding sizes through a feature fusion network to obtain enhanced feature maps of each size; classification and regression are performed on the enhanced feature maps of each size through the decoupled head to obtain processing results corresponding to the enhanced feature maps; and each processing result is decoded, and the lesion position in the magnetic resonance image is determined based on the decoding results. The flexible anchor-free detection framework adapts well to lesion regions of different sizes and is efficient; furthermore, through the feature extraction of the backbone network in the detection model, the semantic and position enhancement of the feature fusion network, and the classification and regression of the decoupled head based on the anchor-free detection framework, the lesion position can be determined accurately.
Drawings
FIG. 1 is a diagram of an application environment for a method of determining a lesion position in one embodiment;
FIG. 2 is a flow chart of a method of determining a lesion position in one embodiment;
FIG. 3 is a schematic diagram of a detection model in one embodiment;
FIG. 4 is a flow chart of the feature extraction steps in one embodiment;
FIG. 5 is a flow chart of the feature pyramid network processing steps in one embodiment;
FIG. 6 is a flow chart of the path aggregation network processing steps in one embodiment;
FIG. 7 is a block diagram of an apparatus for determining a lesion position in one embodiment;
FIG. 8 is an internal structure diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present application more apparent, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The method for determining a lesion position provided by the embodiments of the application can be applied in the application environment shown in FIG. 1, where the terminal 102 communicates with the server 104 via a network. A data storage system may store the data that the server 104 needs to process; it may be integrated on the server 104, or located on a cloud or other network server. The method may be executed by the terminal 102 or the server 104; this embodiment takes execution by the terminal 102 as an example.
The terminal 102 inputs the magnetic resonance image into the detection model, the detection model comprising a backbone network, a feature fusion network, and a decoupled head based on an anchor-free detection framework; performs feature extraction on the magnetic resonance image through each feature extraction module in the backbone network to obtain feature maps of different sizes; performs semantic enhancement and position enhancement on the feature maps of corresponding sizes through the feature fusion network to obtain enhanced feature maps of each size; performs classification and regression on the enhanced feature maps of each size through the decoupled head to obtain a processing result corresponding to each enhanced feature map; and decodes each processing result and determines the lesion position in the magnetic resonance image based on the decoding results.
The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smart phone, tablet computer, internet-of-things device, or portable wearable device; the internet-of-things device may be a smart speaker, smart television, smart air conditioner, smart vehicle device, or the like, and the portable wearable device may be a smart watch, smart bracelet, headset, or the like. The server 104 may be implemented as a stand-alone server or as a server cluster of multiple servers.
In one embodiment, as shown in FIG. 2, a method for determining a lesion position is provided. Taking application of the method to the terminal 102 in FIG. 1 as an example, the method includes the following steps:
S202: input a magnetic resonance image into a detection model, the detection model comprising a backbone network, a feature fusion network, and a decoupled head based on an anchor-free detection framework.
Here, a magnetic resonance image refers to an image generated by magnetic resonance imaging, and the detection model refers to a lesion detection model that can be used to detect lesion positions in the magnetic resonance image. FIG. 3 is a schematic diagram of a detection model in one embodiment. As shown, the backbone network may include a convolution embedding layer and the feature extraction modules: a first and a second feature extraction module each comprising local correlation aggregation modules and an image fusion module, a third feature extraction module comprising hybrid correlation aggregation modules and an image fusion module, and a fourth feature extraction module comprising global correlation aggregation modules. The feature fusion network includes a feature pyramid network (FPN) and a path aggregation network (PAN).
The anchor-free detection framework may refer to the YOLOX detection framework. YOLOX is an improved version of the YOLO (You Only Look Once) object detection algorithm; it constructs an anchor-free, end-to-end object detection framework and achieves first-class detection performance. The decoupled head based on the anchor-free detection framework may refer to the YOLOX decoupled head.
Specifically, the terminal inputs the magnetic resonance image into the detection model.
S204: perform feature extraction on the magnetic resonance image through each feature extraction module in the backbone network to obtain feature maps of different sizes.
Here, a feature map refers to a map that the backbone network inputs into the feature pyramid network. As shown in FIG. 3, the feature map output by the three local correlation aggregation modules in the first feature extraction module may be the feature map of the first size, the feature map output by the four local correlation aggregation modules in the second feature extraction module may be the feature map of the second size, the feature map output by the eight hybrid correlation aggregation modules in the third feature extraction module may be the feature map of the third size, and the feature map output by the three global correlation aggregation modules in the fourth feature extraction module may be the feature map of the fourth size.
S206: perform semantic enhancement and position enhancement on the feature maps of corresponding sizes through the feature fusion network to obtain enhanced feature maps of each size.
Here, the feature fusion network comprises a feature pyramid network and a path aggregation network, and an enhanced feature map refers to a feature map output after processing by the feature pyramid network and the path aggregation network in the feature fusion network. The enhanced feature maps of each size include a first path aggregation feature, a second path aggregation feature, a third path aggregation feature, and a fourth path aggregation feature.
Specifically, the terminal may upsample the feature maps of corresponding sizes through the feature pyramid network to obtain intermediate enhancement maps of each size, and downsample the intermediate enhancement maps of corresponding sizes through the path aggregation network to obtain the enhanced feature maps of each size.
An intermediate enhancement map refers to a feature map output after feature maps of different sizes are processed by the feature pyramid network. The intermediate enhancement maps of each size include a first pyramid feature, a second pyramid feature, a third pyramid feature, and a fourth pyramid feature.
S208: perform classification and regression on the enhanced feature maps of each size through the decoupled head to obtain a processing result corresponding to each enhanced feature map.
Here, a processing result refers to the result obtained after classification and regression are performed on an enhanced feature map.
Specifically, as shown in FIG. 3, the terminal may perform classification and regression on the first, second, third, and fourth path aggregation features through the decoupled head to obtain a first processing result corresponding to the first path aggregation feature, a second processing result corresponding to the second path aggregation feature, a third processing result corresponding to the third path aggregation feature, and a fourth processing result corresponding to the fourth path aggregation feature.
The first, second, third, and fourth processing results are distinct processing results.
In one embodiment, after S208, the terminal may combine the processing results to obtain a combined result, then perform an array flattening operation, a concatenation operation, and an array transposition operation on the combined result, and then decode the transposed result.
S210: decode each processing result and determine the lesion position in the magnetic resonance image based on the decoding results.
Here, the lesion position refers to the position at which a lesion exists.
Specifically, the terminal decodes each processing result to obtain decoded results, performs non-maximum suppression on the decoded results, and determines the lesion position in the magnetic resonance image based on the results after non-maximum suppression.
In one embodiment, the decoding process converts each processing result into a corresponding prediction-box format; it may also be understood as restoring the coordinates of the processing result to the coordinate system of the magnetic resonance image.
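By way of a non-authoritative illustration, the per-level decoding and non-maximum suppression described above can be sketched in PyTorch as follows. The tensor layout, the square feature-map assumption, the sigmoid scoring, and the exp-based width/height decoding are illustrative assumptions, not the patent's exact implementation.

```python
import torch
from torchvision.ops import nms

def decode_and_nms(pred, stride, conf_thr=0.3, iou_thr=0.45):
    """pred: (N, 5 + num_classes) rows of [dx, dy, log_w, log_h, obj, cls...]
    for one feature level whose cells are `stride` pixels apart; the level is
    assumed square here (N = side * side)."""
    side = int(pred.shape[0] ** 0.5)
    ys, xs = torch.meshgrid(torch.arange(side), torch.arange(side), indexing="ij")
    grid = torch.stack([xs.reshape(-1), ys.reshape(-1)], dim=1).float()

    # Add the grid coordinate to the predicted offset, then multiply by the
    # downsampling factor to recover original-image coordinates.
    xy = (pred[:, :2] + grid) * stride
    wh = pred[:, 2:4].exp() * stride
    boxes = torch.cat([xy - wh / 2, xy + wh / 2], dim=1)  # (x1, y1, x2, y2)

    scores, labels = (pred[:, 4:5].sigmoid() * pred[:, 5:].sigmoid()).max(dim=1)
    keep = scores > conf_thr
    boxes, scores, labels = boxes[keep], scores[keep], labels[keep]
    keep = nms(boxes, scores, iou_thr)  # non-maximum suppression
    return boxes[keep], scores[keep], labels[keep]
```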
In one embodiment, the detection network may be trained based on an objectness loss, an IoU (intersection over union) loss, and a classification loss, where the objectness and classification losses are cross-entropy losses; finally, the model parameters are optimized with an adaptive gradient method until convergence.
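A minimal sketch of such a composite training loss, assuming the binary form of cross entropy for the objectness and classification terms and a plain 1 - IoU regression term; the term weighting and the label-assignment scheme are not specified by the text and are left out.

```python
import torch
import torch.nn.functional as F

def iou_loss(pred_boxes, gt_boxes, eps=1e-7):
    # Boxes as (x1, y1, x2, y2); the loss is 1 - IoU.
    lt = torch.max(pred_boxes[:, :2], gt_boxes[:, :2])
    rb = torch.min(pred_boxes[:, 2:], gt_boxes[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred_boxes[:, 2] - pred_boxes[:, 0]).clamp(min=0) * \
             (pred_boxes[:, 3] - pred_boxes[:, 1]).clamp(min=0)
    area_g = (gt_boxes[:, 2] - gt_boxes[:, 0]) * (gt_boxes[:, 3] - gt_boxes[:, 1])
    return (1 - inter / (area_p + area_g - inter + eps)).mean()

def detection_loss(obj_pred, obj_gt, cls_pred, cls_gt, box_pred, box_gt):
    # Objectness and classification use cross entropy (its binary form here);
    # the regression branch uses the IoU loss.
    return (F.binary_cross_entropy_with_logits(obj_pred, obj_gt)
            + F.binary_cross_entropy_with_logits(cls_pred, cls_gt)
            + iou_loss(box_pred, box_gt))

# The model parameters would then be optimized with an adaptive gradient
# method, e.g. torch.optim.Adam(model.parameters(), lr=1e-3), until convergence.
```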
In this method for determining a lesion position, a magnetic resonance image is input into a detection model; feature extraction is performed on the image through each feature extraction module in the backbone network to obtain feature maps of different sizes; semantic enhancement and position enhancement are performed on the feature maps of corresponding sizes through the feature fusion network to obtain enhanced feature maps of each size; classification and regression are performed on the enhanced feature maps of each size through the decoupled head to obtain processing results corresponding to the enhanced feature maps; and each processing result is decoded, and the lesion position in the magnetic resonance image is determined based on the decoding results. The flexible anchor-free detection framework adapts well to lesion regions of different sizes and is efficient; furthermore, through the feature extraction of the backbone network, the semantic and position enhancement of the feature fusion network, and the classification and regression of the decoupled head based on the anchor-free detection framework, the lesion position can be determined accurately.
In one embodiment, as shown in FIG. 4, the feature extraction steps include:
S402: extract a feature map of a first size from the magnetic resonance image through the first feature extraction module.
In one embodiment, before S402, the terminal performs convolution embedding on the magnetic resonance image through the convolution embedding layer of the backbone network to obtain an embedded feature map.
Here, the embedded feature map refers to the feature map obtained after the convolution embedding. For example, as shown in FIG. 3, the terminal may input the magnetic resonance image into the convolution embedding layer of the backbone network for convolution embedding to obtain the embedded feature map.
Specifically, the terminal may perform dynamic position embedding on the embedded feature map through the dynamic position embedding layer in the first feature extraction module to obtain a first dynamic feature map; perform local aggregation on the first dynamic feature map through the multi-head correlation aggregation layer in the first feature extraction module to obtain a first aggregated feature map; and input the first aggregated feature map into the feed-forward network layer in the first feature extraction module for image-block characterization to obtain the feature map of the first size.
The dynamic position embedding layer dynamically embeds position information into the image; dynamic position embedding can be implemented with zero-padded depth-wise convolutional position encoding and is applicable at any resolution. The first dynamic feature map refers to the feature map output after dynamic position embedding; the first, second, third, and fourth dynamic feature maps are distinct dynamic feature maps. The multi-head correlation aggregation layer implements context encoding and correlation learning over the image; multi-head means that the image-block channels are divided into several groups, with each head processing the information of one group of channels. The first aggregated feature map refers to the feature map output after local aggregation of the first dynamic feature map. The feed-forward network layer further characterizes the image.
S404: after image fusion is performed on the feature map of the first size by the first feature extraction module, extract a feature map of a second size from the fused feature map through the second feature extraction module.
In one embodiment, performing image fusion on the feature map of the first size by the first feature extraction module comprises: the terminal reduces the dimension of the feature map of the first size to obtain a first dimension-reduced feature map, and normalizes the first dimension-reduced feature map to obtain a first normalized feature map.
Here, the first dimension-reduced feature map refers to the feature map obtained after dimension reduction of the feature map of the first size, and the first normalized feature map refers to the feature map obtained after normalizing the first dimension-reduced feature map.
In one embodiment, extracting the feature map of the second size from the fused feature map through the second feature extraction module comprises extracting the feature map of the second size from the first normalized feature map through the second feature extraction module.
In one embodiment, extracting the feature map of the second size from the first normalized feature map through the second feature extraction module comprises: the terminal performs dynamic position embedding on the first normalized feature map through the dynamic position embedding layer in the second feature extraction module to obtain a second dynamic feature map; performs local aggregation on the second dynamic feature map through the multi-head correlation aggregation layer in the second feature extraction module to obtain a second aggregated feature map; and inputs the second aggregated feature map into the feed-forward network layer in the second feature extraction module for image-block characterization to obtain the feature map of the second size.
Here, the second aggregated feature map refers to the feature map output after local aggregation of the second dynamic feature map.
S406: after image fusion is performed on the feature map of the second size by the second feature extraction module, extract a feature map of a third size from the fused feature map through the third feature extraction module.
Specifically, image fusion is performed on the feature map of the second size by the second feature extraction module to obtain a second normalized feature map; then, through the third feature extraction module, hybrid aggregation is performed on the second normalized feature map according to a first preset attention mechanism to obtain the feature map of the third size.
Here, the first preset attention mechanism refers to an attention mechanism used for hybrid aggregation; it may be a window self-attention mechanism or a global self-attention mechanism, and it is a different attention mechanism from the second preset attention mechanism.
In one embodiment, performing image fusion on the feature map of the second size by the second feature extraction module to obtain the second normalized feature map comprises: the terminal reduces the dimension of the feature map of the second size to obtain a second dimension-reduced feature map, and normalizes the second dimension-reduced feature map to obtain the second normalized feature map.
Here, the second normalized feature map refers to the feature map obtained after dimension reduction and normalization of the feature map of the second size.
In one embodiment, performing hybrid aggregation on the second normalized feature map according to the first preset attention mechanism through the third feature extraction module comprises: the terminal performs dynamic position embedding on the second normalized feature map through the dynamic position embedding layer in the third feature extraction module to obtain a third dynamic feature map; performs hybrid aggregation on the third dynamic feature map according to the first preset attention mechanism through the multi-head correlation aggregation layer in the third feature extraction module to obtain a third aggregated feature map; and inputs the third aggregated feature map into the feed-forward network layer in the third feature extraction module for image-block characterization to obtain the feature map of the third size.
S408: after image fusion is performed on the feature map of the third size by the third feature extraction module, extract a feature map of a fourth size from the fused feature map through the fourth feature extraction module.
Specifically, after image fusion is performed on the feature map of the third size by the third feature extraction module, the terminal obtains a third normalized feature map; then, through the fourth feature extraction module, feature extraction is performed on the third normalized feature map according to a second preset attention mechanism to obtain the feature map of the fourth size.
Here, the second preset attention mechanism refers to an attention mechanism used for global aggregation; it may be a window self-attention mechanism or a global self-attention mechanism.
In one embodiment, obtaining the third normalized feature map after image fusion of the feature map of the third size by the third feature extraction module comprises: the terminal reduces the dimension of the feature map of the third size to obtain a third dimension-reduced feature map, and normalizes the third dimension-reduced feature map to obtain the third normalized feature map.
Here, the third normalized feature map refers to the feature map obtained after dimension reduction and normalization of the feature map of the third size.
In one embodiment, performing feature extraction on the third normalized feature map according to the second preset attention mechanism through the fourth feature extraction module comprises: the terminal performs dynamic position embedding on the third normalized feature map through the dynamic position embedding layer in the fourth feature extraction module to obtain a fourth dynamic feature map; performs global aggregation on the fourth dynamic feature map according to the second preset attention mechanism through the multi-head correlation aggregation layer in the fourth feature extraction module to obtain a fourth aggregated feature map; and inputs the fourth aggregated feature map into the feed-forward network layer in the fourth feature extraction module for image-block characterization to obtain the feature map of the fourth size.
In this embodiment, the first feature extraction module extracts a feature map of a first size from the magnetic resonance image; after the first feature extraction module performs image fusion on the feature map of the first size, the second feature extraction module extracts a feature map of a second size from the fused feature map; after the second feature extraction module performs image fusion on the feature map of the second size, the third feature extraction module extracts a feature map of a third size from the fused feature map; and after the third feature extraction module performs image fusion on the feature map of the third size, the fourth feature extraction module extracts a feature map of a fourth size from the fused feature map. In this way, accurate feature extraction is achieved.
In one embodiment, as shown in FIG. 5, the feature pyramid network processing steps include:
S502: upsample the feature map of the fourth size through the first pyramid layer of the feature pyramid network to obtain the first pyramid feature.
Here, the first pyramid layer refers to the pyramid layer in the feature pyramid network that processes the feature map of the fourth size.
S504: upsample the fused feature between the feature map of the third size and the first pyramid feature through the second pyramid layer of the feature pyramid network to obtain the second pyramid feature.
Here, the second pyramid layer refers to the pyramid layer that processes the fused feature between the feature map of the third size and the first pyramid feature.
S506: upsample the fused feature between the feature map of the second size and the second pyramid feature through the third pyramid layer of the feature pyramid network to obtain the third pyramid feature.
Here, the third pyramid layer refers to the pyramid layer that processes the fused feature between the feature map of the second size and the second pyramid feature.
S508: upsample the fused feature between the feature map of the first size and the third pyramid feature through the fourth pyramid layer of the feature pyramid network to obtain the fourth pyramid feature.
Here, the fourth pyramid layer refers to the pyramid layer that processes the fused feature between the feature map of the first size and the third pyramid feature.
In this embodiment, the first, second, third, and fourth pyramid layers of the feature pyramid network upsample the feature maps of corresponding sizes and the fused features between those feature maps and the corresponding pyramid features to obtain the corresponding pyramid features, so that the semantic information and position information in feature maps of different sizes can be enhanced.
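The four pyramid layers of S502-S508 can be sketched as follows; the addition-based fusion, the 1×1 lateral convolutions, the common channel width, and the nearest-neighbor upsampling are illustrative assumptions rather than the patent's stated implementation.

```python
import torch.nn as nn
import torch.nn.functional as F

class FourLevelFPN(nn.Module):
    """Top-down fusion following S502-S508: each pyramid layer fuses (here by
    addition) and then upsamples."""
    def __init__(self, in_chs=(64, 128, 320, 512), out_ch=256):
        super().__init__()
        # 1x1 lateral convolutions bring every backbone level to a common width.
        self.lateral = nn.ModuleList([nn.Conv2d(c, out_ch, 1) for c in in_chs])

    def forward(self, f1, f2, f3, f4):
        # f1..f4: feature maps of the first..fourth sizes (largest to smallest).
        l1, l2, l3, l4 = [lat(f) for lat, f in zip(self.lateral, (f1, f2, f3, f4))]

        def up(t):
            return F.interpolate(t, scale_factor=2, mode="nearest")

        p1 = up(l4)       # S502: upsample the fourth-size map
        p2 = up(l3 + p1)  # S504: fuse with the third-size map, then upsample
        p3 = up(l2 + p2)  # S506
        p4 = up(l1 + p3)  # S508
        return p1, p2, p3, p4
```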
In one embodiment, as shown in FIG. 6, the path aggregation network processing steps include:
S602: downsample the fourth pyramid feature through the first path aggregation layer of the path aggregation network to obtain the first path aggregation feature.
Here, the first path aggregation layer refers to the path aggregation layer that processes the fourth pyramid feature.
S604: downsample the fused feature between the third pyramid feature and the first path aggregation feature through the second path aggregation layer of the path aggregation network to obtain the second path aggregation feature.
Here, the second path aggregation layer refers to the path aggregation layer that processes the fused feature between the third pyramid feature and the first path aggregation feature.
S606: downsample the fused feature between the second pyramid feature and the second path aggregation feature through the third path aggregation layer of the path aggregation network to obtain the third path aggregation feature.
Here, the third path aggregation layer refers to the path aggregation layer that processes the fused feature between the second pyramid feature and the second path aggregation feature.
S608: downsample the fused feature between the first pyramid feature and the third path aggregation feature through the fourth path aggregation layer of the path aggregation network to obtain the fourth path aggregation feature.
Here, the fourth path aggregation layer refers to the path aggregation layer that processes the fused feature between the first pyramid feature and the third path aggregation feature.
In this embodiment, the first, second, third, and fourth path aggregation layers of the path aggregation network downsample the corresponding pyramid features and the fused features between those pyramid features and the corresponding path aggregation features to obtain the corresponding path aggregation features, so that the semantic information and position information in feature maps of different sizes can be enhanced.
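Correspondingly, the four path aggregation layers of S602-S608 might look like the sketch below, where stride-2 convolutions stand in for the downsampling and addition again stands in for fusion (both assumptions):

```python
import torch.nn as nn

class FourLevelPAN(nn.Module):
    """Bottom-up enhancement following S602-S608, consuming the pyramid
    features p1..p4 produced by the FourLevelFPN sketch above."""
    def __init__(self, ch=256):
        super().__init__()
        # A stride-2 convolution performs the downsampling at each layer.
        self.down = nn.ModuleList(
            [nn.Conv2d(ch, ch, 3, stride=2, padding=1) for _ in range(4)]
        )

    def forward(self, p1, p2, p3, p4):
        a1 = self.down[0](p4)       # S602: downsample the fourth pyramid feature
        a2 = self.down[1](p3 + a1)  # S604: fuse, then downsample
        a3 = self.down[2](p2 + a2)  # S606
        a4 = self.down[3](p1 + a3)  # S608
        return a1, a2, a3, a4
```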
As an example, the present embodiment is implemented as follows:
The convolution embedding layer of the backbone network is implemented as follows: taking a two-dimensional image (the magnetic resonance image) $x \in \mathbb{R}^{H \times W}$ as input, a learnable image-block embedding function $f(\cdot)$ is designed for the model, so that applying it to $x$ yields the image feature $f(x)$. Here $f(\cdot)$ is a two-dimensional convolution with a kernel of size $k \times k$, stride $s$, and padding $p$. The number of channels of the image feature $f(x)$ is the embedding dimension $\mathrm{Dim}$, and its height and width are, respectively, $H' = \lfloor (H + 2p - k)/s \rfloor + 1$ and $W' = \lfloor (W + 2p - k)/s \rfloor + 1$. $f(x)$ is then flattened into an image-block sequence of shape $H'W' \times \mathrm{Dim}$ and normalized by layer normalization (LayerNorm) before being input into the subsequent modules.
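Under the parameters named here (taking the 4×4 kernel, stride 4, and 64 channels given below for FIG. 3; the single input channel is an assumption for a grayscale MRI slice), the convolution embedding layer could be sketched as:

```python
import torch.nn as nn

class ConvEmbedding(nn.Module):
    """Learnable image-block embedding f(.): a k x k convolution with stride s
    and padding p, followed by flattening and LayerNorm."""
    def __init__(self, in_ch=1, dim=64, k=4, s=4, p=0):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=k, stride=s, padding=p)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                    # x: (B, in_ch, H, W)
        fx = self.proj(x)                    # (B, Dim, H', W')
        seq = fx.flatten(2).transpose(1, 2)  # (B, H'*W', Dim) image-block sequence
        return self.norm(seq)
```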
The correlation aggregation module of the backbone network consists of three key components: a dynamic position embedding (DPE) layer, a multi-head correlation aggregation (MHRA) layer, and a feed-forward network (FFN) layer. For an input $X_{\mathrm{in}} \in \mathbb{R}^{C \times H \times W}$, dynamic position embedding is first introduced to dynamically integrate position information into the image-block sequence obtained above. Dynamic position embedding generally uses zero-padded depth-wise convolutional position encoding, which accommodates arbitrary input resolutions and makes full use of the feature sequence for better visual recognition: $X = \mathrm{DPE}(X_{\mathrm{in}}) + X_{\mathrm{in}}$. Context encoding and correlation learning of the image blocks are then achieved with multi-head correlation aggregation, where multi-head means the image blocks are divided into channel groups and each head processes the information of one group; local correlation aggregation is usually implemented with a large convolution kernel, and global correlation aggregation with a self-attention mechanism: $Y = \mathrm{MHRA}(\mathrm{Norm}(X)) + X$. Finally, similar to a Transformer structure, a feed-forward network is added to further characterize the image blocks: $Z = \mathrm{FFN}(\mathrm{Norm}(Y)) + Y$.
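A sketch of one correlation aggregation module implementing the three residual equations above; the choice of BatchNorm for Norm, the 5×5 depth-wise convolution as the local MHRA, and the 4× FFN expansion are illustrative assumptions (a self-attention MHRA would be substituted for hybrid or global aggregation):

```python
import torch.nn as nn

class CorrelationAggregationBlock(nn.Module):
    """X = DPE(X_in) + X_in; Y = MHRA(Norm(X)) + X; Z = FFN(Norm(Y)) + Y."""
    def __init__(self, ch, kernel=5):
        super().__init__()
        # Dynamic position embedding: zero-padded depth-wise convolution.
        self.dpe = nn.Conv2d(ch, ch, 3, padding=1, groups=ch)
        self.norm1 = nn.BatchNorm2d(ch)
        # Local multi-head correlation aggregation via a large depth-wise kernel;
        # groups=ch means each "head" handles its own channel group.
        self.mhra = nn.Conv2d(ch, ch, kernel, padding=kernel // 2, groups=ch)
        self.norm2 = nn.BatchNorm2d(ch)
        # Feed-forward network for further image-block characterization.
        self.ffn = nn.Sequential(
            nn.Conv2d(ch, 4 * ch, 1), nn.GELU(), nn.Conv2d(4 * ch, ch, 1),
        )

    def forward(self, x_in):  # x_in: (B, C, H, W)
        x = self.dpe(x_in) + x_in
        y = self.mhra(self.norm1(x)) + x
        return self.ffn(self.norm2(y)) + y
```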
The image fusion module can be implemented with a convolution layer or a fully connected layer; its function is to reduce the dimension of the image features.
FIG. 3 is a schematic diagram of the detection model in one embodiment. As shown in FIG. 3, an MRI image of size H×W is input. It first passes through the convolution embedding layer: a convolution with kernel size 4×4, stride 4×4, and 64 output channels embeds the image, and a regularizing LN layer immediately after the convolution normalizes the data. The convolution embedding layer outputs a feature map of H/4×W/4×64, which is passed sequentially through three local correlation aggregation modules; the output feature map is denoted F1. The core of these modules is local correlation aggregation using a convolution with kernel size 5×5, padding 2, and groups 64. The image fusion modules all use a convolution with kernel size and stride 2×2. F1 is first input into an image fusion module for feature downsampling to obtain a feature map of H/8×W/8×128, with a regularizing LN layer following the downsampling convolution. Similarly to the previous step, the downsampled feature map is passed sequentially through four local correlation aggregation modules to obtain a feature map denoted F2; this layer's local correlation aggregation uses a convolution with kernel size 5×5, padding 2, and groups 128. F2 is then downsampled in the image block fusion module to obtain a feature map of H/16×W/16×320, which is passed sequentially through eight hybrid correlation aggregation modules for global feature modeling; the output feature map is denoted F3. The eight hybrid modules are divided into two groups of four; in each group, the first three are window self-attention modules and the last one is a global self-attention module. Finally, F3 is downsampled to obtain a feature map of H/32×W/32×512, which is passed sequentially through three global correlation aggregation modules; the output is denoted F4. The backbone network thus yields four feature maps F1, F2, F3, and F4 at different scales, which are input into the feature fusion network of the YOLOX detection framework. First, high-level feature information is passed down and fused into the low-level features in a top-down manner in the feature pyramid network; then the features are enhanced bottom-up through the path aggregation network, using the accurate localization information in the low-level features to strengthen the whole feature hierarchy and shorten the information path between low-level and high-level features.
The features output by each path aggregation layer are input into a decoupled head for classification and regression. Specifically, each decoupled head uses a 1×1 convolution layer to reduce the channels of the input features to 256, then adds two parallel branches, each with two 3×3 convolution layers, for the classification and regression tasks respectively; an IoU (Intersection over Union) branch is added to the regression branch to assist training. The features of each layer are combined, and the classification and regression results are concatenated along the channel dimension.
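A hedged sketch of one such decoupled head follows; the SiLU activation, the 4-channel box encoding, and the output channel ordering are assumptions consistent with common YOLOX implementations rather than details stated in this embodiment.

import torch
import torch.nn as nn

class DecoupledHead(nn.Module):
    def __init__(self, c_in, num_classes, width=256):
        super().__init__()
        self.stem = nn.Conv2d(c_in, width, 1)  # 1x1 conv reduces channels to 256
        def branch():
            return nn.Sequential(
                nn.Conv2d(width, width, 3, padding=1), nn.SiLU(),
                nn.Conv2d(width, width, 3, padding=1), nn.SiLU())
        self.cls_branch = branch()              # classification branch
        self.reg_branch = branch()              # regression branch
        self.cls_pred = nn.Conv2d(width, num_classes, 1)
        self.reg_pred = nn.Conv2d(width, 4, 1)  # box offsets (cx, cy, w, h)
        self.iou_pred = nn.Conv2d(width, 1, 1)  # IoU branch assisting training

    def forward(self, x):
        x = self.stem(x)
        c = self.cls_branch(x)
        r = self.reg_branch(x)
        # classification and regression results concatenated along channels
        return torch.cat([self.reg_pred(r), self.iou_pred(r),
                          self.cls_pred(c)], dim=1)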
The feature map is then flattened into a vector using an array flattening operation, changing (batch size, channel, height, width) into (batch size, channel, height×width), where height×width is the number of anchors predicted for this scale. A concatenation along the third dimension then merges the results of the different scales, changing (batch size, channel, height×width) into (batch size, channel, anchors). An array transpose operation follows, converting (batch size, channel, anchors) into (batch size, anchors, channel), so that each row holds the information of one anchor predicted for the image. Finally, the array is decoded, i.e., converted into the corresponding prediction box format: decoding simply adds the grid coordinates to the predicted offsets of the box relative to the upper-left corner of its grid cell, and multiplies by the downsampling factor to restore coordinates in the original image.
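The flatten-transpose-decode sequence can be sketched for a single scale as follows; the (cx, cy, w, h) channel layout and the exponential decoding of the box size are assumptions borrowed from typical YOLOX code, while the grid-offset-plus-stride step is the one described above.

import torch

def decode_one_scale(pred, stride):
    # pred: (batch, channel, height, width); height*width anchors at this scale.
    b, c, h, w = pred.shape
    pred = pred.flatten(2)               # (batch, channel, height*width)
    pred = pred.transpose(1, 2).clone()  # (batch, anchors, channel)
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=-1).reshape(1, h * w, 2).float()
    # add the grid coordinates to the offsets, then multiply by the
    # downsampling factor to restore original-image coordinates
    pred[..., 0:2] = (pred[..., 0:2] + grid) * stride
    pred[..., 2:4] = pred[..., 2:4].exp() * stride
    return pred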
It should be understood that, although the steps in the flowcharts of the above embodiments are shown in sequence as indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of execution of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include multiple sub-steps or stages; these are not necessarily performed at the same moment, but may be performed at different moments, and they need not be performed sequentially, but may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
Based on the same inventive concept, an embodiment of the application further provides an apparatus for determining a focus position, used to implement the method for determining a focus position mentioned above. The solution provided by the apparatus is similar to that described for the method; therefore, for the specific limitations in the one or more apparatus embodiments below, reference may be made to the limitations on the method for determining a focus position above, and details are not repeated here.
In one embodiment, as shown in fig. 7, there is provided an apparatus for determining a lesion position, comprising: an input module 702, a feature extraction module 704, an enhancement module 706, a classification and regression module 708, and a decoding module 710, wherein:
an input module 702 for inputting a magnetic resonance image into the detection model; the detection model comprises a backbone network, a feature fusion network and a decoupling head based on an anchor-free detection framework;
the feature extraction module 704 is configured to perform feature extraction on the magnetic resonance image through each feature extraction module in the backbone network, so as to obtain feature maps of different sizes;

the enhancement module 706 is configured to perform semantic enhancement processing and position enhancement processing respectively on the feature maps of corresponding sizes through the feature fusion network, to obtain an enhanced feature map of each size;

the classification and regression module 708 is configured to perform classification and regression processing on the enhanced feature maps of each size based on the decoupling head, to obtain a processing result corresponding to each enhanced feature map;

the decoding module 710 is configured to decode each processing result and determine the focus position in the magnetic resonance image based on the decoding results.
In one embodiment, the feature extraction modules in the backbone network include a first feature extraction module, a second feature extraction module, a third feature extraction module, and a fourth feature extraction module. The feature extraction module 704 is further configured to extract, through the first feature extraction module, a feature map of a first size from the magnetic resonance image; after image fusion is performed on the feature map of the first size based on the first feature extraction module, extract a feature map of a second size from the fused feature map through the second feature extraction module; after image fusion is performed on the feature map of the second size based on the second feature extraction module, extract a feature map of a third size from the fused feature map through the third feature extraction module; and after image fusion is performed on the feature map of the third size based on the third feature extraction module, extract a feature map of a fourth size from the fused feature map through the fourth feature extraction module.
In one embodiment, the feature extraction module 704 is further configured to perform convolution embedding processing on the magnetic resonance image through the backbone network to obtain an embedded feature map; perform dynamic position embedding on the embedded feature map through the dynamic position embedding layer in the first feature extraction module to obtain a first dynamic feature map; perform local aggregation processing on the first dynamic feature map through the multi-head correlation aggregation layer in the first feature extraction module to obtain a first aggregation feature map; and input the first aggregation feature map into the feed-forward network layer in the first feature extraction module for image block characterization to obtain the feature map of the first size.
In one embodiment, the feature extraction module 704 is further configured to perform dimension reduction on the feature map of the first size to obtain a first dimension reduction feature map; normalize the first dimension reduction feature map to obtain a first normalized feature map; and extract the feature map of the second size from the first normalized feature map through the second feature extraction module.
In one embodiment, the feature extraction module 704 is further configured to perform image fusion on the feature map of the second size based on the second feature extraction module to obtain a second normalized feature map; perform, through the third feature extraction module, mixed aggregation processing on the second normalized feature map according to a first preset attention mechanism to obtain the feature map of the third size; perform image fusion on the feature map of the third size based on the third feature extraction module to obtain a third normalized feature map; and perform, through the fourth feature extraction module, feature extraction on the third normalized feature map according to a second preset attention mechanism to obtain the feature map of the fourth size.
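For illustration, the two preset attention mechanisms can be read as the window and global self-attention variants of the FIG. 3 embodiment; the sketch below is a minimal rendering that assumes non-overlapping windows whose size divides the feature map evenly, and uses PyTorch's stock multi-head attention in place of the patented aggregation.

from typing import Optional
import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    def __init__(self, dim: int, heads: int = 8, window: Optional[int] = 7):
        super().__init__()
        self.window = window  # None -> global self-attention
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        b, c, h, w = x.shape
        if self.window is None:
            seq = x.flatten(2).transpose(1, 2)  # (b, h*w, c): attend globally
            out, _ = self.attn(seq, seq, seq)
            return out.transpose(1, 2).reshape(b, c, h, w)
        s = self.window  # partition the map into non-overlapping s x s windows
        xw = x.reshape(b, c, h // s, s, w // s, s)
        xw = xw.permute(0, 2, 4, 3, 5, 1).reshape(-1, s * s, c)
        out, _ = self.attn(xw, xw, xw)           # attention within each window
        out = out.reshape(b, h // s, w // s, s, s, c)
        return out.permute(0, 5, 1, 3, 2, 4).reshape(b, c, h, w)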
In one embodiment, the enhancement module 706 is further configured to perform upsampling processing respectively on the feature maps of corresponding sizes through the feature pyramid network to obtain an intermediate enhancement map of each size, and to perform downsampling processing respectively on the intermediate enhancement maps of corresponding sizes through the path aggregation network to obtain the enhanced feature map of each size.
In one embodiment, the intermediate enhancement map of each size includes a first pyramid feature, a second pyramid feature, a third pyramid feature, and a fourth pyramid feature. The enhancement module 706 is further configured to perform upsampling processing on the feature map of the fourth size through the first pyramid layer of the feature pyramid network to obtain the first pyramid feature; perform upsampling processing on the fusion feature between the feature map of the third size and the first pyramid feature through the second pyramid layer of the feature pyramid network to obtain the second pyramid feature; perform upsampling processing on the fusion feature between the feature map of the second size and the second pyramid feature through the third pyramid layer of the feature pyramid network to obtain the third pyramid feature; and perform upsampling processing on the fusion feature between the feature map of the first size and the third pyramid feature through the fourth pyramid layer of the feature pyramid network to obtain the fourth pyramid feature.

In one embodiment, the enhanced feature map of each size includes a first path aggregation feature, a second path aggregation feature, a third path aggregation feature, and a fourth path aggregation feature. The enhancement module 706 is further configured to perform downsampling processing on the fourth pyramid feature through the first path aggregation layer of the path aggregation network to obtain the first path aggregation feature; perform downsampling processing on the fusion feature between the third pyramid feature and the first path aggregation feature through the second path aggregation layer of the path aggregation network to obtain the second path aggregation feature; perform downsampling processing on the fusion feature between the second pyramid feature and the second path aggregation feature through the third path aggregation layer of the path aggregation network to obtain the third path aggregation feature; and perform downsampling processing on the fusion feature between the first pyramid feature and the third path aggregation feature through the fourth path aggregation layer of the path aggregation network to obtain the fourth path aggregation feature.
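Taken together, the two embodiments above amount to a top-down pyramid pass (semantic enhancement) followed by a bottom-up path aggregation pass (position enhancement); the sketch below assumes a shared 256-channel width and concatenate-then-1x1-conv fusion, which are illustrative choices rather than details fixed by the embodiments.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FpnPanSketch(nn.Module):
    def __init__(self, chs=(64, 128, 320, 512), width=256):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, width, 1) for c in chs)
        self.fuse_td = nn.ModuleList(nn.Conv2d(2 * width, width, 1) for _ in range(3))
        self.down = nn.ModuleList(nn.Conv2d(width, width, 3, 2, 1) for _ in range(3))
        self.fuse_bu = nn.ModuleList(nn.Conv2d(2 * width, width, 1) for _ in range(3))

    def forward(self, f1, f2, f3, f4):
        # top-down: upsample high-level features and fuse with lower levels
        p4 = self.lateral[3](f4)
        p3 = self.fuse_td[2](torch.cat([self.lateral[2](f3),
                                        F.interpolate(p4, scale_factor=2)], 1))
        p2 = self.fuse_td[1](torch.cat([self.lateral[1](f2),
                                        F.interpolate(p3, scale_factor=2)], 1))
        p1 = self.fuse_td[0](torch.cat([self.lateral[0](f1),
                                        F.interpolate(p2, scale_factor=2)], 1))
        # bottom-up: downsample and fuse to pass precise localization upward
        n1 = p1
        n2 = self.fuse_bu[0](torch.cat([p2, self.down[0](n1)], 1))
        n3 = self.fuse_bu[1](torch.cat([p3, self.down[1](n2)], 1))
        n4 = self.fuse_bu[2](torch.cat([p4, self.down[2](n3)], 1))
        return n1, n2, n3, n4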
In the above embodiment, the magnetic resonance image is input into the detection model; feature extraction is performed on the magnetic resonance image by each feature extraction module in the backbone network to obtain feature maps of different sizes; semantic enhancement processing and position enhancement processing are performed on the feature maps of corresponding sizes through the feature fusion network to obtain an enhanced feature map of each size; classification and regression processing is performed on the enhanced feature maps of each size based on the decoupling head to obtain a processing result corresponding to each enhanced feature map; and each processing result is decoded, with the focus position in the magnetic resonance image determined from the decoding results. The flexible anchor-free detection framework adapts well to focus areas of different sizes and is efficient to run; moreover, through the feature extraction of the backbone network in the detection model, the semantic and position enhancement of the feature fusion network, and the classification and regression processing of the decoupling head based on the anchor-free detection framework, the focus position can be determined accurately.
The various modules in the above apparatus for determining a focus position may be implemented in whole or in part by software, hardware, or a combination thereof. Each of the above modules may be embedded in or independent of a processor in the computer device in hardware form, or stored in a memory in the computer device in software form, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a terminal or a server; this embodiment takes the computer device being a terminal as an example, and its internal structure may be as shown in FIG. 8. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input device. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface, the display unit and the input device are connected to the system bus through the input/output interface. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The input/output interface of the computer device is used to exchange information between the processor and external devices. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless mode can be realized through Wi-Fi, a mobile cellular network, NFC (Near Field Communication), or other technologies. The computer program, when executed by the processor, implements a method of determining a focus position. The display unit of the computer device is used to form a visual picture and may be a display screen, a projection device, or a virtual reality imaging device; the display screen may be a liquid crystal display screen or an electronic ink display screen. The input device of the computer device may be a touch layer covering the display screen, a key, a trackball, or a touchpad provided on the housing of the computer device, or an external keyboard, touchpad, mouse, or the like.
It will be appreciated by those skilled in the art that the structure shown in FIG. 8 is merely a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, including a memory and a processor; the memory stores a computer program, and the processor, when executing the computer program, implements the steps of the method embodiments described above.

In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored; the computer program, when executed by a processor, implements the steps of the method embodiments described above.

In one embodiment, a computer program product is provided, including a computer program; the computer program, when executed by a processor, implements the steps of the method embodiments described above.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region.
Those skilled in the art will appreciate that all or part of the flows of the methods of the above embodiments may be implemented by a computer program instructing relevant hardware; the computer program may be stored on a non-volatile computer-readable storage medium and, when executed, may include the flows of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, Resistive Random Access Memory (ReRAM), Magnetoresistive Random Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene memory, and the like. Volatile memory may include Random Access Memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take various forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors referred to in the embodiments provided herein may be, but are not limited to, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, and data processing logic devices based on quantum computing.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features contains no contradiction, it should be considered within the scope of this specification.
The foregoing embodiments merely illustrate several implementations of the application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the application. It should be noted that those skilled in the art can make several variations and improvements without departing from the spirit of the application, and all of these fall within the protection scope of the application. Accordingly, the protection scope of the application shall be subject to the appended claims.

Claims (10)

1. A method of determining a lesion location, the method comprising:
inputting the magnetic resonance image into a detection model; the detection model comprises a backbone network, a feature fusion network and a decoupling head based on an anchor-free detection framework;
feature extraction is carried out on the magnetic resonance image through each feature extraction module in the backbone network, so that feature maps of different sizes are obtained; the feature extraction modules in the backbone network comprise a first feature extraction module, a second feature extraction module, a third feature extraction module and a fourth feature extraction module; the first feature extraction module comprises a local correlation aggregation module and an image fusion module, the second feature extraction module comprises a local correlation aggregation module and an image fusion module, the third feature extraction module comprises a mixed correlation aggregation module and an image fusion module, and the fourth feature extraction module comprises a global correlation aggregation module;

the feature extraction of the magnetic resonance image through each feature extraction module in the backbone network to obtain feature maps of different sizes comprises:
extracting a feature map of a first size from the magnetic resonance image by the first feature extraction module;
after image fusion is performed on the feature map of the first size based on the first feature extraction module, extracting a feature map of a second size from the fused feature map through the second feature extraction module;

after image fusion is performed on the feature map of the second size based on the second feature extraction module, extracting a feature map of a third size from the fused feature map through the third feature extraction module;

after image fusion is performed on the feature map of the third size based on the third feature extraction module, extracting a feature map of a fourth size from the fused feature map through the fourth feature extraction module;
Before the extracting, by the first feature extracting module, a feature map of a first size from the magnetic resonance image, the method further includes:
performing convolution embedding processing on the magnetic resonance image through the backbone network to obtain an embedded feature map;
The extracting, by the first feature extracting module, a feature map of a first size from the magnetic resonance image includes:
subjecting the embedded feature map to dynamic position embedding through a dynamic position embedding layer in the first feature extraction module, so that a first dynamic feature map is obtained; the dynamic position embedding layer is used for carrying out dynamic position embedding on the image, wherein the dynamic position embedding is realized through zero-padded depthwise convolution position coding;

carrying out local aggregation processing on the first dynamic feature map through a multi-head correlation aggregation layer in the first feature extraction module to obtain a first aggregation feature map; the multi-head correlation aggregation layer realizes the context coding and correlation learning of the image, wherein multi-head refers to dividing the image blocks into a plurality of groups, each head separately processing the information of one group of channels;

inputting the first aggregation feature map into a feed-forward network layer in the first feature extraction module for image block characterization to obtain the feature map of the first size;
carrying out semantic enhancement processing and position enhancement processing respectively on the feature maps of corresponding sizes through the feature fusion network to obtain an enhanced feature map of each size;

classifying and regressing the enhanced feature maps of each size based on the decoupling head to obtain a processing result corresponding to each enhanced feature map;
and decoding each processing result, and determining the focus position in the magnetic resonance image based on the decoding result.
2. The method of claim 1, wherein after extracting the feature map of the first size from the magnetic resonance image by the first feature extraction module, the method further comprises:
performing dimension reduction on the feature map of the first size to obtain a first dimension reduction feature map;

normalizing the first dimension reduction feature map to obtain a first normalized feature map;
The extracting, by the second feature extraction module, the feature map of the second size from the fused feature map includes:
extracting a feature map of a second size from the first normalized feature map through the second feature extraction module.
3. The method according to claim 1, wherein the feature fusion network comprises a feature pyramid network and a path aggregation network, and the carrying out semantic enhancement processing and position enhancement processing respectively on the feature maps of corresponding sizes through the feature fusion network to obtain an enhanced feature map of each size comprises:

performing upsampling processing respectively on the feature maps of corresponding sizes through the feature pyramid network to obtain an intermediate enhancement map of each size;

performing downsampling processing respectively on the intermediate enhancement maps of corresponding sizes through the path aggregation network to obtain the enhanced feature map of each size.
4. A method according to claim 3, wherein the intermediate enhancement map for each size includes a first pyramid feature, a second pyramid feature, a third pyramid feature, and a fourth pyramid feature;
the performing upsampling processing respectively on the feature maps of corresponding sizes through the feature pyramid network to obtain the intermediate enhancement map of each size comprises:
performing upsampling processing on the feature map with the fourth size through a first pyramid layer of the feature pyramid network to obtain first pyramid features;
Performing upsampling processing on the fusion features between the feature map of the third size and the first pyramid features through a second pyramid layer of the feature pyramid network to obtain second pyramid features;
Performing upsampling processing on the fusion features between the feature map of the second size and the second pyramid features through a third pyramid layer of the feature pyramid network to obtain third pyramid features;
and carrying out upsampling processing on the fusion features between the feature map of the first size and the third pyramid features through a fourth pyramid layer of the feature pyramid network to obtain fourth pyramid features.
5. The method of claim 4, wherein the enhanced feature map of each size comprises a first path aggregation feature, a second path aggregation feature, a third path aggregation feature, and a fourth path aggregation feature; and the performing downsampling processing respectively on the intermediate enhancement maps of corresponding sizes through the path aggregation network to obtain the enhanced feature map of each size comprises:
performing downsampling processing on the fourth pyramid feature through a first path aggregation layer of the path aggregation network to obtain the first path aggregation feature;
performing downsampling processing on the fusion features between the third pyramid features and the first path aggregation features through a second path aggregation layer of the path aggregation network to obtain the second path aggregation features;
Performing downsampling processing on the fusion feature between the second pyramid feature and the second path aggregation feature through a third path aggregation layer of the path aggregation network to obtain the third path aggregation feature;
And performing downsampling processing on the fusion characteristic between the first pyramid characteristic and the third path aggregation characteristic through a fourth path aggregation layer of the path aggregation network to obtain the fourth path aggregation characteristic.
6. An apparatus for determining a lesion location, the apparatus comprising:
The input module is used for inputting the magnetic resonance image into the detection model; the detection model comprises a backbone network, a feature fusion network and a decoupling head based on an anchor-free detection framework;
the feature extraction module is used for carrying out feature extraction on the magnetic resonance image through each feature extraction module in the backbone network to obtain feature maps of different sizes;

the feature extraction modules in the backbone network comprise a first feature extraction module, a second feature extraction module, a third feature extraction module and a fourth feature extraction module; the first feature extraction module comprises a local correlation aggregation module and an image fusion module, the second feature extraction module comprises a local correlation aggregation module and an image fusion module, the third feature extraction module comprises a mixed correlation aggregation module and an image fusion module, and the fourth feature extraction module comprises a global correlation aggregation module;

the feature extraction of the magnetic resonance image through each feature extraction module in the backbone network to obtain feature maps of different sizes comprises:
extracting a feature map of a first size from the magnetic resonance image by the first feature extraction module;
after image fusion is performed on the feature map of the first size based on the first feature extraction module, extracting a feature map of a second size from the fused feature map through the second feature extraction module;

after image fusion is performed on the feature map of the second size based on the second feature extraction module, extracting a feature map of a third size from the fused feature map through the third feature extraction module;

after image fusion is performed on the feature map of the third size based on the third feature extraction module, extracting a feature map of a fourth size from the fused feature map through the fourth feature extraction module;

wherein, before the feature map of the first size is extracted from the magnetic resonance image by the first feature extraction module, the apparatus is further configured to perform:
performing convolution embedding processing on the magnetic resonance image through the backbone network to obtain an embedded feature map;
the extracting, by the first feature extraction module, a feature map of a first size from the magnetic resonance image includes:
subjecting the embedded feature map to dynamic position embedding through a dynamic position embedding layer in the first feature extraction module, so that a first dynamic feature map is obtained; the dynamic position embedding layer is used for carrying out dynamic position embedding on the image, wherein the dynamic position embedding is realized through zero-padded depthwise convolution position coding;

carrying out local aggregation processing on the first dynamic feature map through a multi-head correlation aggregation layer in the first feature extraction module to obtain a first aggregation feature map; the multi-head correlation aggregation layer realizes the context coding and correlation learning of the image, wherein multi-head refers to dividing the image blocks into a plurality of groups, each head separately processing the information of one group of channels;

inputting the first aggregation feature map into a feed-forward network layer in the first feature extraction module for image block characterization to obtain the feature map of the first size;
the enhancement module is used for carrying out semantic enhancement processing and position enhancement processing respectively on the feature maps of corresponding sizes through the feature fusion network to obtain an enhanced feature map of each size;

the classification and regression module is used for performing classification and regression processing on the enhanced feature maps of each size based on the decoupling head to obtain a processing result corresponding to each enhanced feature map;
and the decoding module is used for decoding each processing result and determining the focus position in the magnetic resonance image based on the decoding result.
7. The apparatus of claim 6, wherein after extracting the feature map of the first size from the magnetic resonance image by the first feature extraction module, the apparatus is further configured to perform:

performing dimension reduction on the feature map of the first size to obtain a first dimension reduction feature map;

normalizing the first dimension reduction feature map to obtain a first normalized feature map;
The extracting, by the second feature extraction module, the feature map of the second size from the fused feature map includes:
extracting a feature map of a second size from the first normalized feature map through the second feature extraction module.
8. The apparatus of claim 6, wherein the feature fusion network comprises a feature pyramid network and a path aggregation network, and the carrying out semantic enhancement processing and position enhancement processing respectively on the feature maps of corresponding sizes through the feature fusion network to obtain an enhanced feature map of each size comprises:

performing upsampling processing respectively on the feature maps of corresponding sizes through the feature pyramid network to obtain an intermediate enhancement map of each size;

performing downsampling processing respectively on the intermediate enhancement maps of corresponding sizes through the path aggregation network to obtain the enhanced feature map of each size.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 5 when the computer program is executed.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 5.
CN202210915636.3A 2022-08-01 2022-08-01 Method, apparatus, computer device and storage medium for determining focus position Active CN115272250B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210915636.3A CN115272250B (en) 2022-08-01 2022-08-01 Method, apparatus, computer device and storage medium for determining focus position

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210915636.3A CN115272250B (en) 2022-08-01 2022-08-01 Method, apparatus, computer device and storage medium for determining focus position

Publications (2)

Publication Number Publication Date
CN115272250A CN115272250A (en) 2022-11-01
CN115272250B true CN115272250B (en) 2024-06-04

Family

ID=83747891

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210915636.3A Active CN115272250B (en) 2022-08-01 2022-08-01 Method, apparatus, computer device and storage medium for determining focus position

Country Status (1)

Country Link
CN (1) CN115272250B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116452931B (en) * 2023-04-11 2024-03-19 北京科技大学 Hierarchical sensitive image feature aggregation method
CN116824333A (en) * 2023-06-21 2023-09-29 中山大学附属第一医院 Nasopharyngeal carcinoma detecting system based on deep learning model

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112347859A (en) * 2020-10-15 2021-02-09 北京交通大学 Optical remote sensing image saliency target detection method
CN114022462A (en) * 2021-11-10 2022-02-08 华东理工大学 Method, system, device, processor and computer readable storage medium for realizing multi-parameter nuclear magnetic resonance image focus segmentation
CN114037888A (en) * 2021-11-05 2022-02-11 中国人民解放军国防科技大学 Joint attention and adaptive NMS (network management System) -based target detection method and system
CN114119582A (en) * 2021-12-01 2022-03-01 安徽大学 Synthetic aperture radar image target detection method
CN114529873A (en) * 2022-02-21 2022-05-24 城云科技(中国)有限公司 Target detection method and city violation event monitoring method applying same
CN114550148A (en) * 2022-01-14 2022-05-27 山东师范大学 Deep learning-based identification, detection and counting method and system for severely shielded commodities
CN114764868A (en) * 2021-01-12 2022-07-19 北京三星通信技术研究有限公司 Image processing method, image processing device, electronic equipment and computer readable storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11361470B2 (en) * 2019-05-09 2022-06-14 Sri International Semantically-aware image-based visual localization

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112347859A (en) * 2020-10-15 2021-02-09 北京交通大学 Optical remote sensing image saliency target detection method
CN114764868A (en) * 2021-01-12 2022-07-19 北京三星通信技术研究有限公司 Image processing method, image processing device, electronic equipment and computer readable storage medium
WO2022154471A1 (en) * 2021-01-12 2022-07-21 Samsung Electronics Co., Ltd. Image processing method, image processing apparatus, electronic device and computer-readable storage medium
CN114037888A (en) * 2021-11-05 2022-02-11 中国人民解放军国防科技大学 Joint attention and adaptive NMS (network management System) -based target detection method and system
CN114022462A (en) * 2021-11-10 2022-02-08 华东理工大学 Method, system, device, processor and computer readable storage medium for realizing multi-parameter nuclear magnetic resonance image focus segmentation
CN114119582A (en) * 2021-12-01 2022-03-01 安徽大学 Synthetic aperture radar image target detection method
CN114550148A (en) * 2022-01-14 2022-05-27 山东师范大学 Deep learning-based identification, detection and counting method and system for severely shielded commodities
CN114529873A (en) * 2022-02-21 2022-05-24 城云科技(中国)有限公司 Target detection method and city violation event monitoring method applying same

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"[Object Detection] YOLOX: Exceeding YOLO Series in 2021"; 结发授长生; https://zhuanlan.zhihu.com/p/400438562?utm_id=0; 2021-08-17; pp. 1-4 *
"YOLOX: Exceeding YOLO Series in 2021"; Zheng Ge; arXiv:2107.08430v2; 2021-08-06; pp. 1-7 *

Also Published As

Publication number Publication date
CN115272250A (en) 2022-11-01

Similar Documents

Publication Publication Date Title
WO2020224406A1 (en) Image classification method, computer readable storage medium, and computer device
CN115272250B (en) Method, apparatus, computer device and storage medium for determining focus position
US11983903B2 (en) Processing images using self-attention based neural networks
CN107688783B (en) 3D image detection method and device, electronic equipment and computer readable medium
CN115953665B (en) Target detection method, device, equipment and storage medium
An et al. Medical image segmentation algorithm based on multilayer boundary perception-self attention deep learning model
CN112488923A (en) Image super-resolution reconstruction method and device, storage medium and electronic equipment
CN115147606B (en) Medical image segmentation method, medical image segmentation device, computer equipment and storage medium
WO2020248898A1 (en) Image processing method, apparatus and device, and storage medium
Qin et al. Multi-scale attention network for image inpainting
Dharejo et al. Multimodal-boost: Multimodal medical image super-resolution using multi-attention network with wavelet transform
CN115861248A (en) Medical image segmentation method, medical model training method, medical image segmentation device and storage medium
CN117437423A (en) Weak supervision medical image segmentation method and device based on SAM collaborative learning and cross-layer feature aggregation enhancement
CN116229130A (en) Type identification method and device for blurred image, computer equipment and storage medium
CN114723723A (en) Medical image processing method, computer device and storage medium
WO2020077535A1 (en) Image semantic segmentation method, computer device, and storage medium
Markco et al. Texture-driven super-resolution of ultrasound images using optimized deep learning model
CN116486090B (en) Lung cancer spine metastasis image processing method, device, equipment and storage medium
Li et al. Human Detection via Image Denoising for 5G‐Enabled Intelligent Applications
CN110415239B (en) Image processing method, image processing apparatus, medical electronic device, and medium
CN116344004B (en) Image sample data amplification method and device
CN117974992A (en) Matting processing method, device, computer equipment and storage medium
Ji et al. CFENet: Cost-effective underwater image enhancement network via cascaded feature extraction
CN113706376A (en) Image super-resolution reconstruction method and system
CN116597293A (en) Multi-mode scene recognition method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant