CN114648658A - Training method, image processing method, related device, equipment and storage medium - Google Patents

Training method, image processing method, related device, equipment and storage medium

Info

Publication number
CN114648658A
CN114648658A (application CN202210179661.XA)
Authority
CN
China
Prior art keywords
feature
target
combination
region
area combination
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202210179661.XA
Other languages
Chinese (zh)
Inventor
胡志强
刘子豪
李卓威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Sensetime Intelligent Technology Co Ltd
Original Assignee
Shanghai Sensetime Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Sensetime Intelligent Technology Co Ltd filed Critical Shanghai Sensetime Intelligent Technology Co Ltd
Priority to CN202210179661.XA priority Critical patent/CN114648658A/en
Publication of CN114648658A publication Critical patent/CN114648658A/en
Withdrawn legal-status Critical Current

Classifications

    • G06F 18/241: Pattern recognition; Analysing; Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/22: Pattern recognition; Analysing; Matching criteria, e.g. proximity measures
    • G06N 3/04: Computing arrangements based on biological models; Neural networks; Architecture, e.g. interconnection topology
    • G06N 3/08: Computing arrangements based on biological models; Neural networks; Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a training method, an image processing method, a related device, equipment and a storage medium. The training method includes: performing feature extraction on a sample image by using a feature extraction network to obtain a sample feature map of the sample image; determining the feature similarity of several groups of region combinations by using the sample feature map, wherein the sample image includes several local regions and the feature similarity of each group of region combinations represents the similarity between the features of the at least two local regions included in that region combination; determining a reference relationship parameter corresponding to each group of region combinations based on the labeling information of each group of region combinations, wherein each reference relationship parameter represents the actual difference between the local regions in the region combination; and adjusting the network parameters of the feature extraction network by using the feature similarity and the reference relationship parameter of each group of region combinations. In this way, the accuracy with which the feature extraction network extracts information from within the same image is improved.

Description

Training method, image processing method, related device, equipment and storage medium
Technical Field
The present application relates to the field of deep learning technologies, and in particular, to a training method, an image processing method, and a related apparatus, device, and storage medium.
Background
With the rapid development of deep learning, neural network algorithms are continuously improving, and their computational accuracy is improving accordingly. The good performance of a neural network is inseparable from its effective training.
Currently, Contrastive Learning is one method for training the feature extraction network of a neural network. Contrastive learning compares the similarity of feature maps obtained by extracting features from different images, and thereby trains the feature extraction network. This form of contrastive learning can only use different images for training, which limits the further application of the contrastive learning training method and limits its training effect.
Therefore, how to improve the contrastive learning training method is of great significance for improving the training effect of contrastive learning.
Disclosure of Invention
The application at least provides a training method, an image processing method, a related device, equipment and a storage medium.
A first aspect of the present application provides a method for training a feature extraction network, the method including: performing feature extraction on a sample image by using a feature extraction network to obtain a sample feature map of the sample image; determining the feature similarity of several groups of region combinations by using the sample feature map, wherein the sample image includes several local regions and the feature similarity of each group of region combinations represents the similarity between the features of the at least two local regions included in that region combination; determining a reference relationship parameter corresponding to each group of region combinations based on the labeling information of each group of region combinations, wherein each reference relationship parameter represents the actual difference between the local regions in the region combination; and adjusting the network parameters of the feature extraction network by using the feature similarity and the reference relationship parameter of each group of region combinations.
Therefore, by combining the reference relationship parameter and the feature similarity corresponding to a region combination, the feature similarity is corrected, so that the similarity of different regions within the same image can subsequently be compared. The feature extraction network can thus be trained with a contrastive learning method applied to a single image, which expands the application range of the contrastive learning training method and improves the accuracy with which the feature extraction network extracts information from within the same image. In addition, because contrastive learning is applied within one image, the method is better suited to image segmentation tasks than contrastive learning across different images, which helps improve the image processing accuracy of the image processing model.
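For illustration only, the following is a minimal PyTorch-style sketch of one pre-training step under several assumptions that the text above does not fix: each local region corresponds to one spatial position of the sample feature map, feature similarity is cosine similarity, the reference relationship parameter is one minus the mean category-parameter difference, and the loss is a softmax-style contrastive objective over region pairs with a temperature tau. All names and the exact loss form are illustrative, not the claimed method.

```python
# Illustrative sketch; the weight and loss forms are assumptions, not the claimed method.
import torch
import torch.nn.functional as F

def region_class_ratios(label_map, num_classes, grid):
    """Category parameters: fraction of each local region's pixels per preset classification."""
    one_hot = F.one_hot(label_map, num_classes).float().permute(2, 0, 1).unsqueeze(0)  # (1, C, H, W)
    ratios = F.adaptive_avg_pool2d(one_hot, grid)                                      # (1, C, gh, gw)
    return ratios.squeeze(0).flatten(1).t()                                            # (N, C), N = gh*gw

def reference_weights(ratios):
    """Reference relationship parameter for every region pair, assumed here to be
    1 minus the mean category-parameter difference (larger = more similar labels)."""
    diff = (ratios.unsqueeze(1) - ratios.unsqueeze(0)).abs()  # (N, N, C)
    return 1.0 - diff.mean(dim=-1)                            # (N, N)

def pretrain_step(feature_net, image, label_map, num_classes, tau=0.1):
    feat = feature_net(image)                                  # (1, D, gh, gw) sample feature map
    _, _, gh, gw = feat.shape
    regions = F.normalize(feat.flatten(2).squeeze(0).t(), dim=-1)  # (N, D) region features
    sim = regions @ regions.t()                                # feature similarity of all region combinations
    w = reference_weights(region_class_ratios(label_map, num_classes, (gh, gw)))
    logits = (w * sim) / tau                                   # similarity corrected by reference weights
    eye = torch.eye(len(regions), dtype=torch.bool)
    logits = logits.masked_fill(eye, float('-inf'))            # exclude a region paired with itself
    loss = -F.log_softmax(logits, dim=-1)[~eye].mean()
    return loss
```

In a full pre-training loop, the returned loss would be back-propagated and an optimizer would update the parameters of feature_net, which corresponds to adjusting the network parameters of the feature extraction network.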
The labeling information of a region combination includes the preset classification to which the pixel points in each local region of the region combination belong. Determining the reference relationship parameter of each group of region combinations based on the labeling information of each group of region combinations includes: taking each region combination in turn as a target region combination; determining, based on the labeling information of the target region combination, the category parameter of each target local region with respect to each preset classification, wherein a target local region is a local region in the target region combination, and the category parameter with respect to a preset classification represents the situation of the pixel points in the target local region that belong to that preset classification; and obtaining the reference relationship parameter of the target region combination based on the category parameters.
Therefore, the reference relation parameters of the target area combination can be obtained by respectively determining the category parameters of each target local area about each preset classification based on the labeling information of the target area combination, and the reference relation parameters are determined by using the labeling information.
The determining, based on the labeling information of the target region combination, the category parameter of each target local region with respect to each preset classification includes: respectively taking each preset classification as a target classification; for each target local area, counting the number of pixel points belonging to the target classification in the target local area, and determining the category parameter of the target local area belonging to the target classification based on the number of pixel points belonging to the target classification.
Therefore, the number of the pixel points belonging to the target classification in the target local area is counted, so that the category parameter of the target local area belonging to the target classification can be determined based on the number of the pixel points belonging to the target classification subsequently, and the determination of the category parameter by using the labeling information is realized.
The obtaining of the reference relationship parameter of the target area combination by using the category parameter includes: for each preset classification, obtaining the classification parameter difference of the target area combination relative to the preset classification based on the classification parameter of each target local area belonging to the preset classification; and obtaining a reference relation parameter of the target area combination based on the category parameter difference of the target area combination about each preset classification.
Therefore, by obtaining the category parameter difference of the target area combination with respect to the preset classifications, the reference relationship parameter of the target area combination can be obtained based on the category parameter difference of the target area combination with respect to each preset classification, so that the reference relationship parameter can reflect the actual difference situation between the local areas in the target area combination.
The determining of the category parameter of the target local area belonging to the target classification based on the number of the pixel points belonging to the target classification includes: and taking the ratio of the number of the pixel points belonging to the target classification to the total number of the pixel points of the target local area as a class parameter of the target local area belonging to the target classification.
Therefore, the ratio of the number of the pixel points belonging to the target classification to the total number of the pixel points of the target local area is used as the category parameter of the target local area belonging to the target classification, so that the category parameter is obtained.
The obtaining of the category parameter difference of the target area combination with respect to the preset classification based on the category parameter of each target local area belonging to the preset classification includes: and taking the difference of the class parameters belonging to the preset classification between the target local areas as the class parameter difference of the target area combination relative to the preset classification.
Thus, by taking the difference between the category parameters of the target local regions for a preset classification as the category parameter difference of the target region combination with respect to that classification, the category parameter difference of the target region combination is determined using the category parameters.
The obtaining of the reference relationship parameter of the target area combination based on the category parameter difference of the target area combination with respect to each preset classification includes: obtaining a statistic value of the category parameter difference of the target area combination about each preset classification; and obtaining the reference relation parameter of the target area combination by using the statistical value of the target area combination, wherein the reference relation parameter of the target area combination is in negative correlation with the statistical value of the target area combination.
Therefore, by obtaining the statistical value of the category parameter differences of the target region combination across the preset classifications and deriving the reference relationship parameter of the target region combination from that statistical value, a reference relationship parameter reflecting the actual difference between the local regions in the region combination can be obtained.
Adjusting the network parameters of the feature extraction network by using the feature similarity and the reference relationship parameter of each group of region combinations includes: taking each group of region combinations in turn as a target region combination, adjusting the feature similarity of the target region combination by using the reference relationship parameter of the target region combination to obtain the reference feature similarity of the target region combination, and obtaining a first loss of the target region combination based on the reference feature similarity of the target region combination; obtaining a second loss of the feature extraction network based on the first losses of all groups of region combinations; and adjusting the network parameters of the feature extraction network by using the second loss.
Therefore, the feature similarity and the auxiliary relationship parameter of at least one auxiliary region combination are used to obtain the auxiliary feature similarity of each auxiliary region combination, and the first loss corresponding to the target region combination can then be obtained based on the reference feature similarity and the auxiliary feature similarities, yielding a loss that reflects how well the feature extraction network captures feature similarity within the same image. In addition, since the first loss of each target region combination is obtained from its reference feature similarity, a second loss reflecting the overall similarity structure of the sample feature map can be obtained from the first losses of all groups of region combinations, and the network parameters of the feature extraction network can be adjusted using this second loss. This applies contrastive learning within a single image and completes the training of the feature extraction network.
The above adjusting the feature similarity of the target area combination by using the reference relationship parameter of the target area combination to obtain the reference feature similarity of the target area combination includes: taking the product of the reference relation parameters of the target area combination and the feature similarity of the target area combination as the reference feature similarity of the target area combination; the obtaining of the first loss of the target area combination based on the similarity of the reference features of the target area combination includes: the method comprises the steps that the feature similarity and auxiliary relation parameters of at least one auxiliary area combination are utilized to correspondingly obtain the auxiliary feature similarity of each auxiliary area combination, wherein the auxiliary area combination and a target area combination have at least one same local area, and the auxiliary relation parameters of the auxiliary area combination are obtained on the basis of the reference relation parameters of the auxiliary area combination; and obtaining a first loss corresponding to the target area combination based on the reference feature similarity and the auxiliary feature similarity.
Therefore, the product of the reference relationship parameter of the target region combination and the feature similarity of the target region combination is used as the reference feature similarity of the target region combination, so that the feature similarity is processed with the reference relationship parameter and the resulting reference feature similarity more accurately reflects the feature similarity between the target local regions in the target region combination. In addition, by determining auxiliary region combinations, contrastive learning can be performed in combination with them.
Wherein, one local area in the target area combination is a reference area, and the auxiliary area combination is an area combination containing the reference area.
The sum of the auxiliary relationship parameter of the auxiliary area combination and the reference relationship parameter of the auxiliary area combination is a preset value.
Therefore, because the sum of the auxiliary relationship parameter and the reference relationship parameter of an auxiliary region combination is a preset value, the auxiliary relationship parameter and the reference relationship parameter are negatively correlated, and together they can represent the aspects in which the target local regions in the target region combination agree and the aspects in which they differ across all preset classifications. When the reference relationship parameter represents the aspects in which the target local regions agree across all preset classifications, the auxiliary relationship parameter represents the aspects in which they differ.
Obtaining the first loss corresponding to the target region combination based on the reference feature similarity and the auxiliary feature similarities includes: performing a preset operation on the reference feature similarity to obtain a first operation result; performing the preset operation on the auxiliary feature similarity of each auxiliary region combination to obtain a second operation result corresponding to each auxiliary region combination; and obtaining the first loss corresponding to the target region combination based on the ratio of the first operation result to the sum of the second operation results corresponding to the auxiliary region combinations.
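The "preset operation" is not specified here. If it is assumed to be an exponential, optionally with a temperature $\tau$, and the first loss is taken to be the negative logarithm of the stated ratio, the first loss of a target region combination $(i, j)$ takes a softmax-style contrastive form such as the sketch below, in which $s_{ij}$ is the feature similarity, $w_{ij}$ the reference relationship parameter, $\hat{w}_{ik}$ the auxiliary relationship parameter (for example $\hat{w}_{ik} = 1 - w_{ik}$ when the preset value is 1), and $A(i)$ the set of auxiliary region combinations sharing the reference region $i$; all of these choices are assumptions rather than the published formulation:

$$\mathcal{L}^{(1)}_{ij} = -\log \frac{\exp\left(w_{ij}\, s_{ij} / \tau\right)}{\sum_{k \in A(i)} \exp\left(\hat{w}_{ik}\, s_{ik} / \tau\right)}$$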
The feature extraction network is a part of an image processing model, and the image processing model is used for predicting a sample image based on a sample feature map of the sample image.
The training method of the feature extraction network is executed in a pre-training stage of the image processing model.
Therefore, the training method of the feature extraction network is applied to the pre-training stage of the image processing model, and is beneficial to improving the pre-training effect, so that the training effect of the subsequent image processing model is improved.
The reference relation parameter of the area combination is in positive correlation with the actual class similarity between the local areas in the area combination.
Therefore, by setting the reference relationship parameter of the area combination to be in positive correlation with the actual class similarity between the local areas in the area combination, the actual class similarity between the local areas in the area combination can be intuitively reflected by the reference relationship parameter.
Wherein, every two different local areas in the sample image form a group of area combinations.
Therefore, the feature similarity of two local regions in the sample image can be obtained by combining every two different local regions in the sample image into a group of region combinations.
The determining the feature similarity of the combination of the plurality of groups of regions by using the sample feature map includes: for each group of area combination, acquiring feature information corresponding to each local area in the area combination from the sample feature map; and obtaining the feature similarity of the area combination by using the feature information corresponding to each local area in the area combination.
Therefore, by acquiring the feature information corresponding to each local region in the region combination, the feature similarity of the region combination can be correspondingly obtained.
A second aspect of the present application provides a training method for an image processing model, the method including: performing the method described in the first aspect, pre-training a feature extraction network of the image processing model; performing image processing on the sample image by using the image processing model to obtain a detection result of the sample image, wherein the image processing comprises performing feature extraction on the sample image by using a feature extraction network to obtain an original feature map; performing feature optimization on corresponding feature areas in an original feature map by using uncertainty parameters corresponding to the feature areas in the original feature map to obtain a target feature map, wherein in the target feature map, the influence of target feature information corresponding to the feature areas is related to the uncertainty parameters corresponding to the feature areas; obtaining a detection result of the sample image based on the target characteristic diagram; and adjusting the network parameters of the image processing model based on the detection result.
Therefore, the feature extraction network of the image processing model is pre-trained, so that the accuracy of extracting the same image information on one image by the feature extraction network can be improved, and the training progress can be accelerated. In addition, the uncertainty parameters corresponding to the characteristic regions in the original characteristic diagram are utilized to perform characteristic optimization on the corresponding characteristic regions in the original characteristic diagram, so that the influence on the characteristic information of the characteristic regions is distinguished, the robustness of pixel points with high uncertainty in the sample image is improved, and the accuracy of the detection result of the sample image is improved.
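This section does not specify how the uncertainty parameters are computed or applied. The sketch below assumes, purely for illustration, that the uncertainty of each feature region is estimated from the entropy of an auxiliary prediction and that features are re-weighted so that high-uncertainty regions contribute less to the target feature map; all names are hypothetical.

```python
import torch
import torch.nn.functional as F

def uncertainty_optimize(orig_feat, logits):
    """Hypothetical feature optimization: down-weight feature regions whose
    auxiliary prediction is uncertain. The patent only states that the influence
    of a region's features is related to its uncertainty parameter; this
    entropy-based weighting is an assumption."""
    # orig_feat: (B, D, H, W) original feature map; logits: (B, C, H, W) auxiliary prediction
    prob = F.softmax(logits, dim=1)
    entropy = -(prob * torch.log(prob.clamp_min(1e-8))).sum(dim=1, keepdim=True)   # (B, 1, H, W)
    uncertainty = entropy / torch.log(torch.tensor(float(logits.shape[1])))        # normalized to [0, 1]
    target_feat = orig_feat * (1.0 - uncertainty)   # lower influence for more uncertain regions
    return target_feat
```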
A third aspect of the present application provides an image processing method, including: acquiring a target image; and processing the target image by using an image processing model to obtain a detection result of the target image, wherein the image processing model is obtained by training by using the training method described in the second aspect.
A fourth aspect of the present application provides a training apparatus for a feature extraction network, the apparatus comprising an acquisition module, a first determination module, a second determination module and an adjustment module. The acquisition module is configured to perform feature extraction on a sample image by using a feature extraction network to obtain a sample feature map of the sample image; the first determination module is configured to determine the feature similarity of several groups of region combinations by using the sample feature map, wherein the sample image includes several local regions and the feature similarity of each group of region combinations represents the similarity between the features of the at least two local regions included in that region combination; the second determination module is configured to determine the reference relationship parameter corresponding to each group of region combinations based on the labeling information of each group of region combinations, wherein each reference relationship parameter represents the actual difference between the local regions in the region combination; and the adjustment module is configured to adjust the network parameters of the feature extraction network by using the feature similarity and the reference relationship parameter of each group of region combinations.
A fifth aspect of the present application provides an apparatus for training an image processing model, the apparatus comprising: the image processing system comprises a pre-training module, a detection module and an adjustment module, wherein the pre-training module is used for executing the method described in the first aspect and pre-training a feature extraction network of an image processing model; the detection module is used for carrying out image processing on the sample image by using the image processing model to obtain a detection result of the sample image, wherein the image processing comprises carrying out feature extraction on the sample image by using a feature extraction network to obtain an original feature map; performing feature optimization on corresponding feature areas in an original feature map by using uncertainty parameters corresponding to the feature areas in the original feature map to obtain a target feature map, wherein in the target feature map, the influence of target feature information corresponding to the feature areas is related to the uncertainty parameters corresponding to the feature areas; obtaining a detection result of the sample image based on the target feature map; and the adjusting module is used for adjusting the network parameters of the image processing model based on the detection result.
A sixth aspect of the present application provides an image processing apparatus comprising: the system comprises an acquisition module and a detection module, wherein the acquisition module is used for acquiring a target image; the detection module is configured to process the target image by using an image processing model to obtain a detection result of the target image, where the image processing model is obtained by training using the method described in the second aspect.
A seventh aspect of the present application provides an electronic device, including a memory and a processor coupled to each other, the processor being configured to execute program instructions stored in the memory to implement the training method in the first aspect described above.
An eighth aspect of the present application provides a computer-readable storage medium having program instructions stored thereon which, when executed by a processor, implement the training method in the first aspect described above.
According to the above scheme, the feature similarity is corrected by combining the reference relationship parameter and the feature similarity corresponding to each region combination, so that the similarity of different regions within the same image can subsequently be compared. The feature extraction network can thus be trained with a contrastive learning method applied to a single image, which expands the application range of the contrastive learning training method and improves the accuracy with which the feature extraction network extracts information from within the same image. In addition, because contrastive learning is applied within one image, the method is better suited to image segmentation tasks than contrastive learning across different images, which helps improve the image processing accuracy of the image processing model.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain the principles of the application.
FIG. 1 is a schematic flow chart diagram illustrating a first embodiment of a training method for a feature extraction network according to the present application;
FIG. 2 is a schematic flow chart diagram illustrating a second embodiment of a training method for a feature extraction network according to the present application;
FIG. 3 is a schematic flow chart of a third embodiment of the training method for the feature extraction network of the present application;
FIG. 4 is a schematic flowchart of a fourth embodiment of the training method for image processing models of the present application;
FIG. 5 is a schematic flow chart illustrating a target feature map obtained in an embodiment of the image processing model training method of the present application;
FIG. 6 is a schematic overall flow chart of the training method of the image processing model of the present application;
FIG. 7 is a schematic flow chart diagram of an embodiment of an image processing method of the present application;
FIG. 8 is a block diagram of an embodiment of a training apparatus for a feature extraction network according to the present application;
FIG. 9 is a block diagram of an embodiment of an apparatus for training an image processing model according to the present application;
FIG. 10 is a schematic diagram of another embodiment of an image processing apparatus according to the present application;
FIG. 11 is a block diagram of an embodiment of an electronic device of the present application;
FIG. 12 is a block diagram of an embodiment of a computer-readable storage medium of the present application.
Detailed Description
The following describes in detail the embodiments of the present application with reference to the drawings attached hereto.
In the following description, for purposes of explanation and not limitation, specific details are set forth such as particular system structures, interfaces, techniques, etc. in order to provide a thorough understanding of the present application.
The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship. Further, "plurality" herein means two or more than two. In addition, the term "at least one" herein means any combination of at least two of any one or more of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
Referring to fig. 1, fig. 1 is a schematic flowchart illustrating a training method for a feature extraction network according to a first embodiment of the present disclosure. Specifically, the following steps may be included:
step S11: and performing feature extraction on the sample image by using a feature extraction network to obtain a sample feature map of the sample image.
In this embodiment, the sample image may be a medical image. The sample image may be a two-dimensional image or a three-dimensional image. The three-dimensional image may be a three-dimensional image obtained by scanning an organ. For example, the sample image may be obtained by three-dimensional imaging using Computed Tomography (CT) imaging techniques. The two-dimensional image is, for example, a sample image obtained by an ultrasonic imaging technique or an X-ray imaging technique. It is to be understood that the imaging method of the sample image is not limited.
The feature extraction network may be an overall feature extraction network in the image processing model, or may be a feature extraction network in an intermediate layer of the overall feature extraction network. For example, an image processing model includes an encoder network and a decoder, and in this case, the encoder may be used as the feature extraction network of the present application, or an intermediate feature extraction network in the encoder may be used as the feature extraction network of the application. In one embodiment, when the feature extraction network is an intermediate feature extraction network in the encoder, the feature extraction network may include only the convolutional layer, or may include the convolutional layer, and the pooling layer, the activation layer, and the like after the convolutional layer.
And the characteristic extraction network can obtain a sample characteristic diagram after extracting the characteristics of the sample image. In this embodiment, the input of the feature extraction network may be a sample image, or may be a feature image output by a network on a layer above the feature extraction network, and it can be understood that both cases may be considered as a case where the feature extraction network described in this application performs feature extraction on the sample image. For example, when the feature extraction network is a first-layer network, the input of the feature extraction network may be a sample image, and the feature extraction network may further perform feature extraction on the input sample image to obtain a sample feature map. When the feature extraction network is a second-layer network, the input of the feature extraction network may be a feature image output by a previous-layer network, and the feature extraction network can further perform feature extraction on the input feature image to obtain a sample feature map.
Step S12: and determining the feature similarity of the combination of the groups of regions by using the sample feature map.
In this embodiment, the sample image includes several local regions, and each group of region combinations includes at least two local regions. The local area may be an area in which the feature points in the sample feature map correspond to the sample image, that is, the local area may be determined by the correspondence between the feature points in the sample feature map and the area of the sample image, for example, the local area may be an area in which each feature point in the sample feature map corresponds to the sample image. As another example, the local region may be a region in which every two feature points in the sample feature map correspond to the sample image. The number of the partial areas included in the area combination may be two or three as long as it is not less than two.
In one embodiment, the size of the local region may be determined by equation (1) (reproduced in the original publication as an image), where L denotes the number of layers of the feature extraction network, l denotes a specific layer, the l-th layer being the feature extraction network of the present application, $k_l$ denotes the size of the convolution kernel of the l-th layer, and s denotes the stride.
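The image of equation (1) is not reproduced in this text. Given the variables listed above, one plausible form, assumed here to be the standard receptive-field recursion for the size $r_l$ of the sample-image region corresponding to one feature point of the $l$-th layer (with $s_i$ taken as the stride of layer $i$), is:

$$r_l = r_{l-1} + (k_l - 1)\prod_{i=1}^{l-1} s_i, \qquad r_0 = 1$$

This is an assumption based on the stated definitions, not the published equation.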
In this embodiment, the feature similarity of each group of region combinations represents the similarity between the features of the at least two local regions included in that region combination. Specifically, the feature information of a local region may be determined based on the correspondence between the local region and the feature points on the sample feature map, and the feature similarity of each group of region combinations may then be determined from the feature information of the local regions included in the region combination. The feature information may be compared using any similarity calculation method commonly used in the art, which will not be described herein.
In one embodiment, each two different local regions in the sample image may be combined into a group of regions, that is, two local regions in the sample image may be combined as a group of regions. Therefore, by combining every two different local regions in the sample image into a group of regions, the feature similarity of two local regions in the sample image can be obtained.
In one embodiment, the feature similarity of each group of region combinations may be determined through steps S121 and S122.
Step S121: and for each group of area combination, acquiring the characteristic information corresponding to each local area in the area combination from the sample characteristic diagram.
Specifically, the feature information of the local region may be determined based on the correspondence between the local region and the feature point on the sample feature map. For example, each local region corresponds to each feature point in the sample feature map, and feature information, such as a feature vector, of each feature point in the sample feature map is feature information of the local region. For another example, every two local regions correspond to each feature point in the sample feature map, and the feature information of every two feature points in the sample feature map is the feature information of the local region.
Step S122: and obtaining the feature similarity of the area combination by using the feature information corresponding to each local area in the area combination.
After determining the feature information corresponding to each local region in the region combination, the similarity of the feature information of each local region in the region combination can be compared according to the similarity calculation method, and the feature similarity of the region combination can be obtained correspondingly.
Therefore, by acquiring the feature information corresponding to each local region in the region combination, the feature similarity of the region combination can be correspondingly obtained.
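As an illustration (the similarity measure is not fixed by this passage), cosine similarity is one common choice for comparing the feature information of the two local regions in a region combination; the function below is a sketch under that assumption.

```python
import torch
import torch.nn.functional as F

def region_feature_similarity(feat_a, feat_b):
    """Feature similarity of a region combination, assuming cosine similarity
    between the feature vectors of its two local regions (illustrative choice)."""
    feat_a = F.normalize(feat_a.flatten(), dim=0)
    feat_b = F.normalize(feat_b.flatten(), dim=0)
    return torch.dot(feat_a, feat_b)   # value in [-1, 1]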
Step S13: and determining reference relation parameters corresponding to the area combinations of each group based on the labeling information of the area combinations of each group.
In this embodiment, the sample image is labeled with labeling information, so that the labeling information of the local region can be correspondingly determined, and further the labeling information of the region combination can be determined. In one embodiment, the annotation information may be pixel-level annotation information, that is, each pixel in the sample image is annotated with annotation information. In another embodiment, each local region may be labeled with labeling information, that is, one local region may be labeled with labeling information as a whole. The label information is, for example, classification information. For example, the classification information of each pixel point in the sample image, or the classification information of each local area, etc.
In the present embodiment, the reference relationship parameter of the area combination indicates an actual difference situation between the local areas in the area combination. The actual difference condition may be a difference condition of the classification information of each pixel point in the region combination, or a difference condition of the classification information between local regions. In one embodiment, in the case that the annotation information is pixel-level annotation information, the actual difference condition can be obtained by comparing the number of pixel points belonging to each category. When the label information is label information of the entire local region, the actual difference can be obtained by comparing the number of local regions belonging to each classification.
In one embodiment, the reference relation parameter of the area combination is in positive correlation with the actual degree of class similarity between the local areas in the area combination. That is, it can be considered that the larger the reference relationship parameter is, the larger the degree of actual category similarity between the local regions is. When the classification information includes at least three types of classifications, the actual class similarity may be the similarity of a certain class between local regions, or may be the similarity obtained by integrating some or all of the classes between local regions. For example, the classification information includes four classifications, and the actual class similarity degree may be the class similarity degree of one class between the local regions or the class similarity degree of all classes. The method for determining the degree of similarity of the categories may refer to a method for calculating the actual difference condition, and details are not repeated here. Therefore, the reference relation parameter of the area combination is set to be in positive correlation with the actual class similarity between the local areas in the area combination, so that the actual class similarity between the local areas in the area combination can be intuitively reflected through the reference relation parameter.
Step S14: and adjusting the network parameters of the feature extraction network by using the feature similarity and the reference relation parameters of each group of regional combinations.
In this application, the feature similarity of each group of region combinations can be regarded as similarity at the feature information level. Processing the feature similarity of each group of region combinations with the reference relationship parameter allows the feature similarity to be corrected based on the differences in the labeling information between the local regions of the region combination, so that a more accurate measure of the similarity of their feature information is obtained. Subsequently, the network parameters of the feature extraction network can be adjusted based on a contrastive learning method, thereby training the feature extraction network.
Therefore, by combining the reference relationship parameter and the feature similarity corresponding to a region combination, the feature similarity is corrected, so that the similarity of different regions within the same image can subsequently be compared. The feature extraction network can thus be trained with a contrastive learning method applied to a single image, which expands the application range of the contrastive learning training method and improves the accuracy with which the feature extraction network extracts information from within the same image. In addition, because contrastive learning is applied within one image, the method is better suited to image segmentation tasks than contrastive learning across different images, which helps improve the image processing accuracy of the image processing model.
In an embodiment, the feature extraction network is a part of the image processing model, and the feature extraction network may be an entire encoder or a part of the encoder. The image processing model is used for predicting the sample image based on the sample feature map of the sample image, for example, classifying pixel points of the sample image, so as to realize segmentation of the sample image. In addition, the training method of the feature extraction network is executed in the pre-training stage of the image processing model. Therefore, the training method of the feature extraction network is applied to the pre-training stage of the image processing model, and is beneficial to improving the pre-training effect, so that the training effect of the subsequent image processing model is improved.
Referring to fig. 2, fig. 2 is a flowchart illustrating a training method for a feature extraction network according to a second embodiment of the present disclosure. In this embodiment, the labeling information of the area combination includes a preset classification to which the pixel point in each local area of the area combination belongs. The preset classification may be a predetermined classification. Such as dogs, cats, people, and background, etc.
In this embodiment, the step "determining the reference relationship parameter of each group of area combinations based on the label information of each group of area combinations" mentioned in the above embodiments specifically includes steps S21 to S23.
Step S21: the respective area combinations are set as target area combinations.
When the reference relationship parameters of each group of area combinations are obtained, each area combination can be used as a target area combination, so that each target area combination can be processed subsequently.
Step S22: and respectively determining the category parameters of each target local area about each preset classification based on the labeling information of the target area combination.
In the present embodiment, the target local region is a local region in the target region combination. For example, if a region combination includes two local regions, the target region combination may include two target local regions after the region combination is used as the target region combination.
The labeling information of this embodiment may be classification information, specifically, classification information of each pixel point of the target local region, or classification information of the target local region as a whole. The category parameter related to the preset classification characterizes the condition of the pixel points belonging to the preset classification in the target local region, that is, the category parameter can reflect the condition that each pixel point in the target local region belongs to the preset classification.
In an embodiment, when the annotation information is classification information of each pixel point of the target local region, the classification parameter of the preset classification may be further determined according to the number of the pixel points belonging to the preset classification in the target local region and then according to the number of the pixel points belonging to the preset classification. For example, if the preset classification includes dog, cat, person and background, the category parameter of each target local region with respect to the dog, the category parameter of the cat, the category parameter of the person and the category parameter of the background may be determined according to the number of pixel points in the target local region that belong to the dog, the cat, the person and the background, respectively. In one embodiment, when the annotation information is classification information of the entire target local region, the category parameter of the preset classification may be further determined according to the specific classification of the target local region. For example, if the preset classification includes a dog, a cat, and a background, the category parameters of the target local area belonging to each dog, cat, and background can be determined according to the specific classification of the target local area.
In one embodiment, the step of "respectively determining the category parameter of each target local area with respect to each preset classification based on the labeling information of the target area combination" specifically includes step S221 and step S222 (not shown).
Step S221: and taking each preset classification as a target classification.
When the category parameter of each preset category is obtained, each preset category can be used as a target category, so that each target category can be processed subsequently and respectively.
Step S222: for each target local area, counting the number of pixel points belonging to the target classification in the target local area, and determining the category parameter of the target local area belonging to the target classification based on the number of the pixel points belonging to the target classification.
In this embodiment, the label information is classification information of each pixel point of the target local region. At this time, for each target local region, the number of pixels belonging to the target classification in the target local region is counted, for example, the preset classification includes dog, cat, person and background, the size of the target local region is 10 × 10, and there are 100 pixels in total, the number of pixels belonging to the dog in the target local region is 25, the number of pixels belonging to the cat is 30, the number of pixels belonging to the person is 35, and the number of pixels belonging to the background is 10. Further, the category parameter of the target local area belonging to the target classification can be further determined based on the number of the pixel points belonging to the target classification. For example, the category parameter of a certain target classification may be determined according to the number of pixels of the certain target classification and the number of pixels of other target classifications.
In a specific embodiment, the ratio between the number of pixel points belonging to the target classification and the total number of pixel points in the target local region may be used as the category parameter of the target local region for that target classification. Specifically, the category parameter of the target classification can be obtained by the following formula (2):

$$c_i^m = \frac{R_i^m}{R_i} \qquad (2)$$

In the present embodiment, $c_i^m$ denotes the category parameter of the target local region $i$ for the target classification $m$, $R_i$ denotes the total number of pixel points in the target local region, and $R_i^m$ denotes the number of pixel points in the target local region that belong to classification $m$. For example, if $m$ is the classification "cat", the category parameter for "cat" may be determined to be 0.3.
Therefore, the ratio of the number of the pixel points belonging to the target classification to the total number of the pixel points of the target local area is used as the category parameter of the target local area belonging to the target classification, so that the category parameter is obtained.
In other specific embodiments, the ratio between the number of pixel points belonging to a certain target classification and the number of pixel points belonging to the other target classifications may instead be used as the category parameter.
Therefore, the number of the pixel points belonging to the target classification in the target local area is counted, so that the category parameter of the target local area belonging to the target classification can be determined subsequently based on the number of the pixel points belonging to the target classification, and the category parameter can be determined by utilizing the labeling information.
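A small sketch of this counting step, assuming pixel-level labels for the target local region (function and variable names are illustrative):

```python
import torch

def category_parameters(region_labels, num_classes):
    """Category parameter of a target local region for every preset classification:
    the fraction of the region's pixel points belonging to each classification."""
    counts = torch.bincount(region_labels.flatten(), minlength=num_classes).float()
    return counts / region_labels.numel()   # e.g. [0.25, 0.30, 0.35, 0.10] for dog/cat/person/background
```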
Step S23: and obtaining a reference relation parameter of the target area combination based on the category parameter.
In this embodiment, since the category parameter is obtained by using the label information of the target area combination, specifically, by using the label information of each target local area combination, it can be considered that the category parameter can reflect the classification condition of each target local area combination. Therefore, based on the category parameters, the classification difference of each target classification in the target area combination can be obtained, and further, the reference relation parameters of the target area combination can be obtained. Specifically, the reference relationship parameter of the target region combination may be obtained by comparing the class parameters of each target classification in different target local regions in the same target region combination.
Therefore, the reference relation parameters of the target area combination can be obtained by respectively determining the category parameters of each target local area about each preset classification based on the labeling information of the target area combination, and the reference relation parameters are determined by using the labeling information.
In one embodiment, the aforementioned step of "obtaining the reference relationship parameter of the target area combination by using the category parameter" specifically includes step S231 and step S232 (not shown).
Step S231: and for each preset classification, obtaining the class parameter difference of the target area combination relative to the preset classification based on the class parameters of each target local area belonging to the preset classification.
For each preset classification, because the class parameter of each target area combination can reflect the classification condition of the class, the class parameter difference of the target area combination with respect to the preset classification can be obtained based on the class parameter of each target local area belonging to the preset classification. The category parameter differences may be considered to be a manifestation of different aspects of the target area combination with respect to the preset classification.
In one embodiment, the difference between the category parameters of the target local regions for a preset classification may be used as the category parameter difference of the target region combination with respect to that preset classification. Specifically, the category parameter difference can be obtained by the following formula (3):

$$\Phi_m = \left| c_i^m - c_j^m \right| \qquad (3)$$

where $c_i^m$ denotes the category parameter of the target local region $i$ for the preset classification $m$, $c_j^m$ denotes the category parameter of the target local region $j$ for the preset classification $m$, and $\Phi_m$ denotes the category parameter difference of the target region combination with respect to the preset classification $m$. Therefore, by taking the difference between the category parameters of the target local regions for a preset classification as the category parameter difference of the target region combination with respect to that classification, the category parameter difference is determined using the category parameters.
In other specific embodiments, the ratio of the category parameters of the target local regions for the preset classification may be used as the category parameter difference, or the category parameter difference may be determined by other calculation methods, which is not limited herein.
Step S232: and obtaining a reference relation parameter of the target area combination based on the category parameter difference of the target area combination about each preset classification.
After the category parameter difference corresponding to each preset category is obtained, it means that the difference situation of the target area combination in each preset category has been determined, and at this time, fusion processing may be performed based on the category parameter difference of the target area combination with respect to each preset category, so as to obtain the reference relationship parameter of the target area combination. The fusion process is, for example, summation, or weighted summation, or averaging, etc., and the calculation method is not limited.
Therefore, by obtaining the category parameter difference of the target area combination with respect to the preset classifications, the reference relationship parameter of the target area combination can be obtained based on the category parameter difference of the target area combination with respect to each preset classification, so that the reference relationship parameter can reflect the actual difference situation between the local areas in the target area combination.
In an embodiment, the step of "obtaining the reference relationship parameter of the target region combination based on the difference of the category parameters of the target region combination with respect to each preset category" specifically includes step S2321 and step S2322 (not shown).
Step S2321: and acquiring a statistical value of the category parameter difference of the target area combination about each preset classification.
In this embodiment, the statistical value may be a common statistic such as a mean, a median, or a mode. In one example, the statistical value may be the mean.
Therefore, by acquiring the statistical value, a single numerical summary of the category parameter differences of the target area combination over all preset classifications can be obtained.
Step S2322: and obtaining the reference relation parameters of the target area combination by using the statistical values of the target area combination.
In the present embodiment, the reference relation parameter of the target area combination is negatively correlated with the statistical value of the target area combination. By setting the reference relation parameter to be negatively correlated with the statistical value of the target area combination, it is possible to make the reference relation parameter represent the manifestation of the same aspect of the target area combination with respect to the preset classification.
In one embodiment, the reference relationship parameter of the target region combination may be calculated by the following formula (4):

w_ij = 1 − (1 / |M|) · Σ_{m ∈ M} Φ_m  (4)

where M denotes the set of all preset classifications, m denotes a specific preset classification, Φ_m denotes the category parameter difference of the target region combination with respect to the preset classification m, and w_ij denotes the reference relationship parameter of the target region combination. In this embodiment, w_ij may be regarded as a similarity weight between the different target local regions in the target region combination, and is used to reflect the same aspect of all target local regions in the target region combination with respect to all preset classifications.
Therefore, by acquiring the statistical value of the category parameter differences of the target area combination over the preset classifications and deriving the reference relationship parameter of the target area combination from that statistical value, a reference relationship parameter reflecting the actual difference situation between the local areas in the area combination can be obtained.
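As an illustration of how steps S231 to S2322 fit together, the following minimal Python sketch computes the category parameters of two label patches, their per-classification differences, and the reference relationship parameter as the complement of the mean difference. It assumes the reconstructed forms of formulas (3) and (4) above; the function and variable names are illustrative only and are not prescribed by this application.

import numpy as np

def category_parameters(label_patch: np.ndarray, num_classes: int) -> np.ndarray:
    # Ratio of pixels belonging to each preset classification inside one local region.
    counts = np.bincount(label_patch.reshape(-1), minlength=num_classes)
    return counts / label_patch.size

def reference_relation_parameter(label_patch_i: np.ndarray,
                                 label_patch_j: np.ndarray,
                                 num_classes: int) -> float:
    # w_ij = 1 - mean_m |p_i^m - p_j^m|, cf. formulas (3) and (4) as reconstructed above.
    p_i = category_parameters(label_patch_i, num_classes)
    p_j = category_parameters(label_patch_j, num_classes)
    phi = np.abs(p_i - p_j)          # category parameter difference per preset classification
    return float(1.0 - phi.mean())   # negatively correlated with the statistical value

# Usage: two 16x16 label patches with 3 preset classifications (e.g. background, normal, tumor).
patch_i = np.random.randint(0, 3, (16, 16))
patch_j = np.random.randint(0, 3, (16, 16))
w_ij = reference_relation_parameter(patch_i, patch_j, num_classes=3)

The closer the class composition of the two local regions, the smaller the mean difference and the larger the similarity weight w_ij, which matches the negative correlation required in step S2322.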
Referring to fig. 3, fig. 3 is a schematic flow chart of a training method for a feature extraction network according to a third embodiment of the present application. In this embodiment, the step "adjusting the network parameters of the feature extraction network by using the feature similarity and the reference relationship parameter of each group of the area combination" mentioned in the above embodiments specifically includes steps S31 to S33.
Step S31: respectively combining all the groups of area combinations into a target area combination, adjusting the feature similarity of the target area combination by using the reference relation parameters of the target area combination to obtain the reference feature similarity of the target area combination, and obtaining the first loss of the target area combination based on the reference feature similarity of the target area combination;
in one embodiment, the target region combination includes two target local regions, and the feature similarity of the target region combination can be calculated by the following formula (5).
s_ij = sim(v_i, v_j)  (5)

where v_i denotes the feature information of the target local region i in the target region combination, v_j denotes the feature information of the target local region j, and sim denotes the cosine similarity.
In other embodiments, the feature similarity of the target region combination may be determined by other feature similarity calculation methods.
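For example, with two pooled feature vectors taken from the sample feature map, the feature similarity of formula (5) can be sketched as follows; PyTorch, the feature dimension and the pooling step are assumptions used only for illustration.

import torch
import torch.nn.functional as F

def feature_similarity(v_i: torch.Tensor, v_j: torch.Tensor) -> torch.Tensor:
    # Cosine similarity between the feature vectors of two local regions, cf. formula (5).
    return F.cosine_similarity(v_i.flatten(), v_j.flatten(), dim=0)

# Usage: pooled feature vectors of two local regions of the sample feature map.
v_i, v_j = torch.randn(256), torch.randn(256)
s_ij = feature_similarity(v_i, v_j)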
In this embodiment, the feature similarity of the target area combination is adjusted by using the reference relationship parameter of the target area combination. For example, the reference relationship parameter may be taken as a reference quantity and, after certain processing, multiplied by the feature similarity, so as to obtain the reference feature similarity of the target area combination.
In one embodiment, the product of the reference relationship parameter of the target region combination and the feature similarity of the target region combination may be used as the reference feature similarity of the target region combination. Therefore, the product of the reference relation parameter of the target area combination and the feature similarity of the target area combination is used as the reference feature similarity of the target area combination, so that the feature similarity is processed by using the reference relation parameter, and the obtained reference feature similarity can more accurately reflect the feature similarity between target local areas in the target area combination.
When the similarity of the reference features of the target region combination is obtained, which means that the similarity of the features between the target local regions in the target region combination has been determined, the first loss of the target region combination may be further determined by using a contrast learning method, so as to obtain the first loss of the target region combination in terms of similarity.
In one embodiment, the step of obtaining the first loss of the target region combination based on the similarity of the reference features of the target region combination specifically includes: step S311 and step S312.
Step S311: and correspondingly obtaining the auxiliary feature similarity of each auxiliary area combination by using the feature similarity and the auxiliary relation parameter of at least one auxiliary area combination.
In the present embodiment, at least one same local region exists in the auxiliary region combination and the target region combination. The auxiliary area combination is also composed of at least two local areas, which may contain the same number of local areas as the target area combination. In one embodiment, the target area combination and the auxiliary area combination each comprise two local areas. In an embodiment, one local area in the target area combination is a reference area, and the auxiliary area combination is an area combination including the reference area.
In the present embodiment, the auxiliary relationship parameter of the auxiliary region combination is obtained based on the reference relationship parameter of the auxiliary region combination. For the specific calculation of the reference relationship parameter of the auxiliary region combination, reference may be made to the calculation of the reference relationship parameter of the target region combination described in the foregoing embodiment, and details are not repeated here. In a specific embodiment, the sum of the auxiliary relationship parameter of the auxiliary region combination and the reference relationship parameter of the auxiliary region combination is a preset value, for example 1. By setting this sum to a preset value, the auxiliary relationship parameter and the reference relationship parameter are in a negative correlation, so that the auxiliary relationship parameter and the reference relationship parameter of the auxiliary region combination can be used to represent, respectively, the different aspects and the same aspect of each target local region in the target region combination with respect to all preset classifications. When the reference relationship parameter is used to embody the same aspect of each target local region in the target region combination with respect to all preset classifications, the auxiliary relationship parameter can represent the different aspects of each target local region in the target region combination with respect to all preset classifications.
After the auxiliary relationship parameters of the auxiliary area combination are obtained, the auxiliary feature similarity of the auxiliary area combination can be correspondingly obtained. In one example, the feature similarity of the auxiliary region combination may be calculated by using a feature similarity calculation method, and a product of the feature similarity of the auxiliary region combination and the auxiliary relationship parameter may be used as the auxiliary feature similarity.
Step S312: and obtaining a first loss corresponding to the target area combination based on the reference feature similarity and the auxiliary feature similarity.
The auxiliary relationship parameters and the reference relationship parameters of the auxiliary region combinations can be used to represent the same aspects and the different aspects of all target local regions in the target region combination with respect to all preset classifications, and the auxiliary feature similarity and the reference feature similarity obtained from them can correspondingly represent the same aspects and the different aspects of the target region combination and the auxiliary region combinations. Therefore, following the contrast learning method, the first loss corresponding to the target region combination can be obtained based on the reference feature similarity and the auxiliary feature similarity. The first loss can be regarded as the loss, in terms of feature similarity, of the feature extraction network when extracting features of the same image information.
In a specific embodiment, the step of "obtaining the first loss corresponding to the target area combination based on the reference feature similarity and the assistant feature similarity" may specifically include steps S3121 to S3123 (not shown).
Step S3121: and performing preset operation on the reference feature similarity to obtain a first operation result.
Step S3122: and respectively carrying out preset operation on the auxiliary feature similarity of each auxiliary area combination to obtain a second operation result corresponding to each auxiliary area combination.
Step S3123: and obtaining a first loss corresponding to the target area combination based on the ratio of the first operation result to the sum of the second operation results corresponding to the auxiliary area combinations.
The preset operation is, for example, an exponential operation with the natural constant e as the base, or any other operation, which is not limited herein.
In the case that there are a plurality of auxiliary area combinations, the second operation results corresponding to the individual auxiliary area combinations may be further summed to obtain the sum of the second operation results over all auxiliary area combinations.
In one embodiment, the first loss L_1 may be calculated by the following formula (6):

L_1 = − log [ exp(w_ij · s_ij / τ) / Σ_{k ∈ N} exp((1 − w_ik) · s_ik / τ) ]  (6)

where s_ij denotes the feature similarity of the target region combination, s_ik denotes the feature similarity of an auxiliary region combination, w_ij denotes the reference relationship parameter, 1 − w_ik denotes the auxiliary relationship parameter, N denotes all local regions, τ is an adjustable parameter, and exp denotes the preset operation, namely the exponential function with the natural constant e as the base.

In formula (6), the local region shared by the target region combination and the auxiliary region combinations is the local region i. In this case, each auxiliary region combination is composed of the local region i and one of all the local regions (including the local region i itself). When the reference relationship parameter is used to represent the same aspect of each target local region in the target region combination with respect to all preset classifications, the first loss relating the feature similarity of the target region combination s_ij to the feature dissimilarity of the auxiliary region combinations can be calculated, thereby realizing the comparison process in contrast learning.
Therefore, the feature similarity and the assistant relationship parameter of at least one assistant region combination are used to correspondingly obtain the assistant feature similarity of each assistant region combination, and further, the first loss corresponding to the target region combination can be obtained based on the reference feature similarity and the assistant feature similarity, so that the loss of the feature extraction network in the aspect of feature similarity when the same image information is subjected to feature extraction is obtained.
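The following minimal sketch shows how the reference feature similarity and the auxiliary feature similarities enter the first loss, following the reconstructed formula (6); it assumes that the similarities s_ik and relationship parameters w_ik of the shared local region i with all N local regions are already available, and the names and the value of τ are illustrative only.

import torch

def first_loss(s_i: torch.Tensor, w_i: torch.Tensor, j: int, tau: float = 0.1) -> torch.Tensor:
    # s_i: feature similarities s_ik of local region i with every local region k, shape [N]
    # w_i: reference relationship parameters w_ik of the same combinations, shape [N]
    ref = torch.exp(w_i[j] * s_i[j] / tau)          # reference feature similarity term (numerator)
    aux = torch.exp((1.0 - w_i) * s_i / tau).sum()  # auxiliary feature similarity terms (denominator)
    return -torch.log(ref / aux)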
Step S32: obtaining a second loss of the feature extraction network based on the first loss of each group of regional combinations;
after each group of area combination is taken as the target area combination, the first loss corresponding to each group of area combination can be obtained. In this case, the first loss of each group of region combinations may be further used to calculate the loss of the similarity of the entire feature extraction network in extracting features from different local regions, so as to obtain the loss of the similarity of the entire sample feature map, and to apply the contrast learning to one image.
In one embodiment, the second loss L_2 may be calculated using the following formula (7):

L_2 = (1 / N²) · Σ_{i ∈ N} Σ_{j ∈ N} − log [ exp(w_ij · s_ij / τ) / Σ_{k ∈ N} exp((1 − w_ik) · s_ik / τ) ]  (7)

The meaning of each parameter of formula (7) can be found in the related description of formula (6) above and is not repeated here.

In formula (7), every two of the local regions included in the sample image can be considered to be combined, so that the target region combination s_ij is compared with the whole set of auxiliary region combinations formed among all local regions. That is, when the reference relationship parameter is used to represent the same aspect of each target local region in the target region combination with respect to all preset classifications, the same feature aspect of the target region combination s_ij is compared with the different feature aspects of all auxiliary region combinations, so that contrast learning is performed to improve the accuracy with which the feature extraction network extracts the same image information.
Step S33: and adjusting the network parameters of the feature extraction network by using the second loss.
After the second loss is obtained, the network parameters of the feature extraction network can be adjusted based on the contrast learning method, thereby realizing training of the feature extraction network.
Therefore, the first loss of the target area combination is obtained by using the reference feature similarity of the target area combination, and then the second loss of the feature extraction network in terms of the overall similarity of the sample feature map can be obtained based on the first loss of each group of area combination, so that the network parameters of the feature extraction network can be adjusted by using the second loss, a method of applying contrast learning on one image is realized, and the training of the feature extraction network is completed.
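A sketch of steps S32 and S33 under the same assumptions is given below; averaging the first losses of all region combinations is one possible fusion, and the normalization is not fixed by this application.

import torch

def second_loss(S: torch.Tensor, W: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    # S: pairwise feature similarities of all N local regions, shape [N, N]
    # W: pairwise reference relationship parameters, shape [N, N]
    num = torch.exp(W * S / tau)                                   # reference feature similarity terms
    den = torch.exp((1.0 - W) * S / tau).sum(dim=1, keepdim=True)  # auxiliary terms per shared region i
    return -torch.log(num / den).mean()                            # average of the first losses

# Usage within a training step (the optimizer and the tensors S and W produced by the
# feature extraction network are assumed to exist in the surrounding training code):
# loss = second_loss(S, W)
# loss.backward()
# optimizer.step()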
Referring to fig. 4, fig. 4 is a flowchart illustrating a fourth embodiment of the training method for image processing models according to the present application. The training method of the image processing model of the present embodiment specifically includes steps S41 to S43.
Step S41: the training method described in the training embodiment of the image processing model is performed to pre-train the feature extraction network of the image processing model.
For a specific pre-training process, please refer to the related description of the above embodiment of the training method for the feature extraction network, which is not described herein again.
Step S42: and carrying out image processing on the sample image by using the image processing model to obtain a detection result of the sample image.
In this embodiment, the image processing includes performing feature extraction on the sample image by using a feature extraction network to obtain an original feature map. Then, feature optimization can be performed on the corresponding feature areas in the original feature map by using the uncertainty parameters corresponding to the feature areas in the original feature map, so as to obtain a target feature map. And obtaining the detection result of the sample image based on the target characteristic diagram.
In one embodiment, the sample image may be a medical image, such as a two-dimensional or three-dimensional image obtained by a medical imaging method. Medical imaging methods include Computed Tomography (CT), Magnetic Resonance Imaging (MRI), or Ultrasound (US) imaging. The sample image is, for example, a medical image containing a human organ, a bone, and the like.
In one embodiment, the original feature map may be a feature map output by an encoding layer of the image processing model, or may be a feature map output by a decoding layer of the image processing model, and specifically may be a feature map output by the encoding layer or any intermediate network layer of the decoding layer.
In one embodiment, in the target feature map, the influence of the target feature information corresponding to the feature region is related to the uncertainty parameter corresponding to the feature region. The uncertainty parameter corresponding to the feature region may indicate semantic uncertainty of the feature information of the feature region, and may also indicate a degree of probability that a detection result of the sample image obtained based on the feature information of the feature region is inaccurate. The magnitude of the influence of the target feature information corresponding to the feature region may indicate the magnitude of the acting force of the target feature information corresponding to the feature region on the detection result of the obtained sample image, that is, the stronger the target feature information corresponding to the feature region is, the greater the acting force thereof on the detection result of the obtained sample image is. Conversely, the weaker the target characteristic information corresponding to the characteristic region, the smaller the acting force of the target characteristic information on obtaining the detection result of the sample image. Therefore, by acquiring the uncertainty parameter, the uncertainty parameter can be used to distinguish the magnitude of influence on the feature information of the feature region, so as to contribute to improving the accuracy of the detection result of the sample image.
In one embodiment, the feature region may be a feature point of the original feature map, or a region composed of several feature points. In one embodiment, each feature point in the original feature map may be used as a feature region, so that the uncertainty parameter of each feature point may be determined, and the most comprehensive uncertainty parameter may be obtained, thereby contributing to the accuracy of the detection result of the sample image.
In one embodiment, the uncertainty parameter corresponding to each feature region may be determined based on the initial feature information in the original feature map. In this embodiment, the initial feature information in the original feature map may reflect the image information of the sample image, and therefore, the processing may be performed based on the initial feature information in the original feature map, specifically, based on uncertainty processing using the initial feature information in the original feature map, for example, uncertainty estimation using the initial feature information in the original feature map, so as to obtain the uncertainty parameter. Therefore, the uncertainty parameters corresponding to the characteristic regions are obtained based on the initial characteristic information in the original characteristic diagram, so that uncertainty estimation of the initial characteristic information is realized, and the accuracy of the detection result of the sample image is improved.
In one embodiment, the obtained target feature map may be used to replace the original feature map, and then image processing may be performed based on the target feature map, so as to obtain a detection result of the sample image. For example, after a certain intermediate network layer performs feature extraction on a sample image, an original feature map is output, and after a target feature map is obtained, the target feature map may be used as the output of the intermediate network layer and input into a next network layer to continue image processing, and finally, a detection result of the sample image is obtained.
Step S43: and adjusting the network parameters of the image processing model based on the detection result.
After the detection result of the sample image is obtained, the detection result of the sample image can be compared with the labeling information of the sample image, and then the loss value can be determined. So that the network parameters of the image processing model can be correspondingly adjusted according to the loss values.
In one embodiment, the detection result is a result corresponding to the labeling information of the sample image. For example, the annotation information is a classification result of a pixel point in the sample image, and the detection result may also be a classification result of a pixel point in the sample image. In another example, if the annotation information is a detection result of a plurality of targets in the sample image, the detection result may also be a detection result of a plurality of targets in the sample image. Therefore, the processing effect of the image processing model can be determined by comparing the difference between the detection result and the labeling information of the sample image and calculating the loss value of the detection result and the labeling information of the sample image. The calculation method of the loss value may be a calculation method commonly used in the art, and will not be described herein.
After determining the loss value, the network parameters of the image processing model may be adjusted according to the loss value, and the specific adjustment method may be a general method in the art, and will not be described herein again.
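A minimal sketch of steps S42 and S43 for a pixel-wise classification task is given below; the cross-entropy loss, the optimizer and the stand-in model are common choices used only for illustration and are not mandated by the text.

import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Conv2d(1, 3, kernel_size=3, padding=1)      # stand-in for the image processing model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

sample_image = torch.randn(2, 1, 64, 64)               # batch of sample images
label_map = torch.randint(0, 3, (2, 64, 64))           # labeling information (pixel classes)

logits = model(sample_image)                           # detection result of the sample image
loss = F.cross_entropy(logits, label_map)              # compare detection result with the labels
optimizer.zero_grad()
loss.backward()
optimizer.step()                                       # adjust the network parameters via the loss value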
Therefore, the feature extraction network of the image processing model is pre-trained, so that the accuracy of extracting the same image information on one image by the feature extraction network can be improved, and the training progress can be accelerated. In addition, the uncertainty parameters corresponding to the characteristic regions in the original characteristic diagram are utilized to perform characteristic optimization on the corresponding characteristic regions in the original characteristic diagram, so that the influence on the characteristic information of the characteristic regions is distinguished, the robustness of pixel points with high uncertainty in the sample image is improved, and the accuracy of the detection result of the sample image is improved.
In an embodiment, the above-mentioned "determining the uncertainty parameter corresponding to each feature region based on the initial feature information in the original feature map" specifically includes steps S51 and S52.
Step S51: and transforming the initial characteristic information in the original characteristic diagram to obtain the characteristic confidence corresponding to each characteristic region.
The transformation processing of the initial feature information in the original feature map may be a general processing method in deep learning, such as convolution processing, activation, batch normalization, and normalization processing of the initial feature information in the original feature map.
Feature confidence may be considered a confidence representation of the detection results of the sample image. In particular, the feature confidence may be a direct representation of the detection result of the sample image, i.e. the feature confidence may directly represent the detection result confidence of the sample image, e.g. the confidence of the classification. The feature confidence may also be an indirect representation of the detection result of the sample image, that is, the feature confidence may be further used to obtain the detection result confidence of the sample image, and at this time, the feature confidence may be considered as an intermediate result.
Step S52: and obtaining the uncertainty parameters corresponding to the characteristic areas based on the characteristic confidence degrees corresponding to the characteristic areas.
On the basis of the feature confidence, the uncertainty of the initial feature information corresponding to the feature region has been preliminarily characterized. At this time, the uncertainty parameter corresponding to each feature region can be further calculated based on the feature confidence corresponding to the feature region, and the specific calculation method may be a common uncertainty quantification method in the deep learning field.
In one embodiment, the feature confidence corresponding to the feature region includes a class confidence of a plurality of channels, and the class confidence of each channel indicates a confidence that the feature region belongs to a corresponding class. That is, the number of channels for feature confidence is the same as the number of classifications for the target detection result. In this case, the feature confidence may be directly expressed by the detection result of the sample image. For example, the classification number of the target detection result is 3, including normal tissue, tumor tissue and background, and the number of channels of the feature confidence is also 3 channels, and the value of each channel represents the class confidence of the normal tissue, the tumor tissue and the background, respectively. Therefore, the confidence degree that the feature region belongs to one corresponding class is represented by setting the class confidence degree of each channel of the feature confidence degree corresponding to the feature region, so that the classification condition can be visually represented by the feature confidence degree. In this case, the step of "obtaining the uncertainty parameter corresponding to each feature region based on the feature confidence corresponding to each feature region" specifically includes: and for each characteristic region, performing information entropy processing based on the category confidence of a plurality of channels corresponding to the characteristic region to obtain an uncertainty parameter corresponding to the characteristic region. The specific calculation of the information entropy may be a general calculation method. Therefore, the information entropy processing is performed by using the confidence coefficient of the feature region category, so that the uncertainty parameter corresponding to the feature region can be obtained.
In one embodiment, the uncertainty parameter u_ij corresponding to the feature region may be calculated by the following formula (8):

u_ij = − Σ_{m=1}^{M} p_ij^m · log(p_ij^m)  (8)

where M denotes the number of channels of the feature confidence, m denotes a specific channel of the feature confidence, and p_ij^m denotes the feature confidence (the class confidence of channel m) of the feature point in the i-th row and j-th column of the original feature map.
Therefore, by obtaining the feature confidence corresponding to the feature region, the uncertainty parameter corresponding to the feature region can be obtained based on the feature confidence corresponding to the feature region.
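A sketch of steps S51 and S52 is given below, assuming that the transformation f(x) ends in a softmax so that each channel value is a class confidence; the eps term is an implementation detail.

import torch
import torch.nn.functional as F

def uncertainty_map(confidence_logits: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # confidence_logits: output of the transformation f(x), shape [B, M, H, W]
    # returns: uncertainty parameters u_ij, shape [B, 1, H, W]
    p = F.softmax(confidence_logits, dim=1)                    # class confidences per channel
    return -(p * torch.log(p + eps)).sum(dim=1, keepdim=True)  # information entropy, cf. formula (8)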
In an embodiment, the step of "performing feature optimization on the corresponding feature region in the original feature map by using the uncertainty parameter corresponding to each feature region in the original feature map to obtain the target feature map" specifically includes steps S61 and S62.
Step S61: and obtaining the certainty parameters corresponding to the characteristic areas based on the uncertainty parameters corresponding to the characteristic areas.
It can be understood that the larger the uncertainty parameter corresponding to a feature region, the smaller the corresponding certainty parameter. Therefore, based on the negative correlation between the uncertainty parameter and the certainty parameter, the certainty parameter corresponding to each feature region can be obtained from its uncertainty parameter.
In one embodiment, for each feature region, a difference between the first value and the uncertainty parameter corresponding to the feature region is used as the certainty parameter corresponding to the feature region. The first value is, for example, 1 or another value, and is not limited herein.
In one embodiment, the certainty parameter Û may be calculated by the following formula (9):

Û = 1 − U  (9)

where U denotes the uncertainty parameter corresponding to the feature region, and Û denotes the certainty parameter corresponding to the feature region.
Therefore, the determination of the certainty parameter of the feature region is achieved by taking the difference between the first value and the uncertainty parameter corresponding to the feature region as the certainty parameter corresponding to the feature region.
Step S62: and correspondingly adjusting the original characteristic information of each characteristic area in the original characteristic diagram by using the corresponding deterministic parameters of each characteristic area to obtain the target characteristic information of each characteristic area in the target characteristic diagram.
After the certainty parameters corresponding to the feature areas are obtained, the certainty parameters corresponding to the feature areas can be used for correspondingly adjusting the original feature information of each feature area in the original feature map. In one embodiment, the original feature information of each feature area in the original feature map may be weighted by using the certainty parameter, so as to obtain the target feature information of each feature area in the target feature map. In other embodiments, the processing may be performed by other calculation methods, which is not limited herein.
Therefore, by obtaining the certainty parameter corresponding to each feature region based on the uncertainty parameter corresponding to the feature region, the original feature information of the feature region can be adjusted by using the certainty parameter corresponding to the feature region, so as to obtain the target feature information of the feature region.
In one embodiment, the step of "obtaining the target feature information of each feature area in the target feature map by correspondingly adjusting the original feature information of each feature area in the original feature map by using the certainty parameter corresponding to each feature area" specifically includes steps S621 and S622.
Step S621: and for each characteristic region, acquiring the sum of the certainty parameter corresponding to the characteristic region and the second value as the adjustment weight of the characteristic region.
The second value is, for example, 1, or other specific values, and is not limited herein.
Step S622: and weighting the original characteristic information of the characteristic region by using the adjustment weight of the characteristic region to obtain the target characteristic information of the characteristic region.
In one embodiment, the target feature information Ĥ of the feature region may be obtained by the following formula (10):

Ĥ = (1 + Û) · H  (10)

where H denotes the original feature information of the feature region, 1 is the second numerical value, Ĥ denotes the target feature information of the feature region, and Û denotes the certainty parameter corresponding to the feature region.
Therefore, the sum of the certainty parameter corresponding to the feature region and the second numerical value is used as the adjustment weight of the feature region, so that the original feature information of the feature region can be weighted by using the adjustment weight of the feature region, and the target feature information of the feature region can be obtained.
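A sketch of the feature optimization of steps S61 to S622 is given below, taking both the first and the second numerical values as 1 as in the examples above; scaling the uncertainty parameter to [0, 1] (for example by dividing the entropy by log M) is an assumption.

import torch

def refine_features(original: torch.Tensor, uncertainty: torch.Tensor) -> torch.Tensor:
    # original:    original feature map H, shape [B, C, H, W]
    # uncertainty: uncertainty parameters scaled to [0, 1], shape [B, 1, H, W]
    certainty = 1.0 - uncertainty      # formula (9), first numerical value taken as 1
    weight = 1.0 + certainty           # adjustment weight, second numerical value taken as 1
    return weight * original           # target feature information, cf. formula (10)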
In one embodiment, the detection result of the sample image may be to classify each pixel point of the sample image, so as to implement segmentation of the sample image, for example, segmentation of a medical image. In another embodiment, the detection result of the sample image may also be a target detection result of the sample image, or a task type of other image processing fields, which is not limited herein.
In one embodiment, when the image processing method is performed by using an image processing model, an uncertainty processing module may be provided in the image processing model and used for performing the relevant steps mentioned in the above embodiments on the original feature map, so as to obtain the target feature map.
Referring to fig. 5, fig. 5 is a schematic flowchart illustrating a process for obtaining a target feature map according to an embodiment of the present application. In fig. 5, the image 101 is the original feature map H, with size h × w and c channels. f(x) denotes the processing of the original feature map to obtain the feature confidence corresponding to each feature region, where n is the number of channels of the feature confidence. Entropy(x) denotes calculating the Shannon entropy from the feature confidence corresponding to each feature region, so as to obtain the uncertainty parameter U. Finally, the target feature map Ĥ can be obtained by using the uncertainty parameter U and the original feature map 101.
Referring to fig. 6, fig. 6 is a schematic diagram of the overall training flow of the training method of the image processing model of the present application. In fig. 6, the sample image 21 is input into the image processing model 22, which outputs the detection result 23 of the sample image 21, so that the loss value can be obtained from the detection result 23 and the labeling information 24 of the sample image 21, thereby implementing training of the image processing model. In fig. 6, the image processing model 22 includes an encoder 221 and a decoder 222; the encoder 221 includes three intermediate network layers 2211, and the decoder 222 includes three intermediate network layers 2222. Any of the intermediate network layers 2211 or 2222 may be trained by the training method mentioned in the above embodiment of the training method of the image processing model.
Referring to fig. 7, fig. 7 is a flowchart illustrating an embodiment of an image processing method according to the present application. In the present embodiment, the image processing method includes step S71 and step S72.
Step S71: and acquiring a target image.
The target image may be the same as the sample image described above and will not be described herein.
Step S71: and processing the target image by using the image processing model to obtain a detection result of the target image.
In this embodiment, the image processing model is trained by the training method of the image processing model described in the above embodiment. For the specific process of obtaining the detection result of the target image, reference may be made to the related description of the embodiment of the training method of the image processing model, which is not repeated here.
Therefore, the image processing model trained by the training method of the image processing model is used for detecting the target image to obtain the detection result, which is beneficial to improving the accuracy of the detection result.
Referring to fig. 8, fig. 8 is a schematic diagram of a framework of an embodiment of a training apparatus for a feature extraction network according to the present application. The training apparatus 30 of the feature extraction network includes an obtaining module 31, a first determining module 32, a second determining module 33, and an adjusting module 34. The obtaining module 31 is configured to perform feature extraction on the sample image by using a feature extraction network to obtain a sample feature map of the sample image; the first determining module 32 is configured to determine feature similarity of a plurality of groups of region combinations by using the sample feature map, where the sample image includes a plurality of local regions, and the feature similarity of each group of region combinations represents the similarity of the feature information of the at least two local regions included in the region combination; the second determining module 33 is configured to determine, based on the labeling information of each group of area combinations, reference relationship parameters corresponding to each group of area combinations, where each reference relationship parameter represents the actual difference situation between the local areas in the area combination; the adjusting module 34 is configured to adjust network parameters of the feature extraction network by using the feature similarity and the reference relationship parameter of each group of area combinations.
The marking information of the area combination comprises preset classifications to which pixel points in each local area of the area combination belong; the second determining module 33 is configured to determine, based on the labeled information of each group of area combinations, a reference relationship parameter corresponding to each group of area combinations, and includes: respectively taking each area combination as a target area combination; respectively determining category parameters of each target local area about each preset classification based on the labeling information of the target area combination, wherein the target local area is a local area in the target area combination; and obtaining a reference relation parameter of the target area combination based on the category parameter.
The second determining module 33 is configured to determine category parameters of each target local area with respect to each preset category based on the labeling information of the target area combination, and includes: respectively taking each preset classification as a target classification; for each target local area, counting the number of pixel points belonging to the target classification in the target local area, and determining the category parameter of the target local area belonging to the target classification based on the number of the pixel points belonging to the target classification; and/or, the second determining module 33 is configured to obtain the reference relationship parameter of the target area combination by using the category parameter, and includes: for each preset classification, obtaining the classification parameter difference of the target area combination relative to the preset classification based on the classification parameter of each target local area belonging to the preset classification; and obtaining a reference relation parameter of the target area combination based on the category parameter difference of the target area combination about each preset category.
The second determining module 33 is configured to determine the category parameter of the target local area belonging to the target classification based on the number of the pixel points belonging to the target classification, and includes: and taking the ratio of the number of the pixel points belonging to the target classification to the total number of the pixel points of the target local area as a category parameter of the target local area belonging to the target classification.
The second determining module 33 is configured to obtain a category parameter difference of the target area combination with respect to a preset category based on the category parameter of each target local area belonging to the preset category, and includes: taking the difference of the category parameters belonging to the preset classification between the target local areas as the category parameter difference of the target area combination relative to the preset classification; obtaining a reference relation parameter of the target area combination based on the category parameter difference of the target area combination about each preset classification, wherein the reference relation parameter comprises the following steps: acquiring a statistical value of the category parameter difference of the target area combination about each preset classification; and obtaining the reference relation parameter of the target area combination by using the statistical value of the target area combination, wherein the reference relation parameter of the target area combination is in negative correlation with the statistical value of the target area combination.
The adjusting module 34 is configured to adjust the network parameters of the feature extraction network by using the feature similarity and the reference relationship parameter of each group of area combinations, and includes: taking each group of area combinations as the target area combination in turn, adjusting the feature similarity of the target area combination by using the reference relationship parameter of the target area combination to obtain the reference feature similarity of the target area combination, and obtaining the first loss of the target area combination based on the reference feature similarity of the target area combination; obtaining a second loss of the feature extraction network based on the first loss of each group of area combinations; and adjusting the network parameters of the feature extraction network by using the second loss.
The adjusting module 34 is configured to adjust the feature similarity of the target area combination by using the reference relationship parameter of the target area combination to obtain the reference feature similarity of the target area combination, and includes: taking the product of the reference relation parameter of the target area combination and the feature similarity of the target area combination as the reference feature similarity of the target area combination;
the adjusting module 34 is configured to obtain the first loss of the target area combination based on the reference feature similarity of the target area combination, and includes: using the feature similarity and the auxiliary relationship parameter of at least one auxiliary area combination to correspondingly obtain the auxiliary feature similarity of each auxiliary area combination, wherein the auxiliary area combination and the target area combination have at least one same local area, and the auxiliary relationship parameter of the auxiliary area combination is obtained based on the reference relationship parameter of the auxiliary area combination; and obtaining the first loss corresponding to the target area combination based on the reference feature similarity and the auxiliary feature similarity.
Wherein, one local area in the target area combination is a reference area, and the auxiliary area combination is an area combination containing the reference area; and/or the sum of the auxiliary relation parameters of the auxiliary area combination and the reference relation parameters of the auxiliary area combination is a preset value; the adjusting module 34 is configured to obtain a first loss corresponding to the target area combination based on the reference feature similarity and the auxiliary feature similarity, and includes: performing preset operation on the reference feature similarity to obtain a first operation result; respectively carrying out preset operation on the auxiliary feature similarity of each auxiliary area combination to obtain a second operation result corresponding to each auxiliary area combination; and obtaining a first loss corresponding to the target area combination based on the ratio of the first operation result to the sum of the second operation results corresponding to the auxiliary area combinations.
The feature extraction network is a part of an image processing model, and the image processing model is used for predicting a sample image based on a sample feature map of the sample image; the training method of the feature extraction network is executed in a pre-training stage of the image processing model.
Wherein, the reference relation parameter of the area combination and the actual class similarity between the local areas in the area combination form a positive correlation; and/or, every two different local areas in the sample image form a group of area combinations; the first determining module 32 is configured to determine feature similarity of a plurality of groups of region combinations by using the sample feature map, and includes: for each group of area combination, acquiring the characteristic information corresponding to each local area in the area combination from the sample characteristic diagram; and obtaining the feature similarity of the area combination by using the feature information corresponding to each local area in the area combination.
Referring to fig. 9, fig. 9 is a schematic diagram of a framework of an embodiment of an image processing model training apparatus according to the present application. The training device 40 of the image processing model includes a pre-training module 41, a detection module 42 and an adjustment module 43. The pre-training module 41 is configured to perform the method described in the above embodiment of the training method of the feature extraction network, so as to pre-train the feature extraction network of the image processing model; the detection module 42 is configured to perform image processing on the sample image by using the image processing model to obtain a detection result of the sample image, where the image processing includes performing feature extraction on the sample image by using the feature extraction network to obtain an original feature map, performing feature optimization on the corresponding feature regions in the original feature map by using the uncertainty parameters corresponding to the feature regions in the original feature map to obtain a target feature map, and obtaining the detection result of the sample image based on the target feature map; the adjustment module 43 is configured to adjust a network parameter of the image processing model based on the detection result.
Referring to fig. 10, fig. 10 is a schematic diagram of another frame of an embodiment of an image processing apparatus according to the present application. The image processing device 50 comprises an acquisition module 51 and a detection module 52, wherein the acquisition module 51 is used for acquiring a target image; the detection module 52 is configured to process the target image by using an image processing model to obtain a detection result of the target image, where the image processing model is trained by using a training device of the image processing model.
Referring to fig. 11, fig. 11 is a schematic frame diagram of an electronic device according to an embodiment of the present application. The electronic device 60 includes a memory 61 and a processor 62 coupled to each other, and the processor 62 is configured to execute program instructions stored in the memory 61 to implement the steps in any of the above-described embodiments of the image processing model training method, or to implement the steps in any of the above-described embodiments of the image processing method. In one particular implementation scenario, the electronic device 60 may include, but is not limited to, a microcomputer or a server; the electronic device 60 may also be a mobile device such as a notebook computer or a tablet computer, which is not limited herein.
In particular, the processor 62 is configured to control itself and the memory 61 to implement the steps in any of the above-described embodiments of the image processing model training method, or to implement the steps in any of the above-described embodiments of the image processing method. The processor 62 may also be referred to as a CPU (Central Processing Unit). The processor 62 may be an integrated circuit chip having signal processing capabilities. The processor 62 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. In addition, the processor 62 may be implemented jointly by a plurality of integrated circuit chips.
Referring to fig. 12, fig. 12 is a schematic diagram of a frame of an embodiment of a computer readable storage medium of the present application. The computer readable storage medium 70 stores program instructions 71 capable of being executed by a processor, the program instructions 71 being for implementing the steps in any of the above-described embodiments of the image processing model training method, or implementing the steps in any of the above-described embodiments of the image processing method.
According to the scheme, the correction of the feature similarity is realized by combining the reference relation parameters and the similarity corresponding to the region combination, so that the similarity conditions of different regions in the same image can be compared subsequently, the training of the feature extraction network can be realized by using the contrast learning method, the contrast learning is applied to one image, the application range of the contrast learning training method is expanded, and the extraction accuracy of the feature extraction network on the same image information is improved. In addition, the contrast learning is applied to one image, so that the method is more applicable to the image segmentation task compared with the contrast learning by using different images, and the image processing accuracy of the image processing model is improved.
In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and specific implementation thereof may refer to the description of the above method embodiments, which is not described herein again for brevity.
The descriptions of the various embodiments above each emphasize their differences from the other embodiments; for the parts that are the same or similar, reference may be made to each other, and they are not repeated here for brevity.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a module or a unit is merely one type of logical division, and an actual implementation may have another division, for example, a unit or a component may be combined or integrated with another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some interfaces, and may be in an electrical, mechanical or other form.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be substantially implemented or contributed to by the prior art, or all or part of the technical solution may be embodied in a software product stored in a storage medium, and the software product includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor (processor) to execute all or part of the steps of the method of the embodiments of the present application. And the aforementioned storage medium includes: a U disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
If the technical scheme of the application relates to personal information, a product applying the technical scheme of the application clearly informs personal information processing rules before processing the personal information, and obtains personal independent consent. If the technical scheme of the application relates to sensitive personal information, the product applying the technical scheme of the application obtains individual consent before processing the sensitive personal information, and simultaneously meets the requirement of 'explicit consent'. For example, at a personal information collection device such as a camera, a clear and significant identifier is set to inform that the personal information collection range is entered, the personal information is collected, and if the person voluntarily enters the collection range, the person is regarded as agreeing to collect the personal information; or on the device for processing the personal information, under the condition of informing the personal information processing rule by using obvious identification/information, obtaining personal authorization by modes of popping window information or asking a person to upload personal information of the person by himself, and the like; the personal information processing rule may include information such as a personal information processor, a personal information processing purpose, a processing method, and a type of personal information to be processed.

Claims (17)

1. A method for training a feature extraction network, comprising:
performing feature extraction on a sample image by using a feature extraction network to obtain a sample feature map of the sample image;
determining feature similarities of a plurality of groups of region combinations by using the sample feature map, wherein the sample image comprises a plurality of local regions, and the feature similarity of each group of region combinations represents the similarity of the feature information of at least two local regions included in the region combination;
determining a reference relationship parameter corresponding to each group of region combinations based on labeling information of each group of region combinations, wherein each reference relationship parameter represents an actual difference between the local regions in the corresponding region combination; and
adjusting network parameters of the feature extraction network by using the feature similarity and the reference relationship parameter of each group of region combinations.
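Purely as an illustration of the kind of computation claim 1 covers, the following PyTorch sketch performs one training update: it extracts a sample feature map, measures the feature similarity of every region combination, looks up a label-derived reference relationship parameter, and updates the network. All names (`feature_net`, `regions_fn`, `reference_param_fn`) are hypothetical placeholders, and the squared-error objective is only a stand-in for the weighted loss actually described in claims 6 to 8.

```python
import itertools
import torch

def train_step(feature_net, image, regions_fn, reference_param_fn, optimizer):
    """One update of the feature extraction network, loosely following claim 1.

    regions_fn: maps a (C, H, W) feature map to an (N, C) tensor, one row of
        feature information per local region (see the sketch after claim 10).
    reference_param_fn: returns the label-derived reference relationship
        parameter of the region combination (i, j).
    """
    feature_map = feature_net(image.unsqueeze(0))[0]      # (C, H, W) sample feature map (assumed output shape)
    region_feats = regions_fn(feature_map)                # (N, C), one row per local region
    loss_terms = []
    for i, j in itertools.combinations(range(region_feats.shape[0]), 2):
        sim = torch.cosine_similarity(region_feats[i], region_feats[j], dim=0)
        ref = reference_param_fn(i, j)                    # actual difference between the two regions
        loss_terms.append((sim - ref) ** 2)               # simple surrogate for the claims-6-to-8 loss
    loss = torch.stack(loss_terms).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```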
2. The method according to claim 1, wherein the labeling information of a region combination comprises the preset classification to which each pixel point in each local region of the region combination belongs;
the determining a reference relationship parameter corresponding to each group of region combinations based on the labeling information of each group of region combinations comprises:
taking each region combination as a target region combination respectively;
determining, based on the labeling information of the target region combination, a category parameter of each target local region with respect to each preset classification, wherein a target local region is a local region in the target region combination, and the category parameter with respect to a preset classification characterizes the pixel points in the target local region that belong to the preset classification; and
obtaining the reference relationship parameter of the target region combination based on the category parameters.
3. The method according to claim 2, wherein the determining, based on the labeling information of the target region combination, a category parameter of each target local region with respect to each preset classification comprises:
taking each preset classification as a target classification respectively;
for each target local region, counting the number of pixel points in the target local region that belong to the target classification, and determining the category parameter of the target local region with respect to the target classification based on the number of pixel points belonging to the target classification;
and/or, the obtaining the reference relationship parameter of the target region combination based on the category parameters comprises:
for each preset classification, obtaining a category parameter difference of the target region combination with respect to the preset classification based on the category parameter of each target local region with respect to the preset classification; and
obtaining the reference relationship parameter of the target region combination based on the category parameter differences of the target region combination with respect to the preset classifications.
4. The method according to claim 3, wherein the determining the category parameter of the target local region with respect to the target classification based on the number of pixel points belonging to the target classification comprises:
taking the ratio of the number of pixel points belonging to the target classification to the total number of pixel points in the target local region as the category parameter of the target local region with respect to the target classification.
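Read together, claims 3 and 4 amount to computing, for each local region and each classification, the fraction of the region's pixels that carry that label. A minimal sketch, assuming `label_map` is an integer segmentation mask and `mask` is a boolean mask selecting the local region (both names are illustrative):

```python
import torch

def category_parameter(label_map, mask, target_class):
    """Ratio of the pixels in the local region that belong to target_class
    to the total number of pixels in the region (claim 4)."""
    region_labels = label_map[mask]                      # labels of the local region's pixel points
    n_target = (region_labels == target_class).sum()     # pixel points of the target classification
    return (n_target.float() / region_labels.numel()).item()
```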
5. The method according to claim 3, wherein the obtaining a category parameter difference of the target region combination with respect to the preset classification based on the category parameter of each target local region with respect to the preset classification comprises:
taking the difference between the category parameters of the target local regions with respect to the preset classification as the category parameter difference of the target region combination with respect to the preset classification;
and the obtaining the reference relationship parameter of the target region combination based on the category parameter differences of the target region combination with respect to the preset classifications comprises:
obtaining a statistical value of the category parameter differences of the target region combination with respect to the preset classifications; and
obtaining the reference relationship parameter of the target region combination by using the statistical value, wherein the reference relationship parameter of the target region combination is negatively correlated with the statistical value of the target region combination.
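One possible instantiation of claim 5: per classification, take the difference of the two regions' category parameters, aggregate into a statistic, and map that statistic through a decreasing function. The mean absolute difference as the statistic and `exp(-x)` as the negatively correlated mapping are assumptions for illustration, not choices stated in the claims.

```python
import math

def reference_relationship_parameter(params_a, params_b):
    """params_a / params_b: per-classification category parameters of the two
    local regions in the target region combination (e.g. pixel fractions)."""
    diffs = [abs(a - b) for a, b in zip(params_a, params_b)]   # category parameter differences
    statistic = sum(diffs) / len(diffs)                        # statistical value over the classifications
    return math.exp(-statistic)                                # negatively correlated with the statistic
```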
6. The method according to any one of claims 1 to 5, wherein the adjusting network parameters of the feature extraction network by using the feature similarity and the reference relationship parameter of each group of region combinations comprises:
taking each group of region combinations as a target region combination respectively, adjusting the feature similarity of the target region combination by using the reference relationship parameter of the target region combination to obtain a reference feature similarity of the target region combination, and obtaining a first loss of the target region combination based on the reference feature similarity of the target region combination;
obtaining a second loss of the feature extraction network based on the first losses of the groups of region combinations; and
adjusting the network parameters of the feature extraction network by using the second loss.
7. The method according to claim 6, wherein the adjusting the feature similarity of the target region combination by using the reference relationship parameter of the target region combination to obtain a reference feature similarity of the target region combination comprises:
taking the product of the reference relationship parameter of the target region combination and the feature similarity of the target region combination as the reference feature similarity of the target region combination;
and the obtaining a first loss of the target region combination based on the reference feature similarity of the target region combination comprises:
obtaining an auxiliary feature similarity of each of at least one auxiliary region combination by using the feature similarity and an auxiliary relationship parameter of the auxiliary region combination, wherein the auxiliary region combination and the target region combination share at least one local region, and the auxiliary relationship parameter of the auxiliary region combination is obtained based on the reference relationship parameter of the auxiliary region combination; and
obtaining the first loss corresponding to the target region combination based on the reference feature similarity and the auxiliary feature similarities.
8. The method according to claim 7, wherein one of the local regions in the target region combination is a reference region, and each auxiliary region combination is a region combination that includes the reference region;
and/or the sum of the auxiliary relationship parameter of an auxiliary region combination and the reference relationship parameter of the auxiliary region combination is a preset value;
and/or the obtaining the first loss corresponding to the target region combination based on the reference feature similarity and the auxiliary feature similarities comprises:
performing a preset operation on the reference feature similarity to obtain a first operation result;
performing the preset operation on the auxiliary feature similarity of each auxiliary region combination respectively to obtain a second operation result corresponding to each auxiliary region combination; and
obtaining the first loss corresponding to the target region combination based on the ratio of the first operation result to the sum of the second operation results corresponding to the auxiliary region combinations.
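Claims 6 to 8 have the shape of a weighted contrastive (InfoNCE-style) objective: exponentiate the reference feature similarity (reference parameter times similarity) and divide by the sum of exponentiated auxiliary feature similarities of combinations sharing a local region with the target. The sketch below is one reading under stated assumptions: `exp` as the preset operation, `1 - w` as the auxiliary relationship parameter (i.e. the preset value of claim 8 taken as 1), and the negative log of the ratio as the first loss.

```python
import math

def first_loss(target_sim, target_ref_param, aux_sims, aux_ref_params):
    """First loss of one target region combination (one reading of claims 6-8).
    Depending on how the auxiliary combinations are enumerated, the target
    combination itself may be among them."""
    numerator = math.exp(target_ref_param * target_sim)     # preset operation on the reference feature similarity
    denominator = sum(
        math.exp((1.0 - w) * s)                              # auxiliary relationship parameter assumed to be 1 - w
        for s, w in zip(aux_sims, aux_ref_params)
    )
    return -math.log(numerator / denominator)

def second_loss(first_losses):
    """Second loss: aggregate of the first losses of all region combinations (claim 6)."""
    return sum(first_losses) / len(first_losses)
```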
9. The method according to any one of claims 1 to 8, wherein the feature extraction network is part of an image processing model, and the image processing model is configured to perform prediction on the sample image based on the sample feature map of the sample image;
and the training method of the feature extraction network is performed in a pre-training stage of the image processing model.
10. The method according to any one of claims 1 to 9, wherein the reference relationship parameter of a region combination is positively correlated with the actual degree of category similarity between the local regions in the region combination;
and/or every two different local regions in the sample image form one group of region combination;
and/or the determining feature similarities of a plurality of groups of region combinations by using the sample feature map comprises:
for each group of region combinations, acquiring the feature information corresponding to each local region in the region combination from the sample feature map; and
obtaining the feature similarity of the region combination by using the feature information corresponding to each local region in the region combination.
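One way to realize the last part of claim 10 is to treat the cells of a regular grid as the local regions, pool the sample feature map over each cell to get that region's feature information, and compare regions by cosine similarity. The grid layout and the similarity measure are assumptions; this could also serve as the hypothetical `regions_fn` in the sketch after claim 1.

```python
import torch
import torch.nn.functional as F

def grid_region_features(feature_map, grid=(4, 4)):
    """Average-pool a (C, H, W) sample feature map over the cells of a regular
    grid; returns one feature-information row per local region."""
    pooled = F.adaptive_avg_pool2d(feature_map.unsqueeze(0), grid)[0]   # (C, grid_h, grid_w)
    return pooled.flatten(1).t()                                        # (grid_h * grid_w, C)

def feature_similarity(feat_a, feat_b):
    """Cosine similarity as one possible feature similarity of a region combination."""
    return F.cosine_similarity(feat_a, feat_b, dim=0)
```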
11. A method for training an image processing model, comprising:
performing the method according to any one of claims 1 to 10 to pre-train a feature extraction network of the image processing model;
performing image processing on a sample image by using the image processing model to obtain a detection result of the sample image, wherein the image processing comprises: performing feature extraction on the sample image by using the feature extraction network to obtain an original feature map; performing feature optimization on each feature region in the original feature map by using an uncertainty parameter corresponding to the feature region to obtain a target feature map, wherein in the target feature map the influence of the target feature information corresponding to a feature region is related to the uncertainty parameter corresponding to the feature region; and obtaining the detection result of the sample image based on the target feature map; and
adjusting network parameters of the image processing model based on the detection result.
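Claim 11 adds an uncertainty-guided step to the downstream image processing model: each feature region of the original feature map is re-weighted so that its influence in the target feature map depends on an uncertainty parameter. How the uncertainty is estimated and how it modulates the features is not fixed by the claim; the sketch below assumes a per-location uncertainty map in [0, 1] and a simple down-weighting of uncertain locations.

```python
import torch

def uncertainty_feature_optimization(original_feature_map, uncertainty):
    """original_feature_map: (C, H, W); uncertainty: (H, W) with values in [0, 1],
    higher meaning less reliable. The target feature information of a region is
    scaled so that its influence decreases as its uncertainty grows
    (one possible reading of the feature optimization in claim 11)."""
    confidence = 1.0 - uncertainty                            # uncertainty parameter mapped to an influence weight
    target_feature_map = original_feature_map * confidence.unsqueeze(0)
    return target_feature_map
```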
12. An image processing method, comprising:
acquiring a target image;
processing the target image by using an image processing model to obtain a detection result of the target image, wherein the image processing model is trained by using the method of claim 11.
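At inference time (claim 12) the trained image processing model is simply applied to a new target image. A usage sketch, with the model object and preprocessing assumed to be provided elsewhere:

```python
import torch

def run_inference(model, target_image):
    """Apply a trained image processing model to a preprocessed target image
    tensor of shape (C, H, W) and return its detection result (claim 12)."""
    model.eval()
    with torch.no_grad():
        detection = model(target_image.unsqueeze(0))     # add a batch dimension
    return detection
```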
13. An apparatus for training a feature extraction network, comprising:
an acquisition module, configured to perform feature extraction on a sample image by using a feature extraction network to obtain a sample feature map of the sample image;
a first determination module, configured to determine feature similarities of a plurality of groups of region combinations by using the sample feature map, wherein the sample image comprises a plurality of local regions, and the feature similarity of each group of region combinations represents the similarity of the feature information of at least two local regions included in the region combination;
a second determination module, configured to determine, based on labeling information of each group of region combinations, a reference relationship parameter corresponding to each group of region combinations, wherein each reference relationship parameter represents an actual difference between the local regions in the corresponding region combination; and
an adjustment module, configured to adjust network parameters of the feature extraction network by using the feature similarity and the reference relationship parameter of each group of region combinations.
14. An apparatus for training an image processing model, comprising:
a pre-training module, configured to perform the method according to any one of claims 1 to 10 to pre-train a feature extraction network of the image processing model;
a detection module, configured to perform image processing on a sample image by using the image processing model to obtain a detection result of the sample image, wherein the image processing comprises: performing feature extraction on the sample image by using the feature extraction network to obtain an original feature map; performing feature optimization on each feature region in the original feature map by using an uncertainty parameter corresponding to the feature region to obtain a target feature map, wherein in the target feature map the influence of the target feature information corresponding to a feature region is related to the uncertainty parameter corresponding to the feature region; and obtaining the detection result of the sample image based on the target feature map; and
an adjustment module, configured to adjust network parameters of the image processing model based on the detection result.
15. An image processing apparatus, comprising:
the acquisition module is used for acquiring a target image;
a detection module, configured to process the target image by using an image processing model to obtain a detection result of the target image, where the image processing model is trained by using the method according to claim 11.
16. An electronic device comprising a memory and a processor coupled to each other, the processor being configured to execute program instructions stored in the memory to implement the method of training a feature extraction network according to any one of claims 1 to 10, or to implement the method of training an image processing model according to claim 11, or to implement the method of image processing according to claim 12.
17. A computer-readable storage medium, on which program instructions are stored, which program instructions, when executed by a processor, implement the method of training a feature extraction network of any one of claims 1 to 10, or implement the method of training an image processing model of claim 11, or implement the method of image processing of claim 12.
CN202210179661.XA 2022-02-25 2022-02-25 Training method, image processing method, related device, equipment and storage medium Withdrawn CN114648658A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210179661.XA CN114648658A (en) 2022-02-25 2022-02-25 Training method, image processing method, related device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210179661.XA CN114648658A (en) 2022-02-25 2022-02-25 Training method, image processing method, related device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114648658A true CN114648658A (en) 2022-06-21

Family

ID=81993035

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210179661.XA Withdrawn CN114648658A (en) 2022-02-25 2022-02-25 Training method, image processing method, related device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114648658A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20220621