CN111145196A - Image segmentation method and device and server - Google Patents


Info

Publication number
CN111145196A
Authority
CN
China
Prior art keywords
feature map
spatial position
image
segmented
position information
Prior art date
Legal status
Pending
Application number
CN201911266841.6A
Other languages
Chinese (zh)
Inventor
廖祥云
孙寅紫
王琼
王平安
Current Assignee
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN201911266841.6A patent/CN111145196A/en
Publication of CN111145196A patent/CN111145196A/en
Priority to PCT/CN2020/129521 patent/WO2021115061A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/12 Edge-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The application belongs to the technical field of image segmentation, and provides an image segmentation method, an image segmentation apparatus and a server. The method comprises the following steps: inputting an image to be segmented into an image segmentation model, performing feature extraction on the image to be segmented to generate a feature map, and calculating the spatial position relationships between pixel points in the feature map to obtain spatial position information; fusing the feature map with the spatial position information to obtain a feature map containing the spatial position information; and segmenting the image to be segmented according to the feature map containing the spatial position information, and outputting a target image. The method and the apparatus solve the problem that boundaries between different targets to be segmented cannot be accurately segmented.

Description

Image segmentation method and device and server
Technical Field
The present invention relates to the field of image segmentation technologies, and in particular, to an image segmentation method, an image segmentation apparatus, and a server.
Background
Image segmentation is one of the research hotspots of computer graphics and has important applications in fields such as medical diagnosis and autonomous driving. Many image segmentation algorithms exist, among which the U-shaped neural network (U-Net) algorithm is one of the most commonly used. The U-Net algorithm consists of an encoder and a decoder, which are connected by concatenation along the image channel dimension. Specifically, image features are first extracted from the image to be segmented by the encoder, which is composed of a plurality of convolutional layers connected by pooling layers, so that the original image is reduced to a certain size. The output of the encoder is then restored to the original image size by the decoder, which is composed of a plurality of convolutional layers connected by transposed convolutional layers. Finally, the output image is converted into a probability map by a softmax activation function. Compared with traditional image segmentation algorithms such as threshold segmentation, region segmentation and edge segmentation, the UNet algorithm has a simple network structure and high segmentation accuracy. However, the UNet segmentation algorithm tends to exaggerate either the differences between objects of the same class (intra-class inconsistency) or the similarities between objects of different classes (inter-class similarity), and therefore cannot segment boundaries between similar features of different classes or between differing features of the same class. As a result, boundaries between different targets to be segmented cannot be accurately segmented, and the segmentation accuracy of the image is low.
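For illustration only, a minimal sketch of the baseline UNet structure just described, assuming a PyTorch implementation with illustrative channel counts and input sizes, may look as follows:

```python
import torch
import torch.nn as nn

class MiniUNet(nn.Module):
    """Two-level UNet sketch: encoder convolution blocks joined by pooling,
    a transposed-convolution decoder, channel-dimension concatenation
    between encoder and decoder, and a softmax probability map."""
    def __init__(self, in_ch=1, num_classes=2):
        super().__init__()
        def block(cin, cout):
            return nn.Sequential(
                nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU(inplace=True))
        self.enc1 = block(in_ch, 64)
        self.enc2 = block(64, 128)
        self.pool = nn.MaxPool2d(2)                         # reduce size
        self.up = nn.ConvTranspose2d(128, 64, 2, stride=2)  # restore size
        self.dec1 = block(128, 64)         # 128 = 64 (skip) + 64 (upsampled)
        self.head = nn.Conv2d(64, num_classes, 1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        d1 = self.dec1(torch.cat([self.up(e2), e1], dim=1))  # skip connection
        return torch.softmax(self.head(d1), dim=1)           # probability map

probs = MiniUNet()(torch.randn(1, 1, 64, 64))  # probs.shape: (1, 2, 64, 64)
```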
Disclosure of Invention
In view of this, embodiments of the present invention provide an image segmentation method, an image segmentation device and a server, so as to solve the problem that boundaries between different targets to be segmented cannot be accurately segmented.
A first aspect of an embodiment of the present invention provides an image segmentation method, including:
inputting an image to be segmented into an image segmentation model, performing feature extraction on the image to be segmented to generate a feature map, and calculating a spatial position relationship between pixel points in the feature map to obtain spatial position information;
fusing the feature map and the spatial position information to obtain a feature map containing spatial position information;
and segmenting the image to be segmented according to the feature map containing the spatial position information, and outputting a target image.
In an implementation example, the inputting an image to be segmented into an image segmentation model, performing feature extraction on the image to be segmented to generate a feature map, and calculating a spatial position relationship between pixel points in the feature map to obtain spatial position information includes:
performing feature extraction on the image to be segmented through N information extraction modules connected in series in an encoder to generate a feature map, wherein the N information extraction modules are set according to preset scale information, and N is greater than or equal to 1;
and for each information extraction module, calculating the spatial position relationship among the pixel points in the feature map generated by the information extraction module to obtain spatial position information.
In an implementation example, the inputting an image to be segmented into an image segmentation model, performing feature extraction on the image to be segmented to generate a feature map, and calculating a spatial position relationship between pixel points in the feature map to obtain spatial position information includes:
when the image to be segmented is input into a first information extraction module, performing feature extraction on the image to be segmented to generate a feature map, and calculating a spatial position relationship between pixel points in the feature map to obtain spatial position information;
and fusing the feature map and the spatial position information to generate a new feature map and outputting the new feature map to a next information extraction module so that the next information extraction module performs feature extraction and spatial position relation calculation on the new feature map.
In an implementation example, the fusing the feature map and the spatial location information to obtain a feature map including spatial location information includes:
and fusing, through an encoder, the feature map and the spatial position information output by the Nth information extraction module to generate context information.
In an implementation example, for each information extraction module, calculating a spatial position relationship between pixel points in the feature map generated by the information extraction module to obtain spatial position information includes:
for each information extraction module, performing convolution on the feature map through a convolutional neural network along the direction perpendicular to the feature map generated by the information extraction module, and calculating the spatial position relationship between pixel points in the feature map to obtain spatial position information.
In an implementation example, for each information extraction module, calculating a spatial position relationship between pixel points in the feature map generated by the information extraction module to obtain spatial position information includes:
if the feature map generated by the information extraction module is a two-dimensional feature map, a formula for calculating the spatial position relationship between the pixel points in the feature map generated by the information extraction module is as follows:
$$a = \delta\left(\sum_{k=1}^{K} w^{l}_{(i,j)} \odot x^{k}_{(i,j)} + b\right)$$

wherein a is the spatial position information; δ is the activation function; l is the number of convolution layers of the convolutional neural network; w^l_(i,j) is the weight coefficient of the pixel point with coordinates (i, j) in the feature map; k is the channel number of the feature map (K channels in total); x^k_(i,j) is the window of the k-th channel of the feature map at (i, j); b is the offset; and ⊙ is the Hadamard product.
In an implementation example, for each information extraction module, calculating a spatial position relationship between pixel points in the feature map generated by the information extraction module to obtain spatial position information includes:
if the feature map generated by the information extraction module is a three-dimensional feature map, a formula for calculating the spatial position relationship between the pixel points in the feature map generated by the information extraction module is as follows:
$$a = \delta\left(\sum_{m=1}^{M} w^{l}_{(i,j,k)} \odot x^{m}_{(i,j,k)} + b\right)$$

wherein a is the spatial position information; δ is the activation function; l is the number of convolution layers of the convolutional neural network; w^l_(i,j,k) is the weight coefficient of the pixel point with coordinates (i, j, k) in the feature map; m is the channel number of the feature map (M channels in total); x^m_(i,j,k) is the window of the m-th channel of the feature map at (i, j, k); b is the offset; and ⊙ is the Hadamard product.
In an implementation example, the segmenting the image to be segmented according to the feature map containing the spatial position information and outputting a target image includes:
and segmenting the image to be segmented according to the context information through a decoder, and outputting a target image.
A second aspect of an embodiment of the present invention provides an image segmentation apparatus, including:
the image feature and position information extraction module is used for inputting an image to be segmented into an image segmentation model, performing feature extraction on the image to be segmented to generate a feature map, and calculating the spatial position relationships between pixel points in the feature map to obtain spatial position information;
the feature fusion module is used for fusing the feature map and the spatial position information to obtain a feature map containing the spatial position information;
and the image segmentation module is used for segmenting the image to be segmented according to the feature map containing the spatial position information and outputting a target image.
A third aspect of the embodiments of the present invention provides a server, comprising: a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the image segmentation method of the first aspect when executing the computer program.
According to the image segmentation method, the image segmentation device and the server provided by the embodiments of the present invention, an image to be segmented is input into an image segmentation model, feature extraction is performed on the image to be segmented to generate a feature map, and the spatial position relationships between pixel points in the feature map are calculated to obtain spatial position information; the feature map and the spatial position information are fused to obtain a feature map containing the spatial position information; and the image to be segmented is segmented according to the feature map containing the spatial position information, and a target image is output. Calculating the spatial position relationships between the pixel points in the feature map yields the spatial position information, realizing the extraction of the relative position relationships of pixel points at different spatial positions in the feature map. After the feature map containing the image information is fused with the calculated spatial position information, the image to be segmented is segmented according to the resulting feature map, so that the image segmentation model can derive the feature relationships between the pixels of the feature map from their spatial position relationships, segment the boundaries between similar features of different classes and between differing features of the same class, accurately segment the boundaries between different targets to be segmented, and improve the segmentation accuracy of the image.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of an image segmentation method according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of an image segmentation model according to an embodiment of the present invention;
Fig. 3 is a schematic flowchart of an image segmentation method according to a second embodiment of the present invention;
Fig. 4 is a schematic diagram illustrating the convolution calculation of the feature map depth convolution layer of the second branch in the information extraction module according to the second embodiment of the present invention;
Fig. 5 is a schematic structural diagram of an image segmentation apparatus according to a third embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a server according to a fourth embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood by those skilled in the art, the technical solutions in the embodiments of the present invention will be clearly described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "comprises" and "comprising," and any variations thereof, in the description and claims of this invention and the above-described drawings are intended to cover non-exclusive inclusions. For example, a process, method, or system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus. Furthermore, the terms "first," "second," and "third," etc. are used to distinguish between different objects and are not used to describe a particular order.
Example one
Fig. 1 is a schematic flowchart of an image segmentation method according to an embodiment of the present invention. The method can be applied to scenarios in which multi-target segmentation is performed on images, and can be executed by an image segmentation apparatus, which may be a server, a smart terminal, a tablet, a PC (personal computer) or the like. In the embodiments of the present application, the image segmentation apparatus is taken as the execution subject, and the method specifically includes the following steps:
s110, inputting an image to be segmented into an image segmentation model, performing feature extraction on the image to be segmented to generate a feature map, and calculating a spatial position relationship between pixel points in the feature map to obtain spatial position information;
in the existing image segmentation method, an image segmentation model containing a neural network can be constructed by deep learning a target image to perform image segmentation. However, the image features extracted from the image to be segmented after the convolution calculation of the multilayer convolution layer in the trained image segmentation model often have the problem of exaggerating the difference (inter-class distinction) between similar objects or the similarity (intra-class consistency) between different objects. When the image segmentation model performs target image segmentation on an image to be segmented according to the extracted features, the boundaries between different types of similar features and the boundaries between the same types of different features cannot be segmented, so that over-segmentation and under-segmentation in the segmentation process are caused, and the boundaries between different target images to be segmented are difficult to accurately segment. In order to solve the technical problem, the characteristic relation among the characteristic image pixels can be extracted from different layers of a convolutional neural network of an image segmentation model, and the problem that the boundary between different kinds of similar characteristics and the same kind of different characteristics cannot be segmented is solved.
Specifically, the image to be segmented may be segmented by an image segmentation model trained from a plurality of target images. After the image to be segmented is input into the image segmentation model, image feature extraction is carried out on the image to be segmented to generate a feature map, and the spatial position relation among pixel points in the feature map is calculated to obtain spatial position information, so that the relative position relation of the pixel points in the feature map at different spatial positions is obtained.
In one implementation example, the image segmentation model may employ a U-shaped neural network (Feature Depth UNet) framework, in which an encoder and a decoder form a symmetric structure and are connected by concatenation along the image channel dimension. Fig. 2 is a schematic structural diagram of the image segmentation model. The specific process of performing feature extraction on the image to be segmented to generate a feature map and calculating the spatial position relationships between pixel points in the feature map to obtain spatial position information may be as follows: performing feature extraction on the image to be segmented through N information extraction modules connected in series in the encoder to generate a feature map, wherein the N information extraction modules are set according to preset scale information and N is greater than or equal to 1; and, for each information extraction module, calculating the spatial position relationships between the pixel points in the feature map generated by that module to obtain spatial position information.
Specifically, the encoder comprises N information extraction modules connected in series, which perform image feature extraction on the input image to be segmented to generate a feature map. The N information extraction modules are set according to preset scale information, so that each information extraction module has different scale information. After the image to be segmented is input into the image segmentation model, feature extraction is performed on it through the N information extraction modules, generating a feature map containing multi-scale information; and after each information extraction module extracts features, the spatial position relationships between the pixel points in the feature map generated by that module are calculated to obtain spatial position information. With the N information extraction modules corresponding to different scale information each calculating the spatial position relationships between the pixel points in its feature map, spatial position information containing multi-scale information is obtained.
In an implementation example, the specific process of performing feature extraction on the image to be segmented by N information extraction modules connected in series in an encoder to generate a feature map, and calculating a spatial position relationship between pixel points in the feature map generated by each information extraction module to obtain spatial position information may be as follows: when the image to be segmented is input into a first information extraction module, performing feature extraction on the image to be segmented to generate a feature map, and calculating a spatial position relationship between pixel points in the feature map to obtain spatial position information; and fusing the feature map and the spatial position information to generate a new feature map and outputting the new feature map to a next information extraction module so that the next information extraction module performs feature extraction and spatial position relation calculation on the new feature map.
Specifically, each information extraction module may include two branches: the first branch performs feature extraction on the input image to generate a feature map, extracting the pixel value information of the image; the second branch performs the same feature extraction on the input image to generate a feature map and then calculates the spatial position relationships between pixel points in the feature map to obtain spatial position information, extracting the spatial position relationship information between the pixels. Optionally, the first branch may be composed of a plurality of convolution layers; the second branch may be composed of the same plurality of convolution layers plus a feature map depth convolution layer, so as to first generate a feature map from the input image and then calculate the spatial position relationships between its pixel points to obtain spatial position information. The N information extraction modules in the encoder may be connected in series through max pooling layers.
Stacking the plurality of convolution layers in front of the feature map depth convolution layer in the second branch expands the field of view; when the feature map obtained by these convolution layers is input into the feature map depth convolution layer to calculate the spatial position relationships between its pixel points, each pixel point in the feature map depth convolution layer maps a different field of view on the original image. Optionally, to reduce overfitting, a batch normalization layer may be added between the convolution layers in each information extraction module, and L2 regularization may be added to the loss function.
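As an illustration of this option, a minimal sketch assuming PyTorch, in which the L2 regularization is realized as weight decay on the optimizer (the network, learning rate and decay factor here are illustrative stand-ins):

```python
import torch
import torch.nn as nn

# Stand-in network: batch normalization layers inserted between the
# convolution layers, as described above.
model = nn.Sequential(
    nn.Conv2d(1, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU())
# L2 regularization expressed as weight decay, equivalent to adding an
# L2 penalty on the weights to the loss function.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
```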
In detail, when the image to be segmented is input into the first information extraction module, feature extraction is performed on it through the plurality of convolution layers in the first branch of the module to generate a feature map; at the same time, feature extraction is performed on the image to be segmented through the plurality of convolution layers in the second branch of the module to generate a feature map, and the spatial position relationships between pixel points in that feature map are calculated through the feature map depth convolution layer of the second branch to obtain spatial position information. The feature map output by the first branch and the spatial position information output by the second branch are fused through a pooling layer to generate a new feature map, which is input to the next information extraction module, so that the first branch of the next module performs feature extraction on it while the second branch performs feature extraction and spatial position relationship calculation. This continues until the first branch of the Nth information extraction module generates a feature map containing multi-scale information and the second branch of the Nth module obtains spatial position information containing multi-scale information.
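A sketch of one such two-branch information extraction module, assuming PyTorch, follows; the per-pixel kernel geometry of the feature map depth convolution layer, the sigmoid as the activation function, and the multiplicative fusion of the branch outputs are assumptions made for illustration (the patent fuses the branch outputs through a pooling layer):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureMapDepthConv(nn.Module):
    """Assumed reading of the feature map depth convolution layer: one small
    kernel per pixel of the XY plane, slid along the channel (Z) axis and
    accumulated, yielding one spatial-relationship value per pixel."""
    def __init__(self, height, width, kernel=3):
        super().__init__()
        self.k, self.pad = kernel, kernel // 2
        n = height * width             # C kernels, one per output-plane pixel
        self.weight = nn.Parameter(torch.randn(n, kernel * kernel) * 0.05)
        self.bias = nn.Parameter(torch.zeros(n))

    def forward(self, x):              # x: (B, K, H, W)
        b, k, h, w = x.shape
        # k x k window around every pixel of every channel slice.
        patches = F.unfold(x, self.k, padding=self.pad)   # (B, K*k*k, H*W)
        patches = patches.reshape(b, k, self.k * self.k, h * w)
        # Hadamard product with the per-pixel kernel, summed over the window
        # and along the Z (channel) axis, plus the offset b.
        a = torch.einsum('bkpc,cp->bc', patches, self.weight) + self.bias
        return torch.sigmoid(a).reshape(b, 1, h, w)       # activation delta

class InfoExtractionModule(nn.Module):
    """Two branches over the same input: branch 1 extracts pixel-value
    features; branch 2 applies the same convolutions followed by the depth
    convolution to extract spatial position information."""
    def __init__(self, cin, cout, height, width):
        super().__init__()
        def convs():
            return nn.Sequential(
                nn.Conv2d(cin, cout, 3, padding=1), nn.BatchNorm2d(cout),
                nn.ReLU(inplace=True),
                nn.Conv2d(cout, cout, 3, padding=1), nn.BatchNorm2d(cout),
                nn.ReLU(inplace=True))
        self.branch1 = convs()
        self.branch2 = nn.Sequential(convs(),
                                     FeatureMapDepthConv(height, width))

    def forward(self, x):
        # Multiplicative fusion as a stand-in for the pooling-layer fusion.
        return self.branch1(x) * self.branch2(x)

y = InfoExtractionModule(1, 64, 16, 16)(torch.randn(2, 1, 16, 16))
print(y.shape)  # torch.Size([2, 64, 16, 16])
```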
S120, fusing the feature map and the spatial position information to obtain a feature map containing spatial position information;
After the spatial position relationships between pixel points in the feature map generated by each information extraction module are calculated to obtain spatial position information, the final feature map and spatial position information are output by the Nth information extraction module. The feature map and the spatial position information output by the Nth information extraction module are then fused to obtain a feature map containing the spatial position information, completing the feature fusion.
In one embodiment, since the image segmentation model may be composed of an encoder and a decoder, the decoder needs to perform image segmentation on the image to be segmented according to the context information sent by the encoder. The context information can be generated by fusing the feature map and the spatial position information output by the Nth information extraction module through a pooling layer in the encoder.
S130, segmenting the image to be segmented according to the feature map containing the spatial position information, and outputting a target image.
The image segmentation model may include an encoder and a decoder with symmetric structures: the decoder is provided with transposed convolutional layers corresponding to the convolutional layer structure of the encoder, and, to let the neural network retain shallower information, the encoder and the decoder are connected by skip connections. In one implementation example, the target image is output by the decoder segmenting the image to be segmented according to the context information encoded by the encoder. Because the context information is generated from the feature map containing the spatial position information, the decoder can derive the feature relationships between the pixels of the feature map from the spatial position relationships in the context information, segment the boundaries between similar features of different classes and between differing features of the same class, and thereby accurately segment the boundaries between different targets to be segmented.
The image segmentation method provided by the embodiment of the present invention inputs an image to be segmented into an image segmentation model, performs feature extraction on the image to be segmented to generate a feature map, and calculates the spatial position relationships between pixel points in the feature map to obtain spatial position information; fuses the feature map with the spatial position information to obtain a feature map containing the spatial position information; and segments the image to be segmented according to the feature map containing the spatial position information and outputs a target image. Calculating the spatial position relationships between the pixel points in the feature map yields the spatial position information, realizing the extraction of the relative position relationships of pixel points at different spatial positions in the feature map. After the feature map containing the image information is fused with the calculated spatial position information, the image to be segmented is segmented according to the resulting feature map, so that the image segmentation model can derive the feature relationships between the pixels of the feature map from their spatial position relationships, segment the boundaries between similar features of different classes and between differing features of the same class, accurately segment the boundaries between different targets to be segmented, and improve the segmentation accuracy of the image.
Example two
Fig. 3 is a schematic flow chart of an image segmentation method according to a second embodiment of the present invention. On the basis of the first embodiment, the present embodiment further provides a process of calculating a spatial position relationship between pixel points in the feature map to obtain spatial position information, so as to further improve the accuracy of image segmentation. The method specifically comprises the following steps:
s210, inputting an image to be segmented into an image segmentation model, and performing feature extraction on the image to be segmented through N information extraction modules connected in series in an encoder to generate a feature map; the N information extraction modules are set according to preset scale information, and N is more than or equal to 1;
the N information extraction modules are set according to preset scale information, so that each information extraction module has different scale information. After the image to be segmented is input into the image segmentation model, feature extraction is performed on it through the N information extraction modules, generating a feature map containing multi-scale information; and after each information extraction module extracts features, the spatial position relationships between the pixel points in the feature map generated by that module are calculated to obtain spatial position information. With the N information extraction modules corresponding to different scale information each calculating the spatial position relationships between the pixel points in its feature map, spatial position information containing multi-scale information is obtained.
Specifically, each information extraction module may include two branches: the first branch performs feature extraction on the input image to generate a feature map; the second branch performs the same feature extraction to generate a feature map and then calculates the spatial position relationships between pixel points in the feature map to obtain spatial position information. Optionally, the first branch may be composed of a plurality of convolution layers, and the second branch of the same plurality of convolution layers plus a feature map depth convolution layer. The N information extraction modules in the encoder may be connected in series through max pooling layers.
When the image to be segmented is input into the first information extraction module, feature extraction is performed on it through the plurality of convolution layers in the first branch of the module to generate a feature map; at the same time, feature extraction is performed through the plurality of convolution layers in the second branch of the module to generate a feature map, and the spatial position relationships between pixel points in that feature map are calculated through the feature map depth convolution layer of the second branch to obtain spatial position information. The feature map output by the first branch and the spatial position information output by the second branch are fused through a pooling layer to generate a new feature map, which is input to the next information extraction module, so that the first branch of the next module performs feature extraction on it while the second branch performs feature extraction and spatial position relationship calculation. This continues until the first branch of the Nth information extraction module generates a feature map containing multi-scale information and the second branch of the Nth module obtains spatial position information containing multi-scale information.
S220, for each information extraction module, performing convolution on the feature map along a direction perpendicular to the feature map generated by the information extraction module through a convolutional neural network, and calculating a spatial position relationship between pixel points in the feature map to obtain spatial position information;
specifically, each information extraction module can comprise two branches, and the second branch can be composed of a plurality of convolution layers identical to those of the first branch plus a feature map depth convolution layer, so that after feature extraction is performed on the input image to generate a feature map, the spatial position relationships between pixel points in the feature map are calculated to obtain spatial position information. Thus, for each information extraction module, convolving the feature map by a convolutional neural network along the direction perpendicular to the feature map generated by that module may be: performing convolution, by the feature map depth convolution layer of the second branch, along the direction perpendicular to the feature map obtained by the convolution calculation of the plurality of convolution layers of the second branch, and calculating the spatial position relationships between pixel points in the feature map to obtain spatial position information.
In an implementation example, if the feature map generated by the information extraction module, that is, the feature map obtained by convolution calculation of the plurality of convolution layers of the second branch, is a two-dimensional feature map, the formula for calculating the spatial position relationship between the pixel points in the feature map by using the feature map depth convolution layer of the second branch in the information extraction module is as follows:
$$a = \delta\left(\sum_{k=1}^{K} w^{l}_{(i,j)} \odot x^{k}_{(i,j)} + b\right)$$

wherein a is the spatial position information; δ is the activation function; l is the number of convolution layers of the convolutional neural network; w^l_(i,j) is the weight coefficient of the pixel point with coordinates (i, j) in the feature map; k is the channel number of the feature map (K channels in total); x^k_(i,j) is the window of the k-th channel of the feature map at (i, j); b is the offset; and ⊙ is the Hadamard product.
Specifically, the feature map depth convolution layer of the second branch in the information extraction module may use H × W × C convolution kernels, where H × W denotes the size of each convolution kernel and C denotes the number of convolution kernels, the value of C being equal to the number of pixels of the output feature map in the XY plane. Optionally, Fig. 4 is a schematic diagram illustrating the convolution calculation of the feature map depth convolution layer of the second branch in the information extraction module. To calculate the output of the depth convolution of the two-dimensional feature map, an H × W convolution kernel is first placed at the top-left corner of the feature map and the first convolution operation is performed. The convolution kernel is then slid along the Z-axis direction, and the same convolution operation is performed in sequence along the direction perpendicular to the feature map. Finally, the convolution results of the C convolution kernels are arranged on the XY plane according to their positions in the feature map to obtain the spatial position information.
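Written out literally, and assuming NumPy with a sigmoid standing in for the activation function δ, this procedure may be sketched as follows; the per-kernel window placement on the XY plane is an assumption made for illustration:

```python
import numpy as np

def feature_map_depth_conv_2d(x, kernels, bias):
    """Sketch of the 2D feature map depth convolution (an assumed reading):
    x is the feature map of shape (K, H, W); kernels has shape (C, h, w)
    with C == H * W, one kernel per pixel of the output XY plane; bias has
    shape (C,). Each kernel is slid along the Z axis, the per-slice results
    are accumulated, and the C outputs are arranged on the XY plane as the
    spatial position information."""
    K, H, W = x.shape
    C, h, w = kernels.shape
    assert C == H * W, "one convolution kernel per output-plane pixel"
    pad = h // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    a = np.empty((H, W))
    for c in range(C):                      # one kernel per XY position
        i, j = divmod(c, W)
        window = xp[:, i:i + h, j:j + w]    # same window on every Z slice
        # Hadamard product with the kernel, summed over the window and the
        # Z (channel) axis, plus the offset b:
        a[i, j] = (window * kernels[c]).sum() + bias[c]
    return 1.0 / (1.0 + np.exp(-a))         # activation function delta

spatial_info = feature_map_depth_conv_2d(np.random.rand(8, 16, 16),
                                         np.random.rand(256, 3, 3) * 0.1,
                                         np.zeros(256))
print(spatial_info.shape)  # (16, 16)
```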
In an implementation example, if the feature map generated by the information extraction module, that is, the feature map obtained by the convolution calculation of the plurality of convolution layers of the second branch, is a three-dimensional feature map, the formula for calculating the spatial position relationship between the pixel points in the feature map by the feature map depth convolution layer of the second branch in the information extraction module is as follows:

$$a = \delta\left(\sum_{m=1}^{M} w^{l}_{(i,j,k)} \odot x^{m}_{(i,j,k)} + b\right)$$

wherein a is the spatial position information; δ is the activation function; l is the number of convolution layers of the convolutional neural network; w^l_(i,j,k) is the weight coefficient of the pixel point with coordinates (i, j, k) in the feature map; m is the channel number of the feature map (M channels in total); x^m_(i,j,k) is the window of the m-th channel of the feature map at (i, j, k); b is the offset; and ⊙ is the Hadamard product.
Specifically, the feature map depth convolution layer of the second branch in the information extraction module may use H × W × P × C convolution kernels, where H × W × P denotes the size of each convolution kernel and C denotes the number of convolution kernels, the value of C being equal to the number of pixels of the output feature map in the XY plane. To calculate the output of the depth convolution layer of the three-dimensional feature map, an H × W × P convolution kernel is first placed at the top-left corner of the feature map and the first three-dimensional convolution operation is performed. The convolution kernel is then slid along the Z-axis direction, and the same three-dimensional convolution operation is performed along the direction perpendicular to the feature map. Finally, the calculation results of the C convolution kernels are arranged on the XY plane according to their positions in the feature map to obtain the spatial position information.
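Under the same assumptions, a sketch of the three-dimensional case differs only in that each kernel also spans the third dimension of the feature map; the placement of the kernel along that dimension is hypothetical:

```python
import numpy as np

def feature_map_depth_conv_3d(x, kernels, bias):
    """Sketch of the three-dimensional case under the same assumed reading
    as the 2D code above: x is (M, H, W, P) with M channel slices; kernels
    has shape (C, h, w, p) with C == H * W; each kernel is slid along the
    Z axis and accumulated, and the C results tile the XY plane."""
    M, H, W, P = x.shape
    C, h, w, p = kernels.shape
    assert C == H * W, "one convolution kernel per output-plane pixel"
    xp = np.pad(x, ((0, 0), (h // 2, h // 2), (w // 2, w // 2), (0, 0)))
    a = np.empty((H, W))
    for c in range(C):
        i, j = divmod(c, W)
        # Hypothetical window placement: the kernel spans the first p
        # entries of the third dimension at every Z slice.
        window = xp[:, i:i + h, j:j + w, :p]
        a[i, j] = (window * kernels[c]).sum() + bias[c]
    return 1.0 / (1.0 + np.exp(-a))      # activation function delta

a3 = feature_map_depth_conv_3d(np.random.rand(4, 8, 8, 5),
                               np.random.rand(64, 3, 3, 5) * 0.1,
                               np.zeros(64))
print(a3.shape)  # (8, 8)
```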
S230, fusing the feature map and the spatial position information to obtain a feature map containing spatial position information;
and fusing the feature diagram and the spatial position information output by the Nth information extraction module through a pooling layer in the encoder to generate context information.
S240, segmenting the image to be segmented according to the feature map containing the spatial position information, and outputting a target image.
The image to be segmented is segmented by the decoder according to the context information encoded by the encoder, and the target image is output. Because the context information is generated from the feature map containing the spatial position information, the decoder can derive the feature relationships between the pixels of the feature map from the spatial position relationships in the context information, segment the boundaries between similar features of different classes and between differing features of the same class, and thereby accurately segment the boundaries between different targets to be segmented.
Example three
Fig. 5 shows an image segmentation apparatus according to a third embodiment of the present invention. On the basis of the first or second embodiment, the embodiment of the present invention further provides an image segmentation apparatus 5, including:
an image feature and position information extraction module 501, configured to input an image to be segmented into an image segmentation model, perform feature extraction on the image to be segmented to generate a feature map, and calculate a spatial position relationship between pixel points in the feature map to obtain spatial position information;
in an implementation example, when an image to be segmented is input into an image segmentation model, feature extraction is performed on the image to be segmented to generate a feature map, and a spatial position relationship between pixel points in the feature map is calculated to obtain spatial position information, the image feature and position information extraction module 501 includes:
the image feature extraction unit is used for performing feature extraction on the image to be segmented through N information extraction modules connected in series in the encoder to generate a feature map, wherein the N information extraction modules are set according to preset scale information, and N is greater than or equal to 1;
and the position information extraction unit is used for, for each information extraction module, calculating the spatial position relationships between the pixel points in the feature map generated by the information extraction module to obtain spatial position information.
In an implementation example, for each information extraction module, when calculating a spatial position relationship between pixel points in a feature map generated by the information extraction module to obtain spatial position information, the position information extraction unit includes:
and the position information extraction subunit is used for performing convolution on the feature map along the direction perpendicular to the feature map generated by the information extraction module through a convolutional neural network for each information extraction module, and calculating the spatial position relationship among the pixel points in the feature map to obtain spatial position information.
A feature fusion module 502, configured to fuse the feature map and the spatial location information to obtain a feature map including spatial location information;
in an implementation example, when the feature map and the spatial location information are fused to obtain a feature map containing spatial location information, the feature fusion module 502 includes:
and the feature fusion unit is used for fusing, through the encoder, the feature map and the spatial position information output by the Nth information extraction module to generate context information.
And an image segmentation module 503, configured to segment the image to be segmented according to the feature map including the spatial location information, and output a target image.
In an implementation example, when the image to be segmented is segmented according to the feature map containing the spatial position information and a target image is output, the image segmentation module 503 includes:
and the image segmentation unit is used for segmenting the image to be segmented according to the context information through a decoder and outputting a target image.
The image segmentation device provided by the embodiment of the present invention inputs an image to be segmented into an image segmentation model, performs feature extraction on the image to be segmented to generate a feature map, and calculates the spatial position relationships between pixel points in the feature map to obtain spatial position information; fuses the feature map with the spatial position information to obtain a feature map containing the spatial position information; and segments the image to be segmented according to the feature map containing the spatial position information and outputs a target image. Calculating the spatial position relationships between the pixel points in the feature map yields the spatial position information, realizing the extraction of the relative position relationships of pixel points at different spatial positions in the feature map. After the feature map containing the image information is fused with the calculated spatial position information, the image to be segmented is segmented according to the resulting feature map, so that the image segmentation model can derive the feature relationships between the pixels of the feature map from their spatial position relationships, segment the boundaries between similar features of different classes and between differing features of the same class, accurately segment the boundaries between different targets to be segmented, and improve the segmentation accuracy of the image.
Example four
Fig. 6 is a schematic structural diagram of a server according to a fourth embodiment of the present invention. The server includes: a processor 61, a memory 62 and a computer program 63, such as a program for an image segmentation method, stored in said memory 62 and executable on said processor 61. The processor 61 implements the steps in the above-described embodiment of the image segmentation method, such as steps S110 to S130 shown in fig. 1, when executing the computer program 63.
Illustratively, the computer program 63 may be partitioned into one or more modules that are stored in the memory 62 and executed by the processor 61 to accomplish the present application. The one or more modules may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program 63 in the server. For example, the computer program 63 may be divided into an image feature and location information extraction module, a feature fusion module, and an image segmentation module, each module having the following specific functions:
the image characteristic and position information extraction module is used for inputting an image to be segmented into an image segmentation model, performing characteristic extraction on the image to be segmented to generate a characteristic diagram, and calculating a spatial position relation between pixel points in the characteristic diagram to obtain spatial position information;
the characteristic fusion module is used for fusing the characteristic graph and the spatial position information to obtain a characteristic graph containing spatial position information;
and the image segmentation module is used for segmenting the image to be segmented according to the feature map containing the spatial position information and outputting a target image.
The server may include, but is not limited to, the processor 61, the memory 62 and the computer program 63 stored in the memory 62. Those skilled in the art will appreciate that fig. 6 is merely an example of a server and is not intended to be limiting; the server may include more or fewer components than those shown, some components may be combined, or different components may be used. For example, the server may also include input/output devices, network access devices, buses and the like.
The processor 61 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 62 may be an internal storage unit of the server, such as a hard disk or memory of the server. The memory 62 may also be an external storage device, such as a plug-in hard disk provided on the server, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, or the like. Further, the memory 62 may include both an internal storage unit of the server and an external storage device. The memory 62 is used for storing the computer program and other programs and data required by the image segmentation method, and may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment. In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like. It should be noted that the computer readable medium may contain content that is subject to appropriate increase or decrease as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media does not include electrical carrier signals and telecommunications signals as is required by legislation and patent practice.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (10)

1. An image segmentation method, comprising:
inputting an image to be segmented into an image segmentation model, performing feature extraction on the image to be segmented to generate a feature map, and calculating a spatial position relationship between pixel points in the feature map to obtain spatial position information;
fusing the feature map and the spatial position information to obtain a feature map containing spatial position information;
and segmenting the image to be segmented according to the feature map containing the spatial position information, and outputting a target image.
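Read as a pipeline, claim 1 runs three stages: extract a feature map, derive per-pixel spatial position information, fuse the two, and decode the fused map into the target image. A minimal PyTorch sketch of that flow follows; the specific layers (a 3x3 encoder convolution, a 1x1 position branch, channel concatenation as the fusion step) are illustrative assumptions, not the patented architecture.

```python
import torch
import torch.nn as nn

class SketchSegmenter(nn.Module):
    def __init__(self, in_ch=3, feat_ch=16, num_classes=2):
        super().__init__()
        # Feature extraction: image -> feature map
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU())
        # Spatial position branch: a 1x1 convolution mixes the channels at
        # each pixel, i.e. along the axis perpendicular to the map plane
        self.spatial = nn.Sequential(
            nn.Conv2d(feat_ch, feat_ch, 1), nn.Sigmoid())
        # Segmentation head applied to the fused map
        self.head = nn.Conv2d(2 * feat_ch, num_classes, 1)

    def forward(self, x):
        fmap = self.features(x)                # feature map
        pos = self.spatial(fmap)               # spatial position information
        fused = torch.cat([fmap, pos], dim=1)  # fusion step
        return self.head(fused)                # per-pixel class scores

model = SketchSegmenter()
scores = model(torch.randn(1, 3, 64, 64))
print(scores.shape)  # torch.Size([1, 2, 64, 64])
```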
2. The image segmentation method of claim 1, wherein the inputting of an image to be segmented into an image segmentation model, performing feature extraction on the image to be segmented to generate a feature map, and calculating a spatial position relationship between pixel points in the feature map to obtain spatial position information comprises:
performing feature extraction on the image to be segmented through N information extraction modules connected in series in an encoder to generate a feature map, wherein the N information extraction modules are set according to preset scale information, and N is greater than or equal to 1;
and for each information extraction module, calculating the spatial position relationship among the pixel points in the characteristic diagram generated by the information extraction module to obtain spatial position information.
3. The image segmentation method of claim 2, wherein the inputting of the image to be segmented into the image segmentation model, the feature extraction of the image to be segmented to generate the feature map, and the calculation of the spatial position relationship between the pixel points in the feature map to obtain the spatial position information comprises:
when the image to be segmented is input into a first information extraction module, performing feature extraction on the image to be segmented to generate a feature map, and calculating a spatial position relationship between pixel points in the feature map to obtain spatial position information;
and fusing the feature map and the spatial position information to generate a new feature map, and outputting the new feature map to a next information extraction module, so that the next information extraction module performs feature extraction and spatial position relationship calculation on the new feature map.
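Claims 2 and 3 chain N information extraction modules in series, each fusing its feature map with its spatial position information before handing the result on. Below is a sketch under assumed module internals, with downsampling between stages standing in for the "preset scale information":

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InfoExtractionModule(nn.Module):
    """One stage: extract features, compute spatial position information,
    fuse, and emit a new feature map for the next stage."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.extract = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.position = nn.Conv2d(out_ch, out_ch, 1)   # per-pixel channel mix
        self.fuse = nn.Conv2d(2 * out_ch, out_ch, 1)

    def forward(self, x):
        fmap = F.relu(self.extract(x))
        pos = torch.sigmoid(self.position(fmap))
        return self.fuse(torch.cat([fmap, pos], dim=1))  # new feature map

# Three serial stages standing in for the N modules (N = 3 assumed)
stages = [InfoExtractionModule(3, 16),
          InfoExtractionModule(16, 32),
          InfoExtractionModule(32, 64)]
x = torch.randn(1, 3, 64, 64)
for stage in stages:
    x = F.max_pool2d(stage(x), 2)  # assumed scale change between stages
print(x.shape)  # torch.Size([1, 64, 8, 8])
```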
4. The image segmentation method according to claim 3, wherein the fusing of the feature map and the spatial position information to obtain a feature map containing spatial position information comprises:
and fusing, through an encoder, the feature map and the spatial position information output by the Nth information extraction module to generate context information.
5. The image segmentation method according to claim 3, wherein, for each of the information extraction modules, calculating the spatial position relationship between pixel points in the feature map generated by the information extraction module to obtain spatial position information comprises:
for each information extraction module, performing convolution, through a convolutional neural network, on the feature map generated by the information extraction module along a direction perpendicular to the feature map, and calculating the spatial position relationship between pixel points in the feature map to obtain spatial position information.
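One reading of convolution "along the direction perpendicular to the feature map" is a one-dimensional kernel slid along the channel axis at every pixel, which can be written as a 3-D convolution; the kernel size of 3 below is an assumed value:

```python
import torch
import torch.nn as nn

fmap = torch.randn(1, 16, 32, 32)   # (N, C, H, W) feature map

# Treat the channel axis as depth and slide a 1-D kernel along it at each
# pixel: Conv3d with a (3, 1, 1) kernel touches only the channel direction.
perpendicular = nn.Conv3d(1, 1, kernel_size=(3, 1, 1), padding=(1, 0, 0))
pos = torch.sigmoid(perpendicular(fmap.unsqueeze(1))).squeeze(1)
print(pos.shape)  # torch.Size([1, 16, 32, 32]): spatial position information
```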
6. The image segmentation method according to claim 5, wherein, for each of the information extraction modules, calculating the spatial position relationship between pixel points in the feature map generated by the information extraction module to obtain spatial position information comprises:
if the feature map generated by the information extraction module is a two-dimensional feature map, a formula for calculating the spatial position relationship between the pixel points in the feature map generated by the information extraction module is as follows:
[Formula image FDA0002313087680000021: the formula for the two-dimensional spatial position relationship]
wherein a is the spatial position information; δ is the activation function; L is the number of convolution layers of the convolutional neural network; w(i,j) is the weight coefficient of the pixel point with coordinates (i, j) in the feature map; k is the channel number of the feature map; b is the offset; and ⊙ denotes the Hadamard product.
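The formula itself survives only as an image, but the symbol definitions suggest a per-pixel, channel-wise weighted sum passed through an activation. A NumPy sketch under that assumption follows; the sigmoid activation and the shape of the weights are guesses, not the claimed formula:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 4))   # 2-D feature map with k = 4 channels
w = rng.standard_normal((8, 8, 4))   # weight coefficient per pixel (i, j)
b = 0.1                              # offset

def delta(z):
    return 1.0 / (1.0 + np.exp(-z))  # activation (sigmoid assumed)

# Hadamard product of weights and features, summed over the k channels
a = delta((w * x).sum(axis=-1) + b)  # spatial position information
print(a.shape)  # (8, 8): one value per pixel
```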
7. The image segmentation method according to claim 5, wherein, for each of the information extraction modules, calculating the spatial position relationship between pixel points in the feature map generated by the information extraction module to obtain spatial position information comprises:
if the feature map generated by the information extraction module is a three-dimensional feature map, a formula for calculating the spatial position relationship between the pixel points in the feature map generated by the information extraction module is as follows:
[Formula image FDA0002313087680000031: the formula for the three-dimensional spatial position relationship]
wherein a is the spatial position information; δ is the activation function; L is the number of convolution layers of the convolutional neural network; w(i,j,k) is the weight coefficient of the pixel point with coordinates (i, j, k) in the feature map; m is the channel number of the feature map; b is the offset; and ⊙ denotes the Hadamard product.
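The three-dimensional case extends the same assumed reading to voxels (i, j, k) with m channels; again a sketch, not the claimed formula:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal((4, 4, 4, 2))  # volumetric feature map, m = 2 channels
w = rng.standard_normal((4, 4, 4, 2))  # weight coefficient per voxel (i, j, k)
b = 0.1                                # offset

a = np.tanh((w * x).sum(axis=-1) + b)  # Hadamard product summed over m; tanh assumed
print(a.shape)  # (4, 4, 4): one value per voxel
```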
8. The image segmentation method according to claim 4, wherein the segmenting of the image to be segmented according to the feature map containing the spatial position information and the outputting of a target image comprises:
and segmenting the image to be segmented according to the context information through a decoder, and outputting a target image.
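A decoder in the sense of claims 4 and 8 consumes the fused context information and restores the input resolution before emitting per-pixel labels; the layer sizes in this sketch are assumptions:

```python
import torch
import torch.nn as nn

# Decoder: upsample the context information back to input resolution and
# emit one score per class per pixel; depths and shapes are assumed.
decoder = nn.Sequential(
    nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),
    nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),
    nn.Conv2d(16, 2, 1))  # two classes: target vs. background

context = torch.randn(1, 64, 16, 16)     # encoder output (assumed shape)
labels = decoder(context).argmax(dim=1)  # segmented target image
print(labels.shape)  # torch.Size([1, 64, 64])
```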
9. An image segmentation apparatus, comprising:
an image feature and position information extraction module, used for inputting an image to be segmented into an image segmentation model, performing feature extraction on the image to be segmented to generate a feature map, and calculating a spatial position relationship between pixel points in the feature map to obtain spatial position information;
a feature fusion module, used for fusing the feature map and the spatial position information to obtain a feature map containing spatial position information;
and an image segmentation module, used for segmenting the image to be segmented according to the feature map containing the spatial position information and outputting a target image.
10. A server, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the image segmentation method according to any one of claims 1 to 8.
CN201911266841.6A 2019-12-11 2019-12-11 Image segmentation method and device and server Pending CN111145196A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201911266841.6A CN111145196A (en) 2019-12-11 2019-12-11 Image segmentation method and device and server
PCT/CN2020/129521 WO2021115061A1 (en) 2019-12-11 2020-11-17 Image segmentation method and apparatus, and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911266841.6A CN111145196A (en) 2019-12-11 2019-12-11 Image segmentation method and device and server

Publications (1)

Publication Number Publication Date
CN111145196A true CN111145196A (en) 2020-05-12

Family

ID=70518054

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911266841.6A Pending CN111145196A (en) 2019-12-11 2019-12-11 Image segmentation method and device and server

Country Status (2)

Country Link
CN (1) CN111145196A (en)
WO (1) WO2021115061A1 (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113610754B * 2021-06-28 2024-05-07 浙江文谷科技有限公司 Defect detection method and system based on Transformer


Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106934397B (en) * 2017-03-13 2020-09-01 北京市商汤科技开发有限公司 Image processing method and device and electronic equipment
CN109087318A (en) * 2018-07-26 2018-12-25 东北大学 A kind of MRI brain tumor image partition method based on optimization U-net network model
CN109461157B (en) * 2018-10-19 2021-07-09 苏州大学 Image semantic segmentation method based on multistage feature fusion and Gaussian conditional random field
CN110428428B (en) * 2019-07-26 2022-03-25 长沙理工大学 Image semantic segmentation method, electronic equipment and readable storage medium
CN111145196A (en) * 2019-12-11 2020-05-12 中国科学院深圳先进技术研究院 Image segmentation method and device and server

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190080456A1 (en) * 2017-09-12 2019-03-14 Shenzhen Keya Medical Technology Corporation Method and system for performing segmentation of image having a sparsely distributed object
CN110163875A (en) * 2019-05-23 2019-08-23 南京信息工程大学 One kind paying attention to pyramidal semi-supervised video object dividing method based on modulating network and feature

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YILONG CHEN ET AL: "Channel-Unet: A Spatial Channel-Wise Convolutional Neural Network for Liver and Tumors Segmentation" *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021115061A1 (en) * 2019-12-11 2021-06-17 中国科学院深圳先进技术研究院 Image segmentation method and apparatus, and server
CN112363844A (en) * 2021-01-12 2021-02-12 之江实验室 Convolutional neural network vertical segmentation method for image processing
CN112363844B (en) * 2021-01-12 2021-04-09 之江实验室 Convolutional neural network vertical segmentation method for image processing

Also Published As

Publication number Publication date
WO2021115061A1 (en) 2021-06-17

Similar Documents

Publication Publication Date Title
JP7236545B2 (en) Video target tracking method and apparatus, computer apparatus, program
CN109522874B (en) Human body action recognition method and device, terminal equipment and storage medium
CN110020620B (en) Face recognition method, device and equipment under large posture
CN108765278B (en) Image processing method, mobile terminal and computer readable storage medium
CN110473137B (en) Image processing method and device
CN111860398B (en) Remote sensing image target detection method and system and terminal equipment
CN108764039B (en) Neural network, building extraction method of remote sensing image, medium and computing equipment
US20130301914A1 (en) Method and system for image feature extraction
WO2021115061A1 (en) Image segmentation method and apparatus, and server
CN111080660A (en) Image segmentation method and device, terminal equipment and storage medium
CN110415250B (en) Overlapped chromosome segmentation method and device based on deep learning
CN112336342B (en) Hand key point detection method and device and terminal equipment
CN111402170A (en) Image enhancement method, device, terminal and computer readable storage medium
Chen et al. 3D neighborhood convolution: Learning depth-aware features for RGB-D and RGB semantic segmentation
Yu et al. Accurate system for automatic pill recognition using imprint information
CN111080654A (en) Image lesion region segmentation method and device and server
CN114219855A (en) Point cloud normal vector estimation method and device, computer equipment and storage medium
CN111191582A (en) Three-dimensional target detection method, detection device, terminal device and computer-readable storage medium
CN112380978A (en) Multi-face detection method, system and storage medium based on key point positioning
EP4075381B1 (en) Image processing method and system
CN111161348A (en) Monocular camera-based object pose estimation method, device and equipment
CN111709377B (en) Feature extraction method, target re-identification method and device and electronic equipment
WO2019109410A1 (en) Fully convolutional network model training method for splitting abnormal signal region in mri image
CN116012393A (en) Carton point cloud segmentation method, device and processing equipment
CN114118127B (en) Visual scene sign detection and recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination