CN113011425A - Image segmentation method and device, electronic equipment and computer readable storage medium

Info

Publication number
CN113011425A
Authority
CN (China)
Prior art keywords
feature, texture, features, quantization, bottom layer
Legal status
Granted
Application number
CN202110246708.5A
Other languages
Chinese (zh)
Other versions
CN113011425B (en)
Inventors
纪德益
祝澜耘
Current Assignee
Shanghai Sensetime Intelligent Technology Co Ltd
Original Assignee
Shanghai Sensetime Intelligent Technology Co Ltd
Events
Application filed by Shanghai Sensetime Intelligent Technology Co Ltd
Priority to CN202110246708.5A
Publication of CN113011425A
Priority to PCT/CN2021/122355 (WO2022183730A1)
Application granted; publication of CN113011425B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267 Segmentation by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods

Abstract

The present disclosure provides an image segmentation method and apparatus, an electronic device, and a computer-readable storage medium. The method comprises: extracting bottom layer image features from an image to be processed, and performing semantic segmentation on the bottom layer image features to obtain high-level semantic features; performing at least one of texture enhancement and texture feature statistics on the bottom layer image features to obtain bottom layer texture features, where the bottom layer texture features are used to represent enhanced texture details and/or a statistical distribution of the texture features of the image to be processed; and combining the high-level semantic features with the bottom layer texture features to obtain a semantic segmentation image. With the image segmentation method and apparatus of the present disclosure, the accuracy of image segmentation can be improved.

Description

Image segmentation method and device, electronic equipment and computer readable storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to an image segmentation method and apparatus, an electronic device, and a computer-readable storage medium.
Background
Semantic segmentation aims to predict the semantic category of each pixel of an input image. It is one of the most fundamental problems in computer vision and is widely applied in fields including, but not limited to, autonomous driving and human-computer interaction. Current semantic segmentation methods mainly focus on exploiting the context information in high-level features and performing network inference with a deep fully convolutional network to obtain the semantic segmentation result of an image. However, because high-level features are extracted from the large receptive field of the neural network, semantic inference that uses only deep high-level features produces coarse and inaccurate outputs, thereby reducing the accuracy of image segmentation.
Disclosure of Invention
The embodiment of the disclosure provides an image segmentation method, an image segmentation device, an electronic device and a computer-readable storage medium, which can improve the accuracy of semantic inference.
The technical scheme of the embodiment of the disclosure is realized as follows:
the embodiment of the disclosure provides an image segmentation method, which includes:
extracting bottom layer image features from an image to be processed, and performing semantic segmentation on the bottom layer image features to obtain high-layer semantic features;
performing at least one of texture enhancement and texture feature statistics on the bottom layer image features to obtain bottom layer texture features; the bottom texture features are used for representing enhanced texture details and/or statistical distribution of texture features of the image to be processed;
and combining the high-level semantic features and the bottom-level texture features to obtain a semantic segmentation image.
In the above method, the performing at least one of texture enhancement and texture feature statistical processing on the bottom layer image feature to obtain the bottom layer texture feature includes any one of:
performing texture enhancement processing based on the bottom layer image features to obtain enhanced texture features; determining the bottom layer texture features based on the enhanced texture features;
performing texture feature statistical processing based on the bottom layer image features to obtain statistical texture features; determining the bottom layer texture features based on the statistical texture features;
and performing texture enhancement processing and texture feature statistical processing on the basis of the bottom layer image features to obtain the bottom layer texture features.
In the above method, the determining the bottom texture feature based on the enhanced texture feature includes:
determining the enhanced texture features as the bottom texture features; or,
and combining the enhanced texture features with the bottom layer image features to obtain the bottom layer texture features.
In the above method, the determining the bottom layer texture features based on the statistical texture features includes:
determining the statistical texture features as the bottom texture features; or,
and combining the statistical texture features with the bottom layer image features to obtain the bottom layer texture features.
In the above method, the performing texture enhancement processing and texture feature statistical processing based on the bottom layer image feature to obtain the bottom layer texture feature includes:
performing texture enhancement processing based on the bottom layer image features to obtain enhanced texture features;
and performing texture feature statistical processing at least based on the enhanced texture features to obtain the bottom texture features.
In the above method, the performing texture feature statistical processing based on at least the enhanced texture feature to obtain the bottom texture feature includes:
combining the enhanced texture features with the bottom layer image features, and then performing texture feature statistical processing to obtain middle bottom layer texture features; and determining the middle bottom layer texture feature as the bottom layer texture feature.
In the above method, after the enhanced texture features and the bottom layer image features are combined and then statistical processing of texture features is performed to obtain middle bottom layer texture features, the method further includes:
and combining at least one of the enhanced textural features and the bottom layer image features with the middle bottom layer textural features to obtain the bottom layer textural features.
In the above method, the performing texture enhancement processing based on the bottom layer image feature to obtain an enhanced texture feature includes:
performing one-dimensional quantization processing on the bottom layer image characteristics to obtain a first coding characteristic and a first initial quantization characteristic matrix;
and obtaining the enhanced texture feature based on the first coding feature and the first initial quantization feature matrix.
In the above method, the performing one-dimensional quantization processing on the bottom layer image feature to obtain a first coding feature and a first initial quantization feature matrix includes:
performing pooling processing on the bottom layer image features to obtain first average features corresponding to the bottom layer image features;
calculating a first similarity of the bottom layer image characteristic and the first average characteristic to obtain a first similarity matrix;
performing one-dimensional quantization processing on the first similarity matrix to obtain a first quantization series vector and the first coding feature;
and fusing the first coding feature and the first quantization series vector to obtain the first initial quantization feature matrix.
In the above method, the performing one-dimensional quantization processing on the first similarity matrix to obtain a first quantization level vector and a first coding feature includes:
combining the first similarity matrix in the dimensions of length and width to obtain a one-dimensional similarity vector;
carrying out N-level quantization processing on the one-dimensional similarity vector to obtain a first quantization characteristic value; the first quantized feature values form the first vector of quantization levels;
determining the first coding feature based on the first quantized feature value and a one-dimensional similarity vector; n is a positive integer greater than or equal to 1.
In the above method, the first quantized feature value includes: n sub-quantization feature values; the performing N-level quantization processing on the one-dimensional similarity vector to obtain a first quantization feature value includes:
performing N-level quantization processing according to the maximum value and the minimum value in the one-dimensional similarity vector to determine an average quantization value;
and obtaining the sub-quantization characteristic value of each level based on the average quantization value and the minimum value, thereby obtaining the N sub-quantization characteristic values.
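By way of a non-limiting illustration, the sketch below shows one way the N-level quantization described above could be computed in PyTorch: the "average quantization value" is taken as (max - min)/N, and the N sub-quantization feature values are placed at the bin centres. The bin-centre convention and the tensor shapes are assumptions of this sketch, not requirements of the embodiment.

```python
import torch

def quantization_levels(similarity: torch.Tensor, n_levels: int) -> torch.Tensor:
    """similarity: one-dimensional similarity vector of shape (H*W,)."""
    s_min, s_max = similarity.min(), similarity.max()
    step = (s_max - s_min) / n_levels                 # the "average quantization value"
    n = torch.arange(n_levels, dtype=similarity.dtype)
    return s_min + (n + 0.5) * step                   # the N sub-quantization feature values

levels = quantization_levels(torch.rand(64 * 64), n_levels=128)
print(levels.shape)  # torch.Size([128])
```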
In the above method, the fusing the first coding feature and the first quantization series vector to obtain the first initial quantization feature matrix includes:
carrying out space size averaging processing on the first coding features to obtain first coding average features;
splicing the first coding average feature and the first quantization series vector to obtain a first quantization statistical vector;
and performing spatial domain conversion on the first quantization statistical vector, and fusing the first quantization statistical vector with the first average characteristic to obtain the first initial quantization characteristic matrix.
In the above method, the obtaining the enhanced texture feature based on the first coding feature and the first initial quantization feature matrix includes:
carrying out graph reasoning on the first initial quantization feature matrix to obtain a first quantization feature matrix;
and obtaining the enhanced texture feature based on the first quantization feature matrix and the first coding feature.
In the above method, when the bottom layer image feature is an input image feature, the target statistical texture feature is the statistical texture feature; or,
under the condition that the enhanced texture features are input image features, the target statistical texture features are the bottom layer texture features; or,
and under the condition that the feature obtained by combining the enhanced texture feature and the bottom layer image feature is the input image feature, the target statistical texture feature is the middle bottom layer texture feature.
In the above method, the performing texture feature statistical processing based on the input image features to obtain the target statistical texture features includes any one of:
performing two-dimensional quantization processing on the input image features to obtain the target statistical texture features; the target statistical texture feature represents the feature distribution relationship among the pixels; or,
performing size segmentation on the input image features to obtain local image features corresponding to the input image features; performing two-dimensional quantization processing on the local image features to obtain the target statistical texture features; or,
respectively carrying out two-dimensional quantization processing on the input image features and the local image features to obtain global quantization features corresponding to the input image features and local quantization features corresponding to the local image features;
and performing feature fusion based on the global quantization feature and the local quantization feature to obtain the target statistical texture feature.
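The following is a hedged PyTorch sketch of the pyramid idea in this variant: the input image features are quantized globally, split into a grid of local regions that are quantized separately, and the global and local quantization features are fused. The function quantize_2d used here is only a stand-in (a per-channel soft histogram) for the two-dimensional quantization processing; it is not the exact operator of the embodiment.

```python
import torch
import torch.nn.functional as F

def quantize_2d(feat: torch.Tensor, bins: int = 8) -> torch.Tensor:
    """Stand-in statistic: a per-channel soft histogram over the spatial positions."""
    b, c, h, w = feat.shape
    flat = feat.view(b, c, h * w)
    lo = flat.min(dim=-1, keepdim=True).values
    hi = flat.max(dim=-1, keepdim=True).values
    centres = lo + (torch.arange(bins, device=feat.device) + 0.5) * (hi - lo) / bins
    weights = F.softmax(-(flat.unsqueeze(-1) - centres.unsqueeze(2)).abs(), dim=-1)
    return weights.mean(dim=2)                          # (B, C, bins)

def pyramid_texture_stats(feat: torch.Tensor, grid: int = 2) -> torch.Tensor:
    global_q = quantize_2d(feat)                        # global quantization features
    locals_q = []
    for row in feat.chunk(grid, dim=2):                 # size segmentation into local regions
        for patch in row.chunk(grid, dim=3):
            locals_q.append(quantize_2d(patch))         # local quantization features
    local_q = torch.stack(locals_q).mean(dim=0)         # simple fusion of the local statistics
    return torch.cat([global_q, local_q], dim=-1)       # fused target statistical texture feature

stats = pyramid_texture_stats(torch.rand(1, 64, 32, 32))
print(stats.shape)  # torch.Size([1, 64, 16])
```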
In the above method, the performing two-dimensional quantization processing on the input image feature to obtain the target statistical texture feature includes:
pooling the input image features to obtain second average features corresponding to the input image features;
calculating a second similarity of the input image features and the second average features to obtain a second similarity matrix;
performing two-dimensional quantization processing on the second similarity matrix to obtain a second quantization series matrix and a second coding characteristic;
and fusing the second coding characteristics and the second quantization series matrix to obtain a second initial quantization characteristic matrix.
In the above method, the performing two-dimensional quantization processing on the second similarity matrix to obtain a second quantization progression matrix and a second coding feature includes:
performing M-level quantization processing on two dimensions in the second similarity matrix respectively to obtain a second quantization characteristic value; the second quantization feature values form the second quantization series matrix; m is a positive integer greater than or equal to 1;
determining an intermediate coding feature matrix based on the second quantized feature value and the second similarity matrix;
and based on the intermediate coding feature matrix, multiplying the current intermediate coding feature by the transpose of the adjacent intermediate coding feature, and determining a coding feature corresponding to the current intermediate coding feature, thereby obtaining a second coding feature.
In the above method, the fusing the second coding feature with the second quantization series matrix to obtain a second initial quantization feature matrix includes:
carrying out space size average processing on the second coding features to obtain second coding average features;
splicing the second coding average characteristic and the second quantization series vector to obtain a second quantization statistical matrix;
and performing space domain conversion on the second quantization statistical matrix, and fusing the second quantization statistical matrix with the second average characteristic to obtain a second initial quantization characteristic matrix.
In the above method, the performing feature fusion based on the global quantization feature and the local quantization feature to obtain the target statistical texture feature includes:
carrying out space conversion on the global quantization characteristics and averaging to obtain global average characteristics;
carrying out space conversion on the local quantization characteristics and averaging to obtain local average characteristics;
and performing feature fusion on the global average feature and the local average feature to obtain the target statistical texture feature.
An embodiment of the present disclosure provides an image segmentation apparatus, including:
the characteristic extraction module is used for extracting bottom layer image characteristics from the image to be processed;
the semantic segmentation module is used for performing semantic segmentation on the bottom layer image features to obtain high-level semantic features;
the texture feature processing module is used for performing at least one of texture enhancement and texture feature statistics on the bottom layer image features to obtain bottom layer texture features; the bottom texture features are used for representing enhanced texture details and/or statistical distribution of texture features of the image to be processed;
and the feature fusion module is used for combining the high-level semantic features and the bottom-level texture features to obtain a semantic segmentation image.
In the above apparatus, the texture feature processing module is further configured to:
performing texture enhancement processing based on the bottom layer image features to obtain enhanced texture features; determining the bottom layer texture features based on the enhanced texture features;
performing texture feature statistical processing based on the bottom layer image features to obtain statistical texture features; determining the bottom layer texture features based on the statistical texture features;
and performing texture enhancement processing and texture feature statistical processing on the basis of the bottom layer image features to obtain the bottom layer texture features.
In the above apparatus, the texture feature processing module is further configured to determine the enhanced texture feature as the bottom texture feature; or combining the enhanced texture features with the bottom layer image features to obtain the bottom layer texture features.
In the above apparatus, the texture feature processing module is further configured to determine the statistical texture feature as the bottom texture feature; or combining the statistical texture features with the bottom layer image features to obtain the bottom layer texture features.
In the above device, the texture feature processing module is further configured to perform texture enhancement processing based on the bottom layer image feature to obtain an enhanced texture feature; and performing texture feature statistical processing at least based on the enhanced texture features to obtain the bottom texture features.
In the above device, the texture feature processing module is further configured to combine the enhanced texture features with the bottom layer image features, and perform texture feature statistical processing to obtain middle bottom layer texture features; and determining the middle bottom layer texture feature as the bottom layer texture feature.
In the above apparatus, the texture feature processing module is further configured to combine the enhanced texture feature with the bottom layer image feature, perform statistical processing on the texture feature to obtain a middle bottom layer texture feature, and then combine at least one of the enhanced texture feature and the bottom layer image feature with the middle bottom layer texture feature to obtain the bottom layer texture feature.
In the above apparatus, the texture feature processing module is further configured to perform one-dimensional quantization processing on the bottom layer image feature to obtain a first coding feature and a first initial quantization feature matrix; and obtaining the enhanced texture feature based on the first coding feature and the first initial quantization feature matrix.
In the above apparatus, the texture feature processing module is further configured to perform pooling processing on the bottom layer image feature to obtain a first average feature corresponding to the bottom layer image feature; calculating a first similarity of the bottom layer image characteristic and the first average characteristic to obtain a first similarity matrix; performing one-dimensional quantization processing on the first similarity matrix to obtain a first quantization series vector and the first coding feature; and fusing the first coding feature and the first quantization series vector to obtain the first initial quantization feature matrix.
In the above apparatus, the texture feature processing module is further configured to combine the first similarity matrix in the length and width dimensions to obtain a one-dimensional similarity vector; carrying out N-level quantization processing on the one-dimensional similarity vector to obtain a first quantization characteristic value; the first quantized feature values form the first vector of quantization levels; determining the first coding feature based on the first quantized feature value and a one-dimensional similarity vector; n is a positive integer greater than or equal to 1.
In the above apparatus, the first quantized feature value includes: n sub-quantization feature values; the texture feature processing module is further configured to perform N-level quantization processing according to the maximum value and the minimum value in the one-dimensional similarity vector, and determine an average quantization value; and obtaining the sub-quantization characteristic value of each level based on the average quantization value and the minimum value, thereby obtaining the N sub-quantization characteristic values.
In the above apparatus, the texture feature processing module is further configured to perform spatial size averaging on the first coding feature to obtain a first coding average feature; splicing the first coding average feature and the first quantization series vector to obtain a first quantization statistical vector; and performing spatial domain conversion on the first quantization statistical vector, and fusing the first quantization statistical vector with the first average characteristic to obtain the first initial quantization characteristic matrix.
In the above apparatus, the texture feature processing module is further configured to perform graph inference on the first initial quantization feature matrix to obtain a first quantization feature matrix; and obtaining the enhanced texture feature based on the first quantization feature matrix and the first coding feature.
In the above apparatus, the texture feature processing module is further configured to, when the bottom-layer image feature is an input image feature, obtain a target statistical texture feature as the statistical texture feature; or, when the enhanced texture features are input image features, the target statistical texture features are the bottom layer texture features; or, when the feature obtained by combining the enhanced texture feature and the bottom layer image feature is an input image feature, the target statistical texture feature is the middle bottom layer texture feature.
In the above apparatus, the texture feature processing module is further configured to perform two-dimensional quantization processing on the input image feature to obtain the target statistical texture feature; the target statistical texture feature represents the feature distribution relationship among the pixels; or, performing size segmentation on the input image features to obtain local image features corresponding to the input image features; performing two-dimensional quantization processing on the local image features to obtain the target statistical texture features; or, performing two-dimensional quantization processing on the input image features and the local image features respectively to obtain global quantization features corresponding to the input image features and local quantization features corresponding to the local image features; and performing feature fusion based on the global quantization feature and the local quantization feature to obtain the target statistical texture feature.
In the above apparatus, the texture feature processing module is further configured to perform pooling processing on the input image feature to obtain a second average feature corresponding to the input image feature; calculating a second similarity of the input image features and the second average features to obtain a second similarity matrix; performing two-dimensional quantization processing on the second similarity matrix to obtain a second quantization series matrix and a second coding characteristic; and fusing the second coding characteristics and the second quantization series matrix to obtain a second initial quantization characteristic matrix.
In the above apparatus, the texture feature processing module is further configured to perform M-level quantization processing on two dimensions in the second similarity matrix, respectively, to obtain a second quantization feature value; the second quantization feature values form the second quantization series matrix; m is a positive integer greater than or equal to 1; determining an intermediate coding feature matrix based on the second quantized feature value and the second similarity matrix; and based on the intermediate coding feature matrix, multiplying the current intermediate coding feature by the transpose of the adjacent intermediate coding feature, and determining a coding feature corresponding to the current intermediate coding feature, thereby obtaining a second coding feature.
In the above apparatus, the texture feature processing module 502 is further configured to perform spatial size averaging on the second coding feature to obtain a second coding average feature; splicing the second coding average characteristic and the second quantization series vector to obtain a second quantization statistical matrix; and performing space domain conversion on the second quantization statistical matrix, and fusing the second quantization statistical matrix with the second average characteristic to obtain a second initial quantization characteristic matrix.
In the above apparatus, the texture feature processing module is further configured to perform spatial conversion on the global quantized features and perform averaging to obtain global average features; carrying out space conversion on the local quantization characteristics and averaging to obtain local average characteristics; and performing feature fusion on the global average feature and the local average feature to obtain the target statistical texture feature.
An embodiment of the present disclosure provides an electronic device, including:
a memory for storing executable instructions;
and the processor is used for realizing the image segmentation method provided by the embodiment of the disclosure when executing the executable instructions stored in the memory.
The embodiment of the disclosure provides a computer-readable storage medium, which stores executable instructions for causing a processor to execute the method for segmenting an image provided by the embodiment of the disclosure.
The embodiment of the disclosure has the following beneficial effects:
the method comprises the steps of obtaining bottom layer image features in an image, and performing at least one of texture enhancement and texture feature statistics on the bottom layer image features, wherein the obtained bottom layer texture features can contain bottom layer texture information of a multi-scale deep layer in an image to be processed, and represent enhanced texture details and/or statistical distribution of the texture features of the image to be processed. Therefore, the bottom layer texture features and the high-level semantic features are combined in the image segmentation, so that the texture detail information of the image to be processed can fully participate in the network inference process of the semantic segmentation, the accuracy of segmentation result information such as the boundary, the texture, the structure and the like of the segmentation region is improved, and the accuracy of the image segmentation is further improved.
Drawings
FIG. 1 is a schematic flow chart diagram illustrating an alternative method for image segmentation according to an embodiment of the present disclosure;
fig. 2(a) is a schematic diagram illustrating an effect of a local structural feature of an image texture in a street view picture 1 according to an embodiment of the disclosure;
fig. 2(b) is a schematic diagram illustrating an effect of performing equalization enhancement on a street view picture 1 according to an embodiment of the disclosure;
fig. 3 is a schematic structural diagram of a STLNet network provided by the embodiment of the present disclosure;
FIG. 4 is an alternative flowchart of an image segmentation method provided by the embodiments of the present disclosure;
fig. 5 is an alternative structural diagram of the STLNet network provided by the embodiment of the present disclosure;
fig. 6 is an alternative structural diagram of the STLNet network provided by the embodiment of the present disclosure;
fig. 7 is an alternative structural diagram of the STLNet network provided by the embodiment of the present disclosure;
FIG. 8 is an alternative flow chart of an image segmentation method provided by the embodiments of the present disclosure;
FIG. 9 is an alternative flow chart of an image segmentation method provided by the embodiments of the present disclosure;
fig. 10 is an alternative structural schematic diagram of a one-dimensional quantization counting module provided in an embodiment of the present disclosure;
fig. 11(a) is an alternative structural schematic diagram of an enhanced texture feature module TEM provided in the embodiment of the present disclosure;
fig. 11(b) is an alternative structural schematic diagram of an enhanced texture feature module TEM provided in the embodiment of the present disclosure;
fig. 12 is an alternative structural schematic diagram of a PTFEM module provided by an embodiment of the present disclosure;
fig. 13 is a schematic diagram illustrating comparison of effects before and after performing texture feature statistical processing on an original image according to an embodiment of the present disclosure;
FIG. 14 is a schematic diagram provided by an embodiment of the present disclosure comparing the image segmentation effect of the present disclosure with that of the DeepLabV3 neural network image segmentation method;
fig. 15 is a schematic structural diagram of an image segmentation apparatus provided in an embodiment of the present disclosure;
fig. 16 is a schematic structural diagram of an electronic device provided in an embodiment of the present disclosure.
Detailed Description
To make the objectives, technical solutions, and advantages of the present disclosure clearer, the present disclosure is described in further detail below with reference to the accompanying drawings. The described embodiments should not be construed as limiting the present disclosure, and all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the protection scope of the present disclosure.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
In the following description, the terms "first/second/third" are merely used to distinguish similar objects and do not denote a particular order; it is understood that "first/second/third" may be interchanged in a specific order or sequence where permitted, so that the embodiments of the disclosure described herein can be implemented in an order other than that illustrated or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. The terminology used herein is for the purpose of describing embodiments of the disclosure only and is not intended to be limiting of the disclosure.
In recent years, owing to the rapid development of deep learning, semantic segmentation of images with neural networks has achieved great success. Modern semantic segmentation models are mostly based on the fully convolutional network (FCN), in which convolutional layers replace the fully connected layers of a common classification network so as to obtain pixel-level predictions. On this basis, various methods for improving the basic FCN have been developed. Some methods use a pyramid structure to obtain information from receptive fields of different sizes; for example, the Pyramid Scene Parsing Network (PSPNet) feeds features into pyramid pooling layers with different pooling ratios, and the DeepLabV3 series of semantic segmentation models proposes an atrous spatial pyramid pooling layer consisting of multiple dilated convolutions with different dilation rates. However, the convolution operator in an FCN is sensitive to local changes and can mine image information of local features such as boundaries, but it is not sufficient for describing or counting image textures. Accordingly, many methods have been proposed to extract and utilize low-level features, for example by passing low-level features to higher network layers through skip connections. However, simple multi-level feature addition or concatenation operations may lead to feature misalignment, thereby diminishing the effectiveness of the low-level features. In summary, current image segmentation techniques cannot effectively utilize the texture information of an image, which results in inaccurate boundary and texture information of the objects in the segmentation result and thus reduces the accuracy of image segmentation.
The image segmentation method provided by the embodiment of the present disclosure will be described in conjunction with exemplary applications and implementations of the electronic device provided by the embodiment of the present disclosure.
Referring to fig. 1, fig. 1 is an alternative flowchart of an image segmentation method provided by an embodiment of the present disclosure, and will be described with reference to the steps shown in fig. 1.
S101, extracting bottom layer image features from the image to be processed, and performing semantic segmentation on the bottom layer image features to obtain high-level semantic features.
In the embodiment of the disclosure, the electronic device may perform feature extraction on pixels included in the image to be processed by using a feature extraction method to obtain the bottom layer image features.
In some embodiments, the electronic device may perform multi-scale image feature extraction on the image to be processed by using multiple layers of feature extraction layers in the neural network ResNet101, and use features output by lower layers of the multiple layers of feature extraction layers, such as features output by a first layer or a second layer, as the underlying image features. Other types of feature extraction networks or feature extraction methods may also be used to extract the bottom-layer image features from the image to be processed, which are specifically selected according to the actual situation, and the embodiment of the present disclosure is not limited.
Here, the underlying image features represent apparent features of the image to be processed, such as color, boundary division, contrast, brightness, and the like, and the embodiments of the present disclosure are not limited thereto.
In the embodiment of the present disclosure, the electronic device may further perform high-level feature extraction based on the bottom layer image features to obtain high-level semantic information of the image to be processed; for example, the bottom layer image features are input into a high-level extraction layer of the feature extraction network to extract high-level semantic information, such as whether the image to be processed contains vehicles, pedestrians, roads, and the like. The electronic device then performs semantic prediction of the semantic category of each pixel according to the extracted high-level semantic information, and performs semantic segmentation on the image to be processed according to the result of the semantic prediction to obtain the high-level semantic features.
In some embodiments, the electronic device may semantically segment the bottom layer image features through a semantic segmentation network. For example, the semantic segmentation network may be a convolution-pooling layer of the backbone network ResNet101, an Atrous Spatial Pyramid Pooling (ASPP) network, or another type of semantic segmentation network, selected according to the actual situation; the embodiment of the present disclosure is not limited thereto.
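For illustration, a minimal PyTorch/torchvision sketch of this step is given below: low-level features are taken from the early stages of a ResNet-101 backbone, and a set of parallel dilated convolutions stands in for the ASPP head that produces the high-level semantic features. The channel sizes, dilation rates, and the use of torchvision are assumptions of the sketch rather than the configuration of the embodiment.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet101

backbone = resnet101()  # randomly initialised ResNet-101 used as the backbone

def extract_features(image: torch.Tensor):
    x = backbone.conv1(image)
    x = backbone.bn1(x)
    x = backbone.relu(x)
    x = backbone.maxpool(x)
    low1 = backbone.layer1(x)        # lower-layer outputs: bottom layer image features
    low2 = backbone.layer2(low1)
    high = backbone.layer4(backbone.layer3(low2))  # deep features fed to the semantic head
    return low1, low2, high

# Simplified stand-in for an ASPP head: parallel dilated convolutions on the deep features.
aspp = nn.ModuleList(
    nn.Conv2d(2048, 256, 3, padding=r, dilation=r) for r in (1, 6, 12, 18)
)

image = torch.rand(1, 3, 512, 512)
low1, low2, high = extract_features(image)
semantic = torch.cat([branch(high) for branch in aspp], dim=1)  # high-level semantic features
print(low1.shape, semantic.shape)  # (1, 256, 128, 128) and (1, 1024, 16, 16)
```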
S102, performing at least one of texture enhancement and texture feature statistics on the bottom layer image features to obtain bottom layer texture features; the underlying texture features are used to characterize enhanced texture details and/or statistical distribution of texture features of the image to be processed.
In the embodiment of the disclosure, image texture in digital image processing has not only local structural features but also global statistical features. The local structural features may be local pattern information, such as boundaries, smoothness, and roughness, which can be extracted by the lower filters of a Convolutional Neural Network (CNN). Fig. 2(a) is a schematic diagram of the effect of filtering the street view picture 1 with low-level filter 1 to obtain the local structural features of the image texture in the street view picture 1; it can be seen that the local structure feature map in fig. 2(a) shows information such as texture, boundaries, and object structure in the street view picture 1. The global statistical features may be derived by performing statistics on the image over the underlying image information (e.g., pixel values or local region attributes); in some embodiments, the global statistical features may be derived by computing a gray-level histogram that characterizes the distribution of the local structural features. Fig. 2(b) is a schematic diagram of the effect of calculating the global statistical characteristics of the image texture in the street view picture 1 through the gray-level histogram and performing equalization enhancement on the street view picture 1 according to those global statistical characteristics. It can be seen that the original street view picture 1, taken in a dark environment, is of poor visual quality; after the histogram equalization enhancement step, the image is segmented more finely and the segmentation effect is better.
Based on the effect diagrams shown in fig. 2(a) and fig. 2(b), it can be seen that adding texture feature information in the image segmentation task can further optimize the boundary and texture information of the segmentation object region, and improve the accuracy of image segmentation.
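As a generic illustration of the global statistical view of texture discussed above (and not the enhancement module of the embodiment), the NumPy sketch below computes a grey-level histogram and uses its cumulative distribution for histogram equalization, which is the operation applied to street view picture 1 in fig. 2(b).

```python
import numpy as np

def equalize_histogram(gray: np.ndarray) -> np.ndarray:
    """gray: uint8 image of shape (H, W)."""
    hist = np.bincount(gray.ravel(), minlength=256)    # global statistics of grey levels
    cdf = hist.cumsum().astype(np.float64)
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())  # normalised cumulative distribution
    lut = np.round(cdf * 255).astype(np.uint8)
    return lut[gray]                                   # remap pixels -> higher contrast

dark = (np.random.rand(64, 64) * 80).astype(np.uint8)  # a low-contrast "dark" image
print(equalize_histogram(dark).max())                  # grey values now spread towards 255
```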
Generally, because the bottom layer image features are extracted by the lower feature extraction layers of the backbone network, their quality is often low; in particular, low contrast may blur texture details and hinder the extraction and use of bottom layer information. Therefore, the embodiment of the disclosure can analyze and mine the bottom layer image features and enhance their texture details, thereby realizing texture enhancement of the bottom layer image features and obtaining bottom layer texture features that represent the enhanced texture details of the image to be processed, so that texture-related image information can be captured more easily in the subsequent method steps and used to improve the accuracy of image segmentation.
In the embodiment of the disclosure, since image texture is highly correlated with the statistics of the spatial relationships between pixels, in order to describe the spatial distribution of the image texture across pixels, the electronic device may use the texture information in the bottom layer image features to extract statistical characteristics of the texture information at multiple scales, thereby realizing texture feature statistics on the bottom layer image features and obtaining bottom layer texture features that represent the statistical distribution of the texture features of the image to be processed.
In the embodiment of the present disclosure, the electronic device may obtain the bottom layer texture feature used for representing enhanced texture details of the image to be processed and/or statistical distribution of the texture feature by performing at least one of texture enhancement and texture feature statistics on the bottom layer image feature.
And S103, combining the high-level semantic features with the bottom-level texture features to obtain a segmented image.
In the embodiment of the disclosure, the electronic device may fuse a high-level semantic feature and a bottom-level texture feature obtained by performing semantic segmentation on an image to be processed, analyze each pixel in the image to be processed comprehensively according to the fused image feature, and determine a segmentation result of each pixel in the segmented image by combining a semantic segmentation class to which the pixel belongs and the bottom-level texture feature corresponding to the semantic segmentation class in the analysis process, so as to obtain a segmentation result corresponding to each pixel. The electronic device may further obtain a segmentation image corresponding to the image to be processed based on the segmentation result corresponding to each pixel.
In some embodiments, when the image to be processed passes through the feature extraction layer, the image size is reduced to a preset output size required by the feature extraction layer, and the electronic device performs at least one of semantic segmentation, texture enhancement and texture feature statistics on the basis of the reduced image to be processed, so as to obtain a segmentation result corresponding to each pixel in the reduced image. In this case, before applying the segmentation result corresponding to each pixel to the final segmented image, the electronic device performs upsampling on the reduced image, restores the upsampled image to the size of the original image to be processed, and finally obtains the segmented image with the size consistent with the size of the image to be processed.
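A minimal sketch of this size-restoration step is shown below, assuming bilinear upsampling and illustrative sizes; the per-pixel class scores computed at the reduced resolution are interpolated back to the size of the image to be processed before the final segmentation result is read out.

```python
import torch
import torch.nn.functional as F

coarse_logits = torch.rand(1, 19, 64, 128)       # class scores at the reduced resolution
restored = F.interpolate(coarse_logits, size=(512, 1024),
                         mode="bilinear", align_corners=False)
labels = restored.argmax(dim=1)                  # final per-pixel segmentation result
print(labels.shape)                              # torch.Size([1, 512, 1024])
```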
It can be understood that, because the electronic device considers the high-level semantic features and the bottom-level texture features when performing the final image segmentation, the boundaries and textures of various image objects in the segmented image can be made to be approximately consistent with the image to be processed, so that the result of the image segmentation is more accurate.
In some embodiments, the present disclosure provides a Statistical Texture Learning Network (STLNet) for implementing the image segmentation method provided by the present disclosure. As shown in fig. 3, the network structure of the STLNet may be divided into two parts: a backbone network branch 30 and a texture feature extraction module 31. The backbone network branch may be a ResNet-101 network, or any neural network having a semantic segmentation function, selected according to the actual situation; the embodiment of the present disclosure is not limited thereto.
As shown in fig. 3, after the electronic device inputs the image to be processed into the STLNet, multi-scale image features are extracted through the feature extraction layer 301 of the backbone network branch 30. The electronic device combines the low-level features output by the first two layers of the feature extraction layer 301 to serve as the bottom layer image features, and inputs the high-level features output by the feature extraction layer 301 into the semantic segmentation network 302; the semantic segmentation network 302 extracts and analyzes high-level context features based on the high-level features to realize semantic segmentation and obtain the high-level semantic features. Here, the semantic segmentation network may be an ASPP network. The electronic device inputs the bottom layer image features into the texture feature extraction module 31, and the texture feature extraction module 31 performs at least one of texture enhancement and texture feature statistics on the bottom layer image features to obtain the bottom layer texture features. The electronic device further combines the high-level semantic features output by the semantic segmentation network 302 with the bottom layer texture features output by the texture feature extraction module 31, and restores the image size through an up-sampling process to obtain the final segmented image.
It can be understood that, in an implementation of the present disclosure, the electronic device performs at least one of texture enhancement and texture feature statistics on the bottom layer image features of the image to be processed, so as to enhance the texture details of the image to be processed, and mine deep texture information of multiple scales in the image to be processed, so as to obtain bottom layer texture features that can represent the statistical distribution of the enhanced texture details and/or texture features. The electronic equipment combines the bottom texture features and the high-level semantic features to obtain a final segmented image, and can better restore the bottom texture information of the original image in the segmented image, so that the accuracy of image segmentation is improved.
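By way of a non-limiting example, the sketch below wires the two parts of fig. 3 together in PyTorch: a backbone branch that yields low-level and deep features, a semantic segmentation head, a texture feature extraction module, feature fusion, and up-sampling back to the input size. All module internals, channel counts, and the tiny stand-in backbone at the end are placeholders introduced only to make the sketch executable; they are not the configuration of the embodiment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class STLNetSketch(nn.Module):
    def __init__(self, backbone: nn.Module, semantic_head: nn.Module,
                 texture_module: nn.Module, num_classes: int = 19):
        super().__init__()
        self.backbone = backbone              # feature extraction layer 301
        self.semantic_head = semantic_head    # semantic segmentation network 302 (e.g. ASPP)
        self.texture_module = texture_module  # texture feature extraction module 31
        self.classifier = nn.LazyConv2d(num_classes, kernel_size=1)

    def forward(self, image):
        low, high = self.backbone(image)               # bottom layer / deep features
        semantic = self.semantic_head(high)            # high-level semantic features
        texture = self.texture_module(low)             # bottom layer texture features
        texture = F.interpolate(texture, size=semantic.shape[-2:],
                                mode="bilinear", align_corners=False)
        fused = torch.cat([semantic, texture], dim=1)  # combine the two branches
        logits = self.classifier(fused)
        return F.interpolate(logits, size=image.shape[-2:],
                             mode="bilinear", align_corners=False)  # restore input size

# Minimal stand-in modules, only to make the sketch executable.
class _TinyBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.low = nn.Conv2d(3, 64, 3, stride=4, padding=1)
        self.high = nn.Conv2d(64, 256, 3, stride=4, padding=1)

    def forward(self, x):
        low = self.low(x)
        return low, self.high(low)

net = STLNetSketch(_TinyBackbone(), nn.Conv2d(256, 128, 1), nn.Conv2d(64, 32, 1))
print(net(torch.rand(1, 3, 128, 128)).shape)  # torch.Size([1, 19, 128, 128])
```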
In the embodiment of the present disclosure, referring to fig. 4, fig. 4 is an optional flowchart of an image segmentation method provided in the embodiment of the present disclosure. S102 in fig. 1 may be implemented by any method flow of S201 to S202, or S301 to S302, or S401, and will be described with reference to the steps.
S201, performing texture enhancement processing based on the bottom layer image characteristics to obtain enhanced texture characteristics.
S202, determining bottom layer texture features based on the enhanced texture features.
In the embodiment of the disclosure, the electronic device may perform texture enhancement processing on the bottom layer image feature, and determine the bottom layer texture feature that may be combined with the high-level semantic feature based on the obtained enhanced texture feature.
In some embodiments, S202 in fig. 4 may be implemented by performing S2021 or S2022, which will be described in conjunction with the steps.
S2021, determining the enhanced texture features as bottom texture features.
In an embodiment of the present disclosure, the electronic device may directly use the enhanced texture features as bottom texture features combined with high-level semantic features.
S2022, combining the enhanced textural features with the bottom layer image features to obtain bottom layer textural features.
In the embodiment of the present disclosure, the electronic device may perform feature fusion on an enhanced texture feature obtained by performing texture enhancement processing on the bottom layer image feature and the original bottom layer image feature obtained by feature extraction, so as to obtain the bottom layer texture feature.
Here, combining the enhanced texture features with the underlying image features may further improve the texture enhancement effect of the underlying texture features.
In some embodiments, as shown in fig. 5, the texture feature processing module 31 in fig. 3 may be a Texture Enhancement Module (TEM) 31_1. The electronic device may directly use the enhanced texture features output by the TEM as the bottom layer texture features, or may combine the enhanced texture features output by the TEM with the bottom layer image features as the bottom layer texture features, as shown by the dotted arrows in fig. 5.
S301, carrying out texture feature statistical processing based on the bottom layer image features to obtain statistical texture features.
S302, determining bottom layer texture features based on the statistical texture features.
In the embodiment of the disclosure, the electronic device may perform texture feature statistical processing on the bottom layer image features, and use the obtained statistical texture features as bottom layer texture features to further combine with high-level semantic features to obtain a segmented image.
In some embodiments, S302 in fig. 4 may be implemented by executing S3021 or S3022, which will be described in conjunction with the steps.
And S3021, determining the statistical texture features as bottom texture features.
In the embodiment of the disclosure, the electronic device may directly use the statistical texture feature as a bottom texture feature combined with a high-level semantic feature.
And S3022, combining the statistical texture features with the bottom layer image features to obtain bottom layer texture features.
In the embodiment of the present disclosure, the electronic device may perform feature fusion on the statistical texture feature obtained by performing texture statistical processing on the bottom layer image feature and the original bottom layer image feature obtained by feature extraction, so as to obtain the bottom layer texture feature.
Here, combining the statistical texture features with the bottom layer image features may further improve the accuracy of the statistical distribution of the texture features represented by the bottom layer texture features.
In some embodiments, as shown in fig. 6, the texture feature processing module 31 in fig. 3 may be a Pyramid Texture Feature Extraction Module (PTFEM) 31_2. The electronic device may directly use the statistical texture features output by the PTFEM as the bottom layer texture features, or may combine the statistical texture features output by the PTFEM with the bottom layer image features as the bottom layer texture features, as shown by the dotted arrows in fig. 6.
S401, performing texture enhancement processing and texture feature statistical processing based on the bottom layer image features to obtain bottom layer texture features.
In some embodiments, S401 in fig. 4 may be implemented by executing S4011 or S4012, which will be described in conjunction with the steps.
S4011, performing texture enhancement processing based on the bottom layer image characteristics to obtain enhanced texture characteristics.
In the embodiment of the disclosure, the electronic device performs texture enhancement processing based on the bottom layer image features to obtain enhanced texture features.
S4012, performing texture feature statistical processing at least based on the enhanced texture features to obtain bottom texture features.
In the embodiment of the present disclosure, S4012 may be implemented by executing any one of the method flows of S4012A or S4012B-S4012C, and the description will be made in conjunction with each step.
S4012A, carrying out texture feature statistical processing based on the enhanced texture features to obtain bottom texture features.
In S4012A, the electronic device performs further texture feature statistical processing on the enhanced texture features obtained in the texture enhancement processing procedure to obtain bottom layer texture features.
S4012B, combining the enhanced texture features with the bottom layer image features, and then performing texture feature statistical processing to obtain middle bottom layer texture features.
S4012C, determining the middle bottom layer texture feature as a bottom layer texture feature.
In S4012B-S4012C, the electronic device performs feature fusion on the enhanced texture features and the bottom layer image features, and performs texture feature statistical processing on the combined features to obtain intermediate bottom layer texture features. The electronic device may use the intermediate bottom texture features as bottom texture features.
In some embodiments, after S4012B, S4012D may further be included, as follows:
S4012D, combining at least one of the enhanced texture features and the bottom layer image features with the middle bottom layer texture features to obtain bottom layer texture features.
In S4012D, the electronic device performs feature fusion on the intermediate bottom layer feature and at least one of the enhanced texture feature and the bottom layer image feature to obtain a bottom layer texture feature.
In some embodiments, as shown in fig. 7, the output of the texture enhancement module 31_1 is the enhanced texture features, and the electronic device may directly input the enhanced texture features into the texture feature statistics module 31_2 by the method of S4012A, with the texture feature statistics module 31_2 then outputting the bottom layer texture features; or, the electronic device may combine the enhanced texture features with the bottom layer image features output by the first two layers of the feature extraction layer 301 by the method of S4012B-S4012C, input the combined features into the texture feature statistics module 31_2, which performs texture feature statistical processing on the combined features and outputs the middle bottom layer texture features, and use the middle bottom layer texture features as the bottom layer texture features; or, by the method of S4012D, after the texture feature statistics module 31_2 outputs the middle bottom layer texture features, at least one of the enhanced texture features output by the texture enhancement module 31_1 and the bottom layer image features output by the first two layers of the feature extraction layer 301 is feature-fused with the middle bottom layer texture features to obtain the bottom layer texture features.
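The combination options enumerated above can be summarised in the short sketch below, with tem and ptfem as placeholder modules for the texture enhancement and texture feature statistics steps; which tensors are concatenated before and after the statistical processing is exactly the choice the different embodiments make. The identity modules in the usage lines are stand-ins introduced only to run the sketch, not the actual TEM/PTFEM.

```python
import torch
import torch.nn as nn

def bottom_texture_features(low, tem, ptfem, use_skip: bool = True):
    enhanced = tem(low)                                   # S4011: enhanced texture features
    x = torch.cat([enhanced, low], dim=1) if use_skip else enhanced
    middle = ptfem(x)                                     # S4012A / S4012B: texture statistics
    # S4012D (optional): recombine with the enhanced and/or bottom layer image features
    return torch.cat([middle, enhanced, low], dim=1) if use_skip else middle

low = torch.rand(1, 64, 32, 32)
out = bottom_texture_features(low, nn.Identity(), nn.Identity())
print(out.shape)  # torch.Size([1, 256, 32, 32])
```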
It can be understood that, in the embodiment of the present disclosure, when the electronic device performs texture enhancement processing on the bottom layer image features, information such as contrast and brightness of the bottom layer image features may be enhanced so that the texture information becomes clearer; the bottom layer texture features obtained by the texture enhancement processing are then combined with the high-level semantic features for semantic segmentation, so that information such as the boundary, structure and texture of each segmented region can be accurately embodied in the segmentation result, improving the accuracy of the semantic segmentation. When the electronic device performs texture feature statistical processing on the bottom layer image features, bottom layer texture features representing the distribution of the texture information can be obtained by mining the statistical characteristics of the texture information; the bottom layer texture features obtained by texture feature statistics are then combined with the high-level semantic features for semantic segmentation, so that the correlation of the texture information among the pixels in the image to be processed can be accurately restored, improving the accuracy of image segmentation. The electronic device can further improve the accuracy of image segmentation by simultaneously performing texture enhancement and texture feature statistics on the bottom layer image features, combining the advantages of the two processing modes.
In some embodiments, referring to fig. 8, fig. 8 is an optional flowchart schematic diagram of an image segmentation method provided by the embodiment of the present disclosure, S201 in fig. 4 may be specifically realized by executing S2011 and S2012, and a description will be given with reference to each step corresponding to the processing flow of the texture enhancement module 31_1 in fig. 5 or fig. 7.
And S2011, performing one-dimensional quantization processing on the bottom layer image characteristic to obtain a first coding characteristic and a first initial quantization characteristic matrix.
In the embodiment of the disclosure, the distribution of the bottom layer image features of the image to be processed in the spectral domain is generally continuous and has a wide variation range, which makes the features difficult to extract and optimize in a deep neural network. Therefore, the electronic device first quantizes the continuous bottom layer image features into discrete feature values, obtaining a first coding feature and a first initial quantization feature matrix, respectively.
The first coding feature represents a quantization coding image corresponding to the bottom layer image feature; the first initial quantization feature matrix represents the statistical features of the bottom-layer image after quantization dispersion.
In some embodiments, referring to fig. 9, fig. 9 is an optional flowchart of an image segmentation method provided in the embodiment of the present disclosure, S2011 may be implemented by performing processes of S2011A-S2011D, and the description will be given with reference to the steps.
And S2011A, performing pooling processing on the bottom layer image features to obtain first average features corresponding to the bottom layer image features.
In S2011A, the electronic device calculates an average of the bottom-layer image features through pooling to obtain a first average feature corresponding to the bottom-layer image features.
In the embodiment of the present disclosure, the bottom-layer image feature is a feature map obtained by performing multi-scale feature extraction on the whole to-be-processed image, and for example, the bottom-layer image feature may be represented by a C × H × W three-dimensional feature matrix. Where H and W represent the height and width of the feature matrix, respectively, and C represents the feature dimension of the underlying image features.
In some embodiments, the electronic device may pool the underlying image features through the global pooling layer, averaging the three-dimensional feature matrix into a one-dimensional average feature vector as the first average feature. Wherein the first average characteristic may be a C × 1 × 1 matrix.
S2011B, calculating a first similarity between the bottom layer image feature and the first average feature to obtain a first similarity matrix.
In S2011B, for each position in the feature matrix of the underlying image feature, the electronic device may calculate a first similarity between the underlying feature at each position and the first average feature, to obtain a similarity value at each position, and further obtain a first similarity matrix.
In some embodiments, the electronic device may calculate a cosine similarity between the underlying image feature and the first average feature as the first similarity.
Here, since the similarity value is a one-dimensional value, the first similarity matrix may be a two-dimensional matrix of H × W. In some embodiments, a first similarity of the underlying image feature to the first average feature may be calculated by equation (1) to obtain a first similarity matrix, as follows:
S_{i,j} = (A_{i,j} · g) / (‖A_{i,j}‖ · ‖g‖),    i ∈ [1, W], j ∈ [1, H]    (1)
In formula (1), A is the C × H × W feature matrix corresponding to the bottom layer image features, and A_{i,j} is the bottom layer feature at each position in the feature matrix, where i ranges from 1 to W, with W being the width of the feature matrix, and j ranges from 1 to H, with H being the height of the feature matrix. g is the first average feature. Using formula (1), the electronic device performs an inner product calculation between the A_{i,j} at each position in the feature matrix and the first average feature g to obtain the S_{i,j} corresponding to each A_{i,j}, where S_{i,j} represents the first similarity at position (i, j). The electronic device may further construct the first similarity matrix S from the S_{i,j}, where S is an H × W two-dimensional matrix.
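As an illustrative aid (not part of the original disclosure), the following PyTorch-style sketch computes the first average feature g by global pooling and the cosine-similarity map S of formula (1); the tensor shapes, function name and the use of torch are assumptions made for the example.

```python
import torch

def similarity_map(A: torch.Tensor) -> torch.Tensor:
    """A: bottom layer image feature of shape (C, H, W).
    Returns S of shape (H, W): the cosine similarity between the feature
    at every position and the global average feature g."""
    C, H, W = A.shape
    g = A.mean(dim=(1, 2))                                   # first average feature, shape (C,)
    A_flat = A.reshape(C, H * W)                             # (C, HW)
    g_norm = g / (g.norm() + 1e-8)
    A_norm = A_flat / (A_flat.norm(dim=0, keepdim=True) + 1e-8)
    S = (A_norm * g_norm.unsqueeze(1)).sum(dim=0)            # (HW,) cosine similarities
    return S.reshape(H, W)

# example usage
A = torch.randn(256, 32, 32)
S = similarity_map(A)
print(S.shape)   # torch.Size([32, 32])
```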
And S2011C, performing one-dimensional quantization processing on the first similarity matrix to obtain a first quantization level vector and a first coding feature.
In S2011C, the electronic device may perform one-dimensional quantization processing on the first similarity matrix based on a dimension of a numerical value of the first similarity in the first similarity matrix, determine a first quantization level vector used in the one-dimensional quantization processing, and obtain the first encoding characteristic after the one-dimensional quantization processing is completed.
The first quantization level vector is a vector formed by the quantization values corresponding to each of the preset first quantization levels used in the one-dimensional quantization processing.
In some embodiments, S2011C may be implemented by performing S2011C _1 to S2011C _3, which will be described in conjunction with the steps.
S2011C _1, the first similarity matrices are merged in the length and width dimensions to obtain a one-dimensional similarity vector.
In S2011C _1, the electronic device performs matrix transformation on the first similarity matrix, and combines the length dimension and the width dimension of the first similarity matrix to obtain a one-dimensional similarity vector.
In some embodiments, the one-dimensional similarity vector may be a 1 × HW matrix, i.e., S' ∈ R^(1 × HW).
S2011C _2, performing N-level quantization processing on the one-dimensional similarity vector to obtain a first quantization characteristic value; the first quantization characteristic value forms a first quantization progression vector; n is a positive integer greater than or equal to 1.
In S2011C _2, N represents a preset quantization level for quantization processing, and N is a positive integer greater than or equal to 1. When the electronic device performs N-level quantization processing on the one-dimensional similarity vector, a quantization numerical value corresponding to each level of quantization processing is determined and obtained at the same time and serves as a first quantization characteristic value, wherein the first quantization characteristic value forms a first quantization level vector.
In some embodiments, for an N-level quantization process, the first quantized feature value may include N sub-quantization feature values, where each sub-quantization feature value represents the quantization value corresponding to one of the N quantization levels. The quantization values in S2011C_2 may be determined by formula (2): according to the maximum value and the minimum value in the one-dimensional similarity vector, N-level quantization processing is performed to determine an average quantization value, and the sub-quantization feature value of each level is obtained based on the average quantization value and the minimum value, thereby obtaining the N sub-quantization feature values, as follows:

L_n = min(S') + n · (max(S') − min(S')) / N,    n = 1, ..., N    (2)

In formula (2), max(S') is the maximum value in the one-dimensional similarity vector, min(S') is the minimum value in the one-dimensional similarity vector, (max(S') − min(S')) / N is the average quantization value, and n denotes the quantization level, ranging from 1 to N. L_n is the sub-quantization feature value corresponding to the n-th quantization level. Through formula (2), the electronic device can obtain the N sub-quantization feature values L_n, i.e., the first quantization feature values, and may further construct the first quantization level vector L = [L_1, L_2, ..., L_n, ..., L_N] from them. The electronic device then performs each level of the N-level quantization processing on the one-dimensional similarity vector according to each sub-quantization feature value L_n.
S2011C _3, determining a first encoding feature based on the first quantized feature value and the one-dimensional similarity vector.
In S2011C _3, the electronic device may determine the first encoding feature based on the first quantized feature value and the one-dimensional similarity vector by equation (3), as follows:
E_{i',n} = 1 − |L_n − S'_{i'}|  if S'_{i'} falls within half a quantization step of L_n, and E_{i',n} = 0 otherwise    (3)

In formula (3), S'_{i'} is the one-dimensional similarity value obtained by merging the length and width dimensions of the first similarity matrix S, i.e., by performing matrix dimension transformation on each first similarity S_{i,j} in S. Here, i' ∈ [1, HW], and the S'_{i'} form the one-dimensional similarity vector S'. Through formula (3), the electronic device may use the N sub-quantization feature values L_n in the first quantization level vector to perform N-level quantization processing on each one-dimensional similarity value S'_{i'} in the one-dimensional similarity vector, obtaining the first coding feature E_{i',n} of dimension N × HW.
And S2011D, fusing the first coding feature and the first quantization level vector to obtain a first initial quantization feature matrix.
In S2011D, the electronic device fuses the first coding feature and the first quantization level vector to obtain a first initial quantization feature matrix for characterizing quantized bottom-layer image features.
In some embodiments, the electronic device may perform, by using formula (4), a spatial size averaging process on the first encoding feature to obtain a first encoding average feature; and splicing the first coding average characteristic and the first quantization series vector to obtain a first quantization statistical vector.
C_n = Cat( (1/(HW)) · Σ_{i'=1}^{HW} E_{i',n} ,  L_n ),    n = 1, ..., N    (4)

In formula (4), (1/(HW)) · Σ_{i'} E_{i',n} performs the spatial size averaging of the first coding feature E_{i',n}, and its result is the first coding average feature. Cat(·, ·) indicates that the first coding average feature is spliced with the first quantization level vector L, and C denotes the resulting first quantization statistical vector.
The electronic device further performs spatial domain conversion on the first quantized statistical vector C through a formula (5), and fuses the first quantized statistical vector C with the first average feature to obtain a first initial quantized feature matrix. Equation (5) is as follows:
D=Cat(MLP(C),g) (5)
In formula (5), MLP represents the spatial domain conversion processing; through formula (5), the electronic device may convert the first quantization statistical vector C into the same feature space as the first average feature g, and perform feature fusion processing to obtain the first initial quantization feature matrix D.
In some embodiments, the electronic device may further upsample the first average feature g prior to using the first average feature g for fusion with the spatial-domain converted first quantized statistical vector C; alternatively, the electronic device may also directly use the first quantized statistical vector C obtained by equation (4) as the first initial quantized feature matrix. The specific choice is made according to the actual situation, and the embodiment of the disclosure is not limited.
In some embodiments, referring to fig. 10, fig. 10 is an optional structural schematic diagram of a one-dimensional Quantization and Counting Operator (1d-QCO) provided in an embodiment of the present disclosure, and the electronic device may perform one-dimensional quantization processing on the bottom layer image features through the 1d-QCO to obtain the first coding feature and the first initial quantization feature matrix. As shown in fig. 10, the bottom layer image feature A may be pooled by global pooling in the 1d-QCO module to obtain the first average feature corresponding to A. The 1d-QCO module calculates the cosine similarity between A and the first average feature to obtain the first similarity matrix S, where S is a 1 × H × W two-dimensional matrix; the 1d-QCO module merges the length and width dimensions of the first similarity matrix S to convert S into a 1 × HW one-dimensional similarity vector, and performs N-level quantization processing on the one-dimensional similarity vector according to the N first quantization feature values L_1 to L_N corresponding to the N quantization levels to obtain the first coding feature E of dimension N × HW. The 1d-QCO module performs spatial size averaging on each row of the first coding feature E to obtain N first coding average features, splices the N first coding average features with the N first quantization feature values to obtain the first quantization statistical vector C of dimension N × 2, performs spatial domain conversion on the first quantization statistical vector C, and performs feature fusion with the up-sampled first average feature to obtain the first initial quantization feature matrix D.
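The following sketch mirrors the statistical steps of formulas (4)-(5) under assumed channel sizes: it averages the coding feature per quantization level, concatenates the result with the quantization levels, applies an MLP as the spatial-domain conversion, and concatenates with a projection of the first average feature g. All layer dimensions and names are assumptions, not the disclosed implementation.

```python
import torch
import torch.nn as nn

class QuantizationStatistics(nn.Module):
    """Illustrative sketch of formulas (4) and (5): build the quantization
    statistical vector C = Cat(mean(E), L) and fuse it with the first
    average feature g after an MLP."""
    def __init__(self, feat_dim: int, hidden: int = 64, out_dim: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, out_dim))
        self.proj_g = nn.Linear(feat_dim, out_dim)    # project g so it can be concatenated per level

    def forward(self, E: torch.Tensor, L: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
        # E: (N, HW) first coding feature, L: (N,) quantization levels, g: (feat_dim,)
        e_avg = E.mean(dim=1, keepdim=True)            # (N, 1) first coding average feature
        C = torch.cat([e_avg, L.unsqueeze(1)], dim=1)  # (N, 2) first quantization statistical vector
        C = self.mlp(C)                                # spatial-domain conversion
        g_up = self.proj_g(g).unsqueeze(0).expand(C.size(0), -1)  # broadcast ("up-sampled") g, (N, out_dim)
        D = torch.cat([C, g_up], dim=1)                # first initial quantization feature matrix
        return D

qs = QuantizationStatistics(feat_dim=256)
D = qs(torch.rand(128, 1024), torch.rand(128), torch.rand(256))
print(D.shape)   # torch.Size([128, 256])
```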
S2012, obtaining the enhanced texture feature based on the first coding feature and the first initial quantization feature matrix.
In the embodiment of the present disclosure, when the electronic device obtains the first coding feature and the first initial quantization feature matrix through one-dimensional quantization processing on the bottom layer image feature, the electronic device may fuse the first coding feature and the first initial quantization feature matrix, for example, perform matrix multiplication or matrix splicing, and use the fused result as the enhanced texture feature.
In some embodiments, referring to fig. 11(a), fig. 11(a) is an alternative structural schematic diagram of an enhanced texture feature module TEM provided by the embodiments of the present disclosure. As shown in fig. 11(a), the TEM may contain a one-dimensional quantization counting module and a fusion module. Illustratively, the structure of the one-dimensional quantization counting module may be as shown in fig. 10. The TEM may multiply, through the fusion module, the first coding feature E output by the one-dimensional quantization counting module, i.e., the N × HW matrix in fig. 11(a), with the first initial quantization feature matrix D output by the one-dimensional quantization counting module, i.e., the C1 × N matrix in fig. 11(a), to obtain the output of the TEM, namely the enhanced texture feature, a C1 × HW matrix.
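As a minimal sketch of the fusion step in fig. 11(a), assuming D has shape C1 × N and E has shape N × HW, the enhanced texture feature can be obtained by a matrix product followed by a reshape; the function name and shapes are illustrative assumptions.

```python
import torch

def fuse_texture(D: torch.Tensor, E: torch.Tensor, H: int, W: int) -> torch.Tensor:
    """D: (C1, N) quantization feature matrix, E: (N, HW) first coding feature.
    Returns the enhanced texture feature R of shape (C1, H, W)."""
    R = D @ E                                  # (C1, HW)
    return R.reshape(D.size(0), H, W)

R = fuse_texture(torch.rand(64, 128), torch.rand(128, 32 * 32), 32, 32)
print(R.shape)   # torch.Size([64, 32, 32])
```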
In some embodiments of the present disclosure, in order to improve the balance of quantization result distribution, the electronic device may further perform secondary quantization optimization on the first initial quantization feature matrix D obtained by the one-dimensional quantization processing to obtain a first quantization feature matrix D ', and further fuse the first quantization feature matrix D' and the first coding feature to obtain an enhanced texture feature.
In some embodiments, quadratic quantization optimization may be implemented by a method of graph inference, and S2012 may be implemented by performing the processes of S2012A-S2012B, which will be described in connection with the steps.
S2012A, carrying out graph reasoning on the first initial quantization feature matrix to obtain a first quantization feature matrix.
In S2012A, the graph inference process needs to construct nodes and edges between the nodes in the graph model, and then implement the graph inference algorithm based on the graph model. The electronic device may construct the graph model using the initial quantized feature values at each quantization level in the first initial quantized feature matrix as vertices. Illustratively, when the first initial quantization feature matrix is
a C1 × N matrix D, the vertices of the graph model may be the C1-dimensional feature values corresponding to each of the N quantization levels. The adjacency relationship between the vertices can be obtained by equation (7), as follows:
X = Softmax(φ1(D)^T · φ2(D))    (7)

In formula (7), φ1 and φ2 represent two 1 × 1 convolutional layers, D is the first initial quantization feature matrix, and, in some embodiments, Softmax is a non-linear normalization function. X is the adjacency matrix used to represent the adjacency relationship between the vertices of the graph model.
The electronic device further fuses the features of all other vertices to update each initial node through formula (8), obtaining the reconstructed first quantization feature matrix D', as shown in equation (8):
D' = φ3(D) · X    (8)

In formula (8), φ3 represents another 1 × 1 convolutional layer in addition to φ1 and φ2.
S2012B, obtaining an enhanced texture feature based on the first quantized feature matrix and the first coding feature.
In S2012B, the electronic device multiplies E by the first quantization feature matrix D' obtained by using the second quantization optimization, so as to obtain the enhanced texture feature R, as shown in formula (9).
R = D' · E^T    (9)

In formula (9), E^T is the transpose of the first coding feature E_{i',n}, and R is a C2 × H × W matrix.
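A minimal sketch of the graph reasoning of formulas (7)-(9), with the 1 × 1 convolutions realized as Conv1d layers over the N quantization levels; the channel sizes and the module name are assumptions made for the example.

```python
import torch
import torch.nn as nn

class GraphReasoning(nn.Module):
    """Illustrative sketch of formulas (7)-(9): treat the N quantization levels
    as graph nodes, build an adjacency matrix with Softmax, update the node
    features, then fuse with the coding feature E."""
    def __init__(self, c1: int, c2: int):
        super().__init__()
        self.phi1 = nn.Conv1d(c1, c2, kernel_size=1)
        self.phi2 = nn.Conv1d(c1, c2, kernel_size=1)
        self.phi3 = nn.Conv1d(c1, c2, kernel_size=1)

    def forward(self, D: torch.Tensor, E: torch.Tensor, H: int, W: int) -> torch.Tensor:
        # D: (C1, N) first initial quantization feature matrix, E: (HW, N) first coding feature
        D = D.unsqueeze(0)                                                        # (1, C1, N) for Conv1d
        X = torch.softmax(self.phi1(D).transpose(1, 2) @ self.phi2(D), dim=-1)    # (1, N, N) adjacency, eq (7)
        D_prime = self.phi3(D) @ X                                                # (1, C2, N) reconstructed matrix, eq (8)
        R = D_prime.squeeze(0) @ E.t()                                            # (C2, HW), eq (9)
        return R.reshape(-1, H, W)                                                # (C2, H, W) enhanced texture feature

gr = GraphReasoning(c1=64, c2=64)
R = gr(torch.rand(64, 128), torch.rand(32 * 32, 128), 32, 32)
print(R.shape)   # torch.Size([64, 32, 32])
```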
In some embodiments, referring to fig. 11(b), fig. 11(b) is an alternative structural schematic diagram of an enhanced texture feature module (TEM)112 provided by embodiments of the present disclosure. As shown in fig. 11(b), the TEM module may further include a graph inference module, and the graph inference module may perform secondary quantization optimization on the first initial quantization feature matrix D output by the one-dimensional quantization counting module by performing the methods in S2012A-S2012B, obtaining the first quantization feature matrix D', where D' is a C2 × N matrix. The enhanced texture feature module 112 may obtain the enhanced texture feature R by fusing D' with the first coding feature of dimension HW × N.
It can be understood that, in the embodiment of the present disclosure, the electronic device may extract quantized bottom layer texture information from the bottom layer image features through a one-dimensional quantization process, and further perform enhancement processing on the bottom layer texture information through texture enhancement processing to obtain the bottom layer texture features, so that texture details in the bottom layer texture features are enhanced, and thus, when the electronic device is applied to image segmentation, the accuracy of image segmentation is improved.
In some embodiments, the present disclosure provides a method for performing texture feature statistical processing based on input image features to obtain target statistical texture features, which corresponds to the flow of the texture feature statistics module 31_2 in fig. 6 or fig. 7. The method includes any one of the flows S501, S502 to S503, or S504 to S505, and the description will be given with reference to each step.
S501, performing two-dimensional quantization processing on the input image characteristics to obtain target statistical texture characteristics; the target statistical texture features characterize the feature distribution relationship between the pixels.
In the embodiment of the present disclosure, the electronic device performs two-dimensional quantization processing on the input image features of the input texture feature statistical module, quantizes the input image features from the dimensions of the length and the width of the feature space, respectively, to obtain the target statistical texture features, and represents the feature distribution relationship between pixels through the target statistical texture features.
Here, when the bottom layer image feature is the input image feature, the target statistical texture feature is the statistical texture feature; or,
under the condition that the enhanced texture features are input image features, the target statistical texture features are bottom-layer texture features; or,
and under the condition that the feature obtained by combining the enhanced texture feature and the bottom layer image feature is the input image feature, the target statistical texture feature is the middle bottom layer texture feature.
And S502, carrying out size segmentation on the input image features to obtain local image features corresponding to the input image features.
S503, carrying out two-dimensional quantization processing on the local image features to obtain target statistical texture features.
In S502-S503, the electronic device may perform size segmentation on the input image feature, and segment the input image feature into a plurality of local image features. The electronic device can perform two-dimensional quantization processing on each local image feature in the multiple local image features in the same process to obtain the target statistical texture feature.
S504, two-dimensional quantization processing is respectively carried out on the input image features and the local image features, and global quantization features corresponding to the input image features and local quantization features corresponding to the local image features are obtained.
And S505, performing feature fusion based on the global quantization features and the local quantization features to obtain target statistical texture features.
In S504-S505, the electronic device may perform two-dimensional quantization processing on the input image feature representing the global feature and the local image feature obtained by size segmentation, respectively, to obtain a global quantization feature and a local quantization feature, and further perform feature fusion on the global quantization feature and the local quantization feature, to obtain a target statistical texture feature.
It should be noted that, in the embodiment of the present disclosure, the electronic device may perform two-dimensional quantization processing on the global image feature and on the local image features by using the same method.
In some embodiments, S505 may be implemented by executing the processes of S5051-S5053, which will be described in connection with the steps.
S5051, carrying out space conversion on the global quantization characteristic and averaging to obtain a global average characteristic.
S5052, carrying out space conversion on the local quantized features and averaging to obtain local average features.
In S5051 and S5052, the electronic device may perform spatial transformation and averaging on the global quantized feature through the formula (10) and the formula (11) to obtain a global average feature; and carrying out spatial conversion on the local quantized features and averaging to obtain local average features.
F'=MLP(F) (10)
T = (1/M^2) · Σ_{m=1}^{M} Σ_{n=1}^{M} F'_{:,m,n}    (11)

In formula (10), for S5051, F represents the global quantization feature, and for S5052, F represents a local quantization feature. Through formula (10), the electronic device may perform MLP spatial domain conversion on the global quantization feature or the local quantization feature, correspondingly obtaining the spatially converted global or local quantization feature F'. In formula (11), for S5051, F'_{:,m,n} is the spatially converted global quantization feature; the electronic device averages F'_{:,m,n} over the two quantization-level dimensions m and n (each quantized into M levels here) to obtain the global average feature T. The electronic device may average the spatially converted local quantization features in the same way by formula (11) to obtain the local average features.
S5053, performing feature fusion on the global average feature and the local average feature to obtain the target statistical texture feature.
In S5053, the electronic device may perform feature fusion on the global average feature and the local average feature to obtain a target statistical texture feature.
In some embodiments, referring to fig. 12, fig. 12 is an alternative structural schematic diagram of a PTFEM module provided in an embodiment of the present disclosure. As shown in fig. 12, the PTFEM module may use the entire bottom layer image feature as a global feature, perform pyramid progressive size segmentation on the bottom layer image feature to obtain the local feature corresponding to each level of size segmentation, and perform two-dimensional quantization processing on the global feature and on each local feature respectively through the two-dimensional quantization counting module in the PTFEM module to obtain the global quantization feature and the local quantization features. The global quantization feature and the local quantization features are spatially converted and averaged to obtain the global average feature and the local average features, and finally the global average feature and the local average features are respectively up-sampled and then fused to obtain the target statistical texture feature corresponding to the bottom layer image feature.
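A minimal sketch of the pyramid size segmentation described for the PTFEM, assuming a 1 × 1, 2 × 2 and 4 × 4 grid of regions per pyramid level; the chosen levels and function name are illustrative, not the disclosed configuration.

```python
import torch

def pyramid_split(feat: torch.Tensor, levels=(1, 2, 4)):
    """feat: (C, H, W) input image feature.
    For each pyramid level k the feature map is divided into a k x k grid;
    returns a list of local image features (level 1 is the global feature)."""
    C, H, W = feat.shape
    regions = []
    for k in levels:
        h, w = H // k, W // k
        for i in range(k):
            for j in range(k):
                regions.append(feat[:, i * h:(i + 1) * h, j * w:(j + 1) * w])
    return regions

regions = pyramid_split(torch.rand(256, 32, 32))
print(len(regions), regions[0].shape, regions[-1].shape)
# 21 torch.Size([256, 32, 32]) torch.Size([256, 8, 8])
```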
In some embodiments, S501 may be implemented by performing the processes of S5011-S5014, which will be described in conjunction with the steps.
S5011, performing pooling processing on the input image features to obtain second average features corresponding to the input image features.
In the embodiment of the present disclosure, a process of performing two-dimensional quantization processing by an electronic device is similar to a process of performing one-dimensional quantization processing, and first, pooling processing is performed on an input image feature to obtain a second average feature corresponding to the input image feature.
S5012, calculating a second similarity of the input image features and the second average features to obtain a second similarity matrix.
In the embodiment of the disclosure, the electronic device calculates a second similarity between the input image feature and the second average feature to obtain a second similarity matrix.
S5013, performing two-dimensional quantization processing on the second similarity matrix to obtain a second quantization progression matrix and a second coding feature.
In the embodiment of the disclosure, for the second similarity matrix, the electronic device may perform two-dimensional quantization processing on the second similarity matrix in the dimensions of length and width, respectively, to obtain a second quantization progression matrix and a second coding feature.
In some embodiments, S5013 may be implemented by performing the processes of S5013A-S5013C, which will be described in connection with the steps.
S5013A, performing M-level quantization processing on two dimensions in the second similarity matrix respectively to obtain a second quantization characteristic value; the second quantization characteristic value forms a second quantization series matrix; m is a positive integer greater than or equal to 1.
In the embodiment of the present disclosure, the electronic device may perform M-level quantization processing on the two dimensions of the second similarity matrix, namely the length and the width; the specific value of M may differ between the two dimensions, and M is a positive integer greater than or equal to 1.
In the embodiment of the disclosure, the electronic device may determine, according to equation (12), the M quantization levels used for quantizing each of the two dimensions of the second similarity matrix, analogously to formula (2), as follows:

L_m = min(S2) + m · (max(S2) − min(S2)) / M,    m = 1, ..., M    (12)

where S2 denotes the second similarity matrix, max(S2) and min(S2) are its maximum and minimum values along the dimension being quantized, and L_m is the quantization value of the m-th level for that dimension.
and S5013B, determining the intermediate coding feature matrix based on the second quantized feature value and the second similarity matrix.
In S5013B, the electronic device may perform quantization processing on each of the two dimensions of the second similarity matrix based on the quantization levels determined by formula (12), to obtain the intermediate coding feature matrix.
S5013C, based on the intermediate coding feature matrix, multiplying the current intermediate coding feature by the transpose of the adjacent intermediate coding feature, and determining a coding feature corresponding to the current intermediate coding feature, thereby obtaining a second coding feature.
E2_{i,j} = E_{i,j} · (E_{i,j+1})^T    (13)

In S5013C, the electronic device may use equation (13) to multiply the current intermediate coding feature E_{i,j} in the intermediate coding feature matrix by the transpose of its adjacent intermediate coding feature E_{i,j+1}, determining the coding feature corresponding to the current intermediate coding feature and thereby obtaining the second coding feature E2.
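A minimal sketch of formula (13), assuming the intermediate coding feature stores an M-dimensional quantization encoding per pixel and that the "adjacent" feature is the right-hand spatial neighbour; both assumptions are illustrative, not the disclosed definition.

```python
import torch

def cooccurrence_encoding(E_mid: torch.Tensor) -> torch.Tensor:
    """E_mid: (H, W, M) intermediate coding feature (M quantization levels per pixel).
    Each position is multiplied with its right-hand neighbour, producing an
    (H, W-1, M, M) co-occurrence encoding used as the second coding feature."""
    current = E_mid[:, :-1, :]             # E_{i,j}
    neighbour = E_mid[:, 1:, :]            # E_{i,j+1}
    # outer product per position: (H, W-1, M, 1) * (H, W-1, 1, M)
    return current.unsqueeze(-1) * neighbour.unsqueeze(-2)

E2 = cooccurrence_encoding(torch.rand(32, 32, 16))
print(E2.shape)   # torch.Size([32, 31, 16, 16])
```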
S5014, fusing the second coding feature and the second quantization series matrix to obtain a second initial quantization feature matrix.
In S5014, the electronic device fuses the second coding feature and the second quantization progression matrix to obtain a second initial quantization feature matrix.
In some embodiments, S5014 may be implemented by performing the processes of S5014A-S5014C, which will be described in connection with the steps.
S5014A, performing a spatial size averaging process on the second encoding feature to obtain a second encoding average feature.
S5014B, splicing the second coding average feature and the second quantization level vector to obtain a second quantization statistical matrix.
In the embodiment of the present disclosure, the processes of S5014A and S5014B may be implemented by equation (14), and the description thereof is similar to equation (4), and is not repeated here.
C_t = Cat( avg(E2) , L2 )    (14)

In formula (14), avg(E2) denotes the spatial size averaging of the second coding feature, whose result is the second coding average feature, L2 is the second quantization level vector, and C_t is the resulting second quantization statistical matrix.
S5014C, performing spatial domain conversion on the second quantitative statistical matrix, and fusing the second quantitative statistical matrix with the second average feature to obtain a second initial quantitative feature matrix.
In S5014C, the electronic device may perform spatial domain conversion on the second quantization statistical matrix by a method similar to the one-dimensional quantization processing procedure, and fuse the second quantization statistical matrix with the second average feature to obtain a second initial quantization feature matrix.
In some embodiments, the execution of S5014C may be implemented by equation (15) as follows
D2=Cat(MLP(Ct),g2) (15)
In formula (15), Ct is the second quantization statistical matrix, g2 is the second average feature, and D2 is the second initial quantization feature matrix.
It should be noted that, for the STLNet shown in fig. 7, in the process of training the neural network, equation (16) may be used as a loss function of the network training, and the total loss corresponding to each round of network training is obtained through equation (16), where equation (16) is as follows:
Lo=Lof+α·Loa (16)
In formula (16), Lof is the prediction loss, representing the global loss value corresponding to the image segmentation result output by the STLNet; Loa is the auxiliary loss, representing the auxiliary loss computed on the bottom layer image features output by the lower feature extraction layers; and α is a preset value used to adjust the weight ratio between the prediction loss and the auxiliary loss. In some embodiments, α may be 0.4.
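A minimal sketch of the training loss of formula (16); cross-entropy is an assumed choice of base criterion and the tensor shapes are illustrative.

```python
import torch
import torch.nn as nn

def total_loss(main_logits, aux_logits, target, alpha: float = 0.4):
    """Total loss = prediction loss on the final segmentation output
    plus alpha times an auxiliary loss on the lower-layer output."""
    ce = nn.CrossEntropyLoss()
    lo_f = ce(main_logits, target)     # prediction loss
    lo_a = ce(aux_logits, target)      # auxiliary loss
    return lo_f + alpha * lo_a

main = torch.randn(2, 19, 64, 64)      # (batch, classes, H, W)
aux = torch.randn(2, 19, 64, 64)
target = torch.randint(0, 19, (2, 64, 64))
print(total_loss(main, aux, target).item())
```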
It can be seen that the STLNet obtained by the loss value adjustment of equation (16) can improve the precision of semantic segmentation when used in an image segmentation task.
It can be understood that, in the embodiment of the present disclosure, the electronic device may fully mine and extract the distribution and proportion of the bottom texture information in the image to be processed from the bottom image features through a two-dimensional quantization processing process, and use the distribution and proportion as the statistical texture features, so that structural information such as boundaries and textures of the segmentation region in the image segmentation result may be optimized through the statistical texture features, and the accuracy of image segmentation is improved.
The following provides effect comparison tables, shown as Tables 1 to 4, which compare image segmentation performed with the image segmentation method provided by the embodiments of the present disclosure against existing neural networks. It can be seen that the method provided by the embodiments of the disclosure can effectively improve the Intersection over Union (IoU) index after semantic segmentation and greatly improve the accuracy of image segmentation.
Tables 1 to 4: IoU comparison results (the table contents are provided as images in the original publication).
In the embodiment of the present disclosure, referring to fig. 13, fig. 13 is a schematic diagram illustrating comparison of effects before and after performing texture feature statistical processing on an original image according to the embodiment of the present disclosure. As can be seen from fig. 13, after the texture feature statistical processing is performed on the original image, the texture feature of the image is more obvious, which is more helpful to improve the accuracy of image segmentation.
In the embodiment of the disclosure, referring to fig. 14, fig. 14 is a schematic diagram provided in the embodiment of the disclosure comparing the image segmentation effect of the present disclosure with that of the DeepLabV3 neural network image segmentation method. It can be seen that, for the same original image, the boundaries of the semantic regions in the image segmented by the present disclosure are more consistent with the original image, and the image segmentation effect is more accurate.
The present disclosure further provides an image segmentation apparatus, and fig. 15 is a schematic structural diagram of the image segmentation apparatus provided in the embodiment of the present disclosure; as shown in fig. 15, the image segmentation apparatus 500 includes:
a feature extraction module 501, configured to extract bottom layer image features from an image to be processed;
a semantic segmentation module 503, configured to perform semantic segmentation on the bottom-layer image features to obtain high-layer semantic features;
a texture feature processing module 502, configured to perform at least one of texture enhancement and texture feature statistics on the bottom layer image features to obtain bottom layer texture features; the bottom layer texture features are used for representing enhanced texture details and/or the statistical distribution of texture features of the image to be processed;
and the feature fusion module 504 is configured to combine the high-level semantic features and the bottom-level texture features to obtain a semantic segmentation image.
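For orientation only, the following skeleton wires four placeholder sub-modules in the order of the apparatus above; every layer inside it is an assumption standing in for the actual feature extraction, semantic segmentation, texture feature processing and feature fusion modules of the disclosure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageSegmenterSketch(nn.Module):
    """Minimal skeleton of the apparatus in fig. 15; every sub-module is a placeholder."""
    def __init__(self, num_classes: int = 19):
        super().__init__()
        self.feature_extraction = nn.Conv2d(3, 64, 3, stride=4, padding=1)   # bottom layer image features
        self.semantic_segmentation = nn.Conv2d(64, 128, 3, padding=1)        # high-layer semantic features
        self.texture_processing = nn.Conv2d(64, 64, 3, padding=1)            # bottom layer texture features
        self.feature_fusion = nn.Conv2d(128 + 64, num_classes, 1)            # combine and classify

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        low = self.feature_extraction(image)
        semantic = self.semantic_segmentation(low)
        texture = self.texture_processing(low)
        fused = self.feature_fusion(torch.cat([semantic, texture], dim=1))
        # up-sample back to the input resolution to obtain the semantic segmentation image
        return F.interpolate(fused, size=image.shape[-2:], mode='bilinear', align_corners=False)

model = ImageSegmenterSketch()
out = model(torch.rand(1, 3, 256, 256))
print(out.shape)   # torch.Size([1, 19, 256, 256])
```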
In some embodiments, the texture feature processing module 502 is further configured to any one of:
performing texture enhancement processing based on the bottom layer image features to obtain enhanced texture features; determining the bottom layer texture features based on the enhanced texture features;
performing texture feature statistical processing based on the bottom layer image features to obtain statistical texture features; determining the bottom layer texture features based on the statistical texture features;
and performing texture enhancement processing and texture feature statistical processing on the basis of the bottom layer image features to obtain the bottom layer texture features.
In some embodiments, the texture feature processing module 502 is further configured to determine the enhanced texture feature as the bottom texture feature; or combining the enhanced texture features with the bottom layer image features to obtain the bottom layer texture features.
In some embodiments, the texture feature processing module 502 is further configured to determine the statistical texture feature as the bottom texture feature; or combining the statistical texture features with the bottom layer image features to obtain the bottom layer texture features.
In some embodiments, the texture feature processing module 502 is further configured to perform texture enhancement processing based on the bottom-layer image feature to obtain an enhanced texture feature; and performing texture feature statistical processing at least based on the enhanced texture features to obtain the bottom texture features.
The texture feature processing module 502 is further configured to combine the enhanced texture features with the bottom layer image features, and perform texture feature statistical processing to obtain middle bottom layer texture features; and determining the middle bottom layer texture feature as the bottom layer texture feature.
In some embodiments, the texture feature processing module 502 is further configured to combine the enhanced texture feature with the bottom layer image feature, perform texture feature statistical processing to obtain an intermediate bottom layer texture feature, and then combine at least one of the enhanced texture feature and the bottom layer image feature with the intermediate bottom layer texture feature to obtain the bottom layer texture feature.
The texture feature processing module 502 is further configured to perform one-dimensional quantization processing on the bottom-layer image feature to obtain a first coding feature and a first initial quantization feature matrix; and obtaining the enhanced texture feature based on the first coding feature and the first initial quantization feature matrix.
In some embodiments, the texture feature processing module 502 is further configured to perform pooling on the bottom-layer image feature to obtain a first average feature corresponding to the bottom-layer image feature; calculating a first similarity of the bottom layer image characteristic and the first average characteristic to obtain a first similarity matrix; performing one-dimensional quantization processing on the first similarity matrix to obtain a first quantization series vector and the first coding feature; and fusing the first coding feature and the first quantization series vector to obtain the first initial quantization feature matrix.
In some embodiments, the texture feature processing module 502 is further configured to combine the first similarity matrix in the length and width dimensions to obtain a one-dimensional similarity vector; carrying out N-level quantization processing on the one-dimensional similarity vector to obtain a first quantization characteristic value; the first quantized feature values form the first vector of quantization levels; determining the first coding feature based on the first quantized feature value and a one-dimensional similarity vector; n is a positive integer greater than or equal to 1.
In some embodiments, the first quantized feature value comprises: n sub-quantization feature values; the texture feature processing module 502 is further configured to perform N-level quantization processing according to a maximum value and a minimum value in the one-dimensional similarity vector, and determine an average quantization value; and obtaining the sub-quantization characteristic value of each level based on the average quantization value and the minimum value, thereby obtaining the N sub-quantization characteristic values.
In some embodiments, the texture feature processing module 502 is further configured to perform spatial size averaging on the first encoding feature to obtain a first encoding average feature; splicing the first coding average feature and the first quantization series vector to obtain a first quantization statistical vector; and performing spatial domain conversion on the first quantization statistical vector, and fusing the first quantization statistical vector with the first average characteristic to obtain the first initial quantization characteristic matrix.
In some embodiments, the texture feature processing module 502 is further configured to perform graph inference on the first initial quantized feature matrix to obtain a first quantized feature matrix; and obtaining the enhanced texture feature based on the first quantization feature matrix and the first coding feature.
In some embodiments, the texture feature processing module 502 is further configured to, in a case that the underlying image feature is an input image feature, determine a target statistical texture feature as the statistical texture feature; or, when the enhanced texture features are input image features, the target statistical texture features are the bottom layer texture features; or, when the feature obtained by combining the enhanced texture feature and the bottom layer image feature is an input image feature, the target statistical texture feature is the middle bottom layer texture feature.
In some embodiments, the texture feature processing module 502 is further configured to perform two-dimensional quantization processing on the input image feature to obtain the target statistical texture feature; the target statistical texture feature represents the feature distribution relationship among the pixels; or, performing size segmentation on the input image features to obtain local image features corresponding to the input image features; performing two-dimensional quantization processing on the local image features to obtain the target statistical texture features; or, performing two-dimensional quantization processing on the input image features and the local image features respectively to obtain global quantization features corresponding to the input image features and local quantization features corresponding to the local image features; and performing feature fusion based on the global quantization feature and the local quantization feature to obtain the target statistical texture feature.
In some embodiments, the texture feature processing module 502 is further configured to perform pooling on the input image feature to obtain a second average feature corresponding to the input image feature; calculating a second similarity of the input image features and the second average features to obtain a second similarity matrix; performing two-dimensional quantization processing on the second similarity matrix to obtain a second quantization series matrix and a second coding characteristic; and fusing the second coding characteristics and the second quantization series matrix to obtain a second initial quantization characteristic matrix.
In some embodiments, the texture feature processing module 502 is further configured to perform M-level quantization processing on two dimensions of the second similarity matrix, respectively, to obtain a second quantized feature value; the second quantization feature values form the second quantization series matrix; m is a positive integer greater than or equal to 1; determining an intermediate coding feature matrix based on the second quantized feature value and the second similarity matrix; and based on the intermediate coding feature matrix, multiplying the current intermediate coding feature by the transpose of the adjacent intermediate coding feature, and determining a coding feature corresponding to the current intermediate coding feature, thereby obtaining a second coding feature.
In some embodiments, the texture feature processing module 502 is further configured to perform spatial size averaging on the second encoding feature to obtain a second encoding average feature; splicing the second coding average characteristic and the second quantization series vector to obtain a second quantization statistical matrix; and performing space domain conversion on the second quantization statistical matrix, and fusing the second quantization statistical matrix with the second average characteristic to obtain a second initial quantization characteristic matrix.
In some embodiments, the texture feature processing module 502 is further configured to perform spatial transformation on the global quantized features and perform averaging to obtain global average features; carrying out space conversion on the local quantization characteristics and averaging to obtain local average characteristics; and performing feature fusion on the global average feature and the local average feature to obtain the target statistical texture feature.
It should be noted that the above description of the embodiment of the apparatus, similar to the above description of the embodiment of the method, has similar beneficial effects as the embodiment of the method. For technical details not disclosed in the embodiments of the apparatus of the present disclosure, reference is made to the description of the embodiments of the method of the present disclosure.
An embodiment of the present disclosure further provides an electronic device, fig. 16 is a schematic structural diagram of the electronic device provided in the embodiment of the present disclosure, and as shown in fig. 16, the electronic device 2 includes: a memory 21 and a processor 22, wherein the memory 21 and the processor 22 are connected by a communication bus 23; a memory 21 for storing an executable computer program; the processor 22 is configured to implement the image segmentation method provided by the embodiment of the present disclosure when executing the executable computer program stored in the memory 21.
The embodiment of the present disclosure provides a computer-readable storage medium, which stores a computer program for causing the processor 22 to execute the image segmentation method provided by the embodiment of the present disclosure.
In some embodiments of the present disclosure, the storage medium may be memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disk, or CD-ROM; or may be various devices including one or any combination of the above memories.
In some embodiments of the disclosure, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts, or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may correspond, but do not necessarily have to correspond, to files in a file system, and may be stored in a portion of a file that holds other programs or data, such as in one or more scripts in a hypertext Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
By way of example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices at one site or distributed across multiple sites and interconnected by a communication network.
In summary, in the embodiments of the present disclosure, the electronic device performs at least one of texture enhancement and texture feature statistics on the bottom layer image features of the image to be processed, so as to enhance the texture details of the image to be processed and mine the multi-scale deep texture information in the image to be processed, thereby obtaining bottom layer texture features that can represent the enhanced texture details and/or the statistical distribution of the texture features. The electronic device combines the bottom layer texture features and the high-level semantic features to obtain the final segmented image, and can better restore the bottom layer texture information of the original image in the segmented image, thereby improving the accuracy of image segmentation. Moreover, the electronic device can extract quantized bottom layer texture information from the bottom layer image features through the one-dimensional quantization processing procedure, and further enhance the bottom layer texture information through texture enhancement processing to obtain the bottom layer texture features, so that the texture details in the bottom layer texture features are enhanced, which is conducive to improving the accuracy of image segmentation when applied to an image segmentation task.
The above description is only an example of the present disclosure, and is not intended to limit the scope of the present disclosure. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present disclosure are included in the protection scope of the present disclosure.

Claims (22)

1. An image segmentation method, comprising:
extracting bottom layer image features from an image to be processed, and performing semantic segmentation on the bottom layer image features to obtain high-layer semantic features;
performing at least one of texture enhancement and texture feature statistics on the bottom layer image features to obtain bottom layer texture features; the bottom texture features are used for representing enhanced texture details and/or statistical distribution of texture features of the image to be processed;
and combining the high-level semantic features and the bottom-level texture features to obtain a semantic segmentation image.
2. The method according to claim 1, wherein the at least one of texture enhancement and texture feature statistics processing on the bottom layer image feature to obtain a bottom layer texture feature comprises any one of:
performing texture enhancement processing based on the bottom layer image features to obtain enhanced texture features; determining the bottom layer texture features based on the enhanced texture features;
performing texture feature statistical processing based on the bottom layer image features to obtain statistical texture features; determining the bottom layer texture features based on the statistical texture features;
and performing texture enhancement processing and texture feature statistical processing on the basis of the bottom layer image features to obtain the bottom layer texture features.
3. The method of claim 2, wherein determining the underlying texture feature based on the enhanced texture feature comprises:
determining the enhanced texture features as the bottom texture features; or,
and combining the enhanced texture features with the bottom layer image features to obtain the bottom layer texture features.
4. The method of claim 2, wherein the determining the bottom layer texture features based on the statistical texture features comprises:
determining the statistical texture features as the bottom texture features; or,
and combining the statistical texture features with the bottom layer image features to obtain the bottom layer texture features.
5. The method of claim 2, wherein the performing texture enhancement processing and texture feature statistics processing based on the bottom layer image features to obtain the bottom layer texture features comprises:
performing texture enhancement processing based on the bottom layer image features to obtain enhanced texture features;
and performing texture feature statistical processing at least based on the enhanced texture features to obtain the bottom texture features.
6. The method of claim 5, wherein said performing texture feature statistics based on at least the enhanced texture features to obtain the bottom texture features comprises:
combining the enhanced texture features with the bottom layer image features, and then performing texture feature statistical processing to obtain middle bottom layer texture features; and determining the middle bottom layer texture feature as the bottom layer texture feature.
7. The method of claim 6, wherein after combining the enhanced texture features with the bottom layer image features and performing texture feature statistical processing to obtain intermediate bottom layer texture features, the method further comprises:
and combining at least one of the enhanced textural features and the bottom layer image features with the middle bottom layer textural features to obtain the bottom layer textural features.
8. The method according to any one of claims 2, 3 and 5 to 7, wherein the performing texture enhancement processing based on the bottom layer image feature to obtain an enhanced texture feature comprises:
performing one-dimensional quantization processing on the bottom layer image characteristics to obtain a first coding characteristic and a first initial quantization characteristic matrix;
and obtaining the enhanced texture feature based on the first coding feature and the first initial quantization feature matrix.
9. The method according to claim 8, wherein the performing one-dimensional quantization processing on the bottom layer image features to obtain the first coding feature and the first initial quantization feature matrix comprises:
performing pooling processing on the bottom layer image features to obtain a first average feature corresponding to the bottom layer image features;
calculating a first similarity between the bottom layer image features and the first average feature to obtain a first similarity matrix;
performing one-dimensional quantization processing on the first similarity matrix to obtain a first quantization level vector and the first coding feature; and
fusing the first coding feature with the first quantization level vector to obtain the first initial quantization feature matrix.
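For illustration, a minimal sketch of the pipeline in claim 9 (with the flattening step of claim 10), assuming global average pooling for the pooling step, cosine similarity for the similarity step, and evenly spaced quantization levels; none of these choices is mandated by the claim:

```python
import torch
import torch.nn.functional as F

def one_dimensional_quantization(bottom_layer_features: torch.Tensor, num_levels: int = 8):
    """Illustrative reading of claims 9-10: pooling -> similarity -> 1-D quantization."""
    b, c, h, w = bottom_layer_features.shape

    # Pooling: global average pooling as the first average feature (an assumed choice).
    first_average = F.adaptive_avg_pool2d(bottom_layer_features, 1)            # (b, c, 1, 1)

    # First similarity matrix: cosine similarity between every pixel and the average.
    similarity = F.cosine_similarity(
        bottom_layer_features, first_average.expand_as(bottom_layer_features), dim=1)  # (b, h, w)

    # Combine the length and width dimensions into a one-dimensional similarity vector.
    similarity_1d = similarity.reshape(b, h * w)

    # N-level quantization: evenly spaced levels between the min and max similarity.
    s_min = similarity_1d.min(dim=1, keepdim=True).values
    s_max = similarity_1d.max(dim=1, keepdim=True).values
    steps = torch.linspace(0.0, 1.0, num_levels, device=similarity_1d.device)
    level_vector = s_min + (s_max - s_min) * steps                              # (b, num_levels)

    # First coding feature: a soft assignment of every pixel to every quantization level.
    distance = (similarity_1d.unsqueeze(1) - level_vector.unsqueeze(2)).abs()   # (b, num_levels, h*w)
    coding_feature = F.softmax(-distance, dim=1)
    return coding_feature, level_vector
```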
10. The method according to claim 9, wherein the performing one-dimensional quantization processing on the first similarity matrix to obtain the first quantization level vector and the first coding feature comprises:
combining the length and width dimensions of the first similarity matrix to obtain a one-dimensional similarity vector;
performing N-level quantization processing on the one-dimensional similarity vector to obtain first quantization feature values, the first quantization feature values forming the first quantization level vector; and
determining the first coding feature based on the first quantization feature values and the one-dimensional similarity vector, wherein N is a positive integer greater than or equal to 1.
11. The method according to claim 10, wherein the first quantization feature values comprise N sub-quantization feature values, and the performing N-level quantization processing on the one-dimensional similarity vector to obtain the first quantization feature values comprises:
determining an average quantization value according to the maximum value and the minimum value in the one-dimensional similarity vector; and
obtaining a sub-quantization feature value of each level based on the average quantization value and the minimum value, thereby obtaining the N sub-quantization feature values.
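A short worked reading of claim 11, assuming the average quantization value is the span (maximum minus minimum) divided by N and that the n-th sub-quantization feature value is the minimum plus n such steps; the claim leaves the exact formula open, so this is one illustrative choice:

```python
import torch

def n_level_quantization_values(similarity_1d: torch.Tensor, n: int = 8) -> torch.Tensor:
    """Illustrative claim 11: derive N sub-quantization feature values per sample."""
    s_min = similarity_1d.min(dim=1, keepdim=True).values      # minimum similarity
    s_max = similarity_1d.max(dim=1, keepdim=True).values      # maximum similarity

    # Assumed average quantization value: one equal step across the observed range.
    average_quantization = (s_max - s_min) / n

    # Sub-quantization feature value of each level: minimum plus a whole number of steps.
    level_indices = torch.arange(1, n + 1, device=similarity_1d.device, dtype=similarity_1d.dtype)
    return s_min + average_quantization * level_indices         # (batch, n) quantization level vector
```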
12. The method according to any one of claims 9 to 11, wherein the fusing the first coding feature with the first quantization level vector to obtain the first initial quantization feature matrix comprises:
performing spatial-size averaging on the first coding feature to obtain a first coding average feature;
concatenating the first coding average feature with the first quantization level vector to obtain a first quantization statistical vector; and
performing spatial domain conversion on the first quantization statistical vector, and fusing the converted vector with the first average feature to obtain the first initial quantization feature matrix.
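One possible sketch of the fusion in claim 12, assuming the "spatial domain conversion" is a learned linear projection broadcast back over the spatial grid and the final fusion is an element-wise addition with the first average feature; both operators are assumptions, not the patented ones:

```python
import torch
import torch.nn as nn

class QuantizationFusion(nn.Module):
    """Illustrative claim 12: fuse the first coding feature with the quantization level vector."""

    def __init__(self, num_levels: int, channels: int):
        super().__init__()
        # Assumed "spatial domain conversion": a linear projection into channel space.
        self.to_channels = nn.Linear(2 * num_levels, channels)

    def forward(self, coding_feature, level_vector, first_average, spatial_size):
        # coding_feature: (b, num_levels, h*w); level_vector: (b, num_levels)
        # first_average: (b, channels, 1, 1); spatial_size: (h, w)
        b = coding_feature.shape[0]
        h, w = spatial_size

        # Spatial-size averaging of the first coding feature.
        coding_average = coding_feature.mean(dim=2)                       # (b, num_levels)

        # Concatenation with the quantization level vector -> quantization statistical vector.
        stats_vector = torch.cat([coding_average, level_vector], dim=1)   # (b, 2*num_levels)

        # Assumed spatial domain conversion and fusion with the first average feature.
        projected = self.to_channels(stats_vector).view(b, -1, 1, 1)      # (b, channels, 1, 1)
        return (projected + first_average).expand(b, -1, h, w)            # first initial quantization feature matrix
```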
13. The method according to any one of claims 8 to 12, wherein the obtaining the enhanced texture features based on the first coding feature and the first initial quantization feature matrix comprises:
performing graph reasoning on the first initial quantization feature matrix to obtain a first quantization feature matrix; and
obtaining the enhanced texture features based on the first quantization feature matrix and the first coding feature.
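The claim does not define the graph reasoning operator; a common realization is a single graph-convolution step over the quantization levels treated as graph nodes, sketched below with a hypothetical learned adjacency and node transform:

```python
import torch
import torch.nn as nn

class GraphReasoning(nn.Module):
    """Illustrative claim 13: one graph-convolution step over quantization nodes."""

    def __init__(self, num_nodes: int, node_dim: int):
        super().__init__()
        # Hypothetical learned adjacency between quantization levels (nodes).
        self.adjacency = nn.Parameter(torch.eye(num_nodes))
        self.node_transform = nn.Linear(node_dim, node_dim)

    def forward(self, initial_quantization: torch.Tensor) -> torch.Tensor:
        # initial_quantization: (batch, num_nodes, node_dim)
        propagated = torch.matmul(self.adjacency, initial_quantization)   # message passing
        return torch.relu(self.node_transform(propagated))                # first quantization feature matrix
```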
14. The method according to claim 6, wherein:
in a case that the bottom layer image features are the input image features of the texture feature statistical processing, target statistical texture features are the statistical texture features;
in a case that the enhanced texture features are the input image features, the target statistical texture features are the bottom layer texture features; or
in a case that features obtained by combining the enhanced texture features with the bottom layer image features are the input image features, the target statistical texture features are the intermediate bottom layer texture features.
15. The method according to claim 14, wherein the performing texture feature statistical processing based on the input image features to obtain the target statistical texture features comprises any one of the following:
performing two-dimensional quantization processing on the input image features to obtain the target statistical texture features, the target statistical texture features representing a feature distribution relationship among pixels;
performing size segmentation on the input image features to obtain local image features corresponding to the input image features, and performing two-dimensional quantization processing on the local image features to obtain the target statistical texture features; or
performing two-dimensional quantization processing on the input image features and the local image features respectively to obtain a global quantization feature corresponding to the input image features and a local quantization feature corresponding to the local image features, and performing feature fusion based on the global quantization feature and the local quantization feature to obtain the target statistical texture features.
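A hedged sketch of the global/local branch of claim 15, assuming "size segmentation" means splitting the feature map into non-overlapping spatial patches; `quantize` stands in for the two-dimensional quantization of claims 16 and 17 and is a placeholder, not the patented routine:

```python
import torch

def split_into_patches(features: torch.Tensor, patch: int = 4) -> torch.Tensor:
    """Illustrative "size segmentation": cut (b, c, h, w) into non-overlapping patches.

    Assumes h and w are divisible by `patch`.
    """
    b, c, h, w = features.shape
    patches = features.unfold(2, h // patch, h // patch).unfold(3, w // patch, w // patch)
    # (b, c, patch, patch, h/patch, w/patch) -> (b * patch * patch, c, h/patch, w/patch)
    return patches.permute(0, 2, 3, 1, 4, 5).reshape(-1, c, h // patch, w // patch)

def global_and_local_statistics(features: torch.Tensor, quantize, patch: int = 4) -> torch.Tensor:
    """Claim 15, third option: quantize globally and per patch, then fuse (here by concatenation)."""
    b = features.shape[0]
    global_quantized = quantize(features).reshape(b, -1)              # global quantization feature
    local_quantized = quantize(split_into_patches(features, patch))   # per-patch quantization feature
    local_quantized = local_quantized.reshape(b, patch * patch, -1).mean(dim=1)
    return torch.cat([global_quantized, local_quantized], dim=1)      # target statistical texture features
```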
16. The method according to claim 15, wherein the performing two-dimensional quantization processing on the input image features to obtain the target statistical texture features comprises:
performing pooling processing on the input image features to obtain a second average feature corresponding to the input image features;
calculating a second similarity between the input image features and the second average feature to obtain a second similarity matrix;
performing two-dimensional quantization processing on the second similarity matrix to obtain a second quantization level matrix and a second coding feature; and
fusing the second coding feature with the second quantization level matrix to obtain a second initial quantization feature matrix.
17. The method according to claim 16, wherein the performing two-dimensional quantization processing on the second similarity matrix to obtain the second quantization level matrix and the second coding feature comprises:
performing M-level quantization processing on each of two dimensions of the second similarity matrix to obtain second quantization feature values, the second quantization feature values forming the second quantization level matrix, wherein M is a positive integer greater than or equal to 1;
determining an intermediate coding feature matrix based on the second quantization feature values and the second similarity matrix; and
for each current intermediate coding feature, multiplying the current intermediate coding feature by the transpose of an adjacent intermediate coding feature to determine a coding feature corresponding to the current intermediate coding feature, thereby obtaining the second coding feature.
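A minimal sketch of the pairwise step in claim 17: each intermediate coding feature is multiplied by the transpose of its neighbour, which yields a co-occurrence-style matrix per adjacent pair. The tensor shapes and the reading of "adjacent" as the next spatial position are illustrative assumptions:

```python
import torch

def pairwise_coding_features(intermediate: torch.Tensor) -> torch.Tensor:
    """Illustrative claim 17: current feature times the transpose of the adjacent feature.

    intermediate: (batch, positions, m_levels) soft assignments of each position to M levels.
    Returns: (batch, positions - 1, m_levels, m_levels) co-occurrence-style coding features.
    """
    current = intermediate[:, :-1, :].unsqueeze(-1)    # (b, p-1, m, 1)
    adjacent = intermediate[:, 1:, :].unsqueeze(-2)    # (b, p-1, 1, m)
    # Outer product == current @ adjacent^T for each adjacent position pair.
    return current * adjacent
```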
18. The method according to claim 16 or 17, wherein the fusing the second coding feature with the second quantization level matrix to obtain the second initial quantization feature matrix comprises:
performing spatial-size averaging on the second coding feature to obtain a second coding average feature;
concatenating the second coding average feature with the second quantization level matrix to obtain a second quantization statistical matrix; and
performing spatial domain conversion on the second quantization statistical matrix, and fusing the converted matrix with the second average feature to obtain the second initial quantization feature matrix.
19. The method according to claim 15, wherein the performing feature fusion based on the global quantization feature and the local quantization feature to obtain the target statistical texture features comprises:
performing spatial conversion and averaging on the global quantization feature to obtain a global average feature;
performing spatial conversion and averaging on the local quantization feature to obtain a local average feature; and
performing feature fusion on the global average feature and the local average feature to obtain the target statistical texture features.
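A short sketch of claim 19 under the assumption that "spatial conversion and averaging" reduces each quantization feature to a per-channel vector and that the final fusion is a concatenation followed by a learned linear layer; both operators are assumed for illustration:

```python
import torch
import torch.nn as nn

class GlobalLocalFusion(nn.Module):
    """Illustrative claim 19: fuse global and local quantization features."""

    def __init__(self, channels: int):
        super().__init__()
        self.fuse = nn.Linear(2 * channels, channels)   # assumed fusion operator

    def forward(self, global_quant: torch.Tensor, local_quant: torch.Tensor) -> torch.Tensor:
        # Assumed shapes: (b, channels, *spatial) for both inputs.
        global_avg = global_quant.flatten(2).mean(dim=2)   # spatial conversion + averaging
        local_avg = local_quant.flatten(2).mean(dim=2)
        return self.fuse(torch.cat([global_avg, local_avg], dim=1))  # target statistical texture features
```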
20. An image segmentation apparatus, comprising:
a feature extraction module, configured to extract bottom layer image features from an image to be processed;
a semantic segmentation module, configured to perform semantic segmentation on the bottom layer image features to obtain high-level semantic features;
a texture feature processing module, configured to perform at least one of texture enhancement processing and texture feature statistical processing on the bottom layer image features to obtain bottom layer texture features, the bottom layer texture features representing enhanced texture details and/or a statistical distribution of texture features of the image to be processed; and
a feature fusion module, configured to combine the high-level semantic features with the bottom layer texture features to obtain a semantic segmentation image.
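A skeletal sketch of how the four modules of claim 20 could be wired together; each sub-module here is a caller-supplied placeholder network, and the concatenation-based fusion is an assumption rather than the patented implementation:

```python
import torch
import torch.nn as nn

class ImageSegmentationApparatus(nn.Module):
    """Illustrative wiring of the four modules of claim 20."""

    def __init__(self, backbone: nn.Module, semantic_head: nn.Module,
                 texture_head: nn.Module, fusion_head: nn.Module):
        super().__init__()
        self.feature_extraction = backbone        # extracts bottom layer image features
        self.semantic_segmentation = semantic_head
        self.texture_processing = texture_head    # texture enhancement and/or statistics
        self.feature_fusion = fusion_head

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        bottom_layer = self.feature_extraction(image)
        semantic = self.semantic_segmentation(bottom_layer)   # high-level semantic features
        texture = self.texture_processing(bottom_layer)       # bottom layer texture features
        return self.feature_fusion(torch.cat([semantic, texture], dim=1))  # semantic segmentation image
```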
21. An electronic device, comprising:
a memory for storing executable instructions;
a processor for implementing the method of any one of claims 1 to 19 when executing executable instructions stored in the memory.
22. A computer-readable storage medium having executable instructions stored thereon which, when executed by a processor, implement the method of any one of claims 1 to 19.
CN202110246708.5A 2021-03-05 2021-03-05 Image segmentation method, device, electronic equipment and computer readable storage medium Active CN113011425B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110246708.5A CN113011425B (en) 2021-03-05 2021-03-05 Image segmentation method, device, electronic equipment and computer readable storage medium
PCT/CN2021/122355 WO2022183730A1 (en) 2021-03-05 2021-09-30 Image segmentation method and apparatus, electronic device, and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110246708.5A CN113011425B (en) 2021-03-05 2021-03-05 Image segmentation method, device, electronic equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN113011425A (en) 2021-06-22
CN113011425B CN113011425B (en) 2024-06-07

Family

ID=76407240

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110246708.5A Active CN113011425B (en) 2021-03-05 2021-03-05 Image segmentation method, device, electronic equipment and computer readable storage medium

Country Status (2)

Country Link
CN (1) CN113011425B (en)
WO (1) WO2022183730A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116843705B (en) * 2023-07-25 2023-12-22 中国中医科学院望京医院(中国中医科学院骨伤科研究所) Segmentation recognition method, device, equipment and medium for tank printing image

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113011425B (en) * 2021-03-05 2024-06-07 上海商汤智能科技有限公司 Image segmentation method, device, electronic equipment and computer readable storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180158224A1 (en) * 2015-07-31 2018-06-07 Eberhard Karls Universitaet Tuebingen Method and device for image synthesis
CN105427352A (en) * 2015-10-30 2016-03-23 中山大学 Image gradient domain local sequence coding and discrimination characteristic expression method
CN111291760A (en) * 2020-02-12 2020-06-16 北京迈格威科技有限公司 Semantic segmentation method and device for image and electronic equipment
CN111310666A (en) * 2020-02-18 2020-06-19 浙江工业大学 High-resolution image ground feature identification and segmentation method based on texture features
CN112084362A (en) * 2020-08-07 2020-12-15 北京航空航天大学 Image hash retrieval method based on hierarchical feature complementation

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022183730A1 (en) * 2021-03-05 2022-09-09 上海商汤智能科技有限公司 Image segmentation method and apparatus, electronic device, and computer readable storage medium
CN113470057A (en) * 2021-06-29 2021-10-01 上海商汤智能科技有限公司 Semantic segmentation method and device, electronic equipment and computer-readable storage medium
WO2023273026A1 (en) * 2021-06-29 2023-01-05 上海商汤智能科技有限公司 Semantic segmentation method and apparatus, electronic device and computer-readable storage medium
CN113470057B (en) * 2021-06-29 2024-04-16 上海商汤智能科技有限公司 Semantic segmentation method, semantic segmentation device, electronic equipment and computer readable storage medium
CN113689373A (en) * 2021-10-21 2021-11-23 深圳市慧鲤科技有限公司 Image processing method, device, equipment and computer readable storage medium
CN113689373B (en) * 2021-10-21 2022-02-11 深圳市慧鲤科技有限公司 Image processing method, device, equipment and computer readable storage medium

Also Published As

Publication number Publication date
CN113011425B (en) 2024-06-07
WO2022183730A1 (en) 2022-09-09

Similar Documents

Publication Publication Date Title
CN113011425B (en) Image segmentation method, device, electronic equipment and computer readable storage medium
CN110807757B (en) Image quality evaluation method and device based on artificial intelligence and computer equipment
CN113313173B (en) Human body analysis method based on graph representation and improved transducer
CN113781510B (en) Edge detection method and device and electronic equipment
CN111899203B (en) Real image generation method based on label graph under unsupervised training and storage medium
CN114595799A (en) Model training method and device
WO2023280113A1 (en) Data processing method, training method for neural network model, and apparatus
CN111179270A (en) Image co-segmentation method and device based on attention mechanism
CN114897160A (en) Model training method, system and computer storage medium
CN115147598A (en) Target detection segmentation method and device, intelligent terminal and storage medium
CN114627146A (en) Image processing method, image processing device, electronic equipment and storage medium
CN114549369A (en) Data restoration method and device, computer and readable storage medium
CN115565043A (en) Method for detecting target by combining multiple characteristic features and target prediction method
CN116843901A (en) Medical image segmentation model training method and medical image segmentation method
CN117217280A (en) Neural network model optimization method and device and computing equipment
Blier-Wong et al. Geographic ratemaking with spatial embeddings
CN114119560A (en) Image quality evaluation method, system, and computer-readable storage medium
CN116980541B (en) Video editing method, device, electronic equipment and storage medium
CN113096133A (en) Method for constructing semantic segmentation network based on attention mechanism
CN113706551A (en) Image segmentation method, device, equipment and storage medium
CN111898544A (en) Character and image matching method, device and equipment and computer storage medium
CN117011530A (en) Underwater image instance segmentation method, system, storage medium and electronic equipment
US20230058500A1 (en) Method and machine learning system to perform quantization of neural network
CN116363415A (en) Ship target detection method based on self-adaptive feature layer fusion
CN115713624A (en) Self-adaptive fusion semantic segmentation method for enhancing multi-scale features of remote sensing image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40051354

Country of ref document: HK

GR01 Patent grant