CN112016592A

CN112016592A - Domain adaptive semantic segmentation method and device based on cross domain category perception

Info

Publication number: CN112016592A
Application number: CN202010773728.3A
Authority: CN
Inventors: 李仕仁; 王金桥; 朱贵波; 胡建国; 张海; 赵朝阳; 林格; 谭大伦
Original assignee: Nexwise Intelligence China Ltd
Current assignee: Nexwise Intelligence China Ltd
Priority date: 2020-08-04
Filing date: 2020-08-04
Publication date: 2020-12-01
Anticipated expiration: 2040-08-04
Also published as: CN112016592B

Abstract

The embodiment of the invention provides a domain adaptive semantic segmentation method and a device based on cross domain category perception, wherein the method comprises the following steps: after the source image style is converted into a target image style, respectively carrying out feature extraction and classification; inputting the feature map and the classification score map into a cross domain category perception module; adjusting the feature map class centers to make the class centers of the feature maps close through cross domain class center generators of the two cross domain class perceptrons respectively; and respectively adjusting the classification fuzzy feature points of the feature map through a classification attention module to obtain a first attention feature map and a second attention feature map so as to perform semantic segmentation. According to the method and the device, when the characteristics of one field are extracted, the class center of the data characteristics of the other field is concerned by the model, the characteristic of the fuzzy pixel points in the two fields is mainly adjusted by combining the attention mechanism, so that the class centers of the same kind of characteristics of different fields are consistent, the difference of characteristic distribution is reduced, and the field adaptation is realized.

Description

Domain adaptive semantic segmentation method and device based on cross domain category perception

Technical Field

The invention relates to the technical field of artificial intelligence, in particular to a method and a device for field adaptive semantic segmentation based on cross field category perception.

Background

Labeling of semantically segmented data labels requires a great deal of manual effort. Therefore, a real dataset for semantic segmentation typically contains only a small number of samples, but this inhibits generalization of the model to various real cases. A common solution is an unsupervised semantic segmentation method, i.e. a model trained on a computer-synthesized dataset is used for datasets of real scenes of the same category. In order to reduce damage to actual feature information, a domain adaptation method is required to reduce the difference of the feature space distribution of the data set images in different domains. Traditional domain adaptation methods generally consider what way to migrate the knowledge of the computer synthesis domain to the real scene, and thus achieve domain adaptation, without paying attention to what knowledge of the computer is migrated, i.e., in short, only paying attention to "how to adapt" and not "what to adapt with".

There is some similarity in the image content of different domains, for example, the categories inside the pictures are roughly the same. Therefore, the feature spaces of the same class of different domain datasets extracted with the same model should be similar, and the class centers should also be consistent. However, there is often a difference in the distribution of features in the same category of datasets for real scenes and for computer-generated scenes. Therefore, how to achieve domain adaptation by reducing the difference of feature distribution in different domains is a problem to be solved urgently.

Disclosure of Invention

In order to solve the problems in the prior art, embodiments of the present invention provide a domain adaptive semantic segmentation method and apparatus based on cross domain category perception.

In a first aspect, an embodiment of the present invention provides a domain adaptive semantic segmentation method based on cross domain category perception, where the method includes: converting the style of a source image in a source data set into the style of a target image in a target data set through a style migration network to obtain a source adaptive image; wherein the source adapted image and the source image have tag data that are consistent; processing the source adaptive image by a first feature extraction network and a first classifier in sequence to obtain a first feature map and a first classification score map; processing the target image by a second feature extraction network and a second classifier in sequence to obtain a second feature map and a second classification score map; inputting the first feature map, the first classification score map, the second feature map, and the second classification score map to a cross-domain category awareness module; the cross domain category perception module comprises two cross domain category perceptrons, each cross domain category perceptron comprises a cross domain category center generator and a category attention module which are sequentially connected, and the category centers of the first feature map and the second feature map are adjusted through the cross domain category center generators of the two cross domain category perceptrons respectively, so that the category centers of the first feature map and the second feature map are close to each other; respectively carrying out distribution adjustment on the classification fuzzy feature points of the first feature map and the second feature map through the classification attention module to respectively obtain a first attention feature map and a second attention feature map; and performing semantic segmentation on the source image according to the first attention feature map and performing semantic segmentation on the target image according to the second attention feature map.

Further, the adjusting the category centers of the first feature map and the second feature map by the cross domain category center generators of the two cross domain category perceptrons respectively specifically includes: performing inner product operation on the first classification score graph and the second feature graph to obtain a classification center of the adjusted first feature graph; and performing inner product operation on the second classification score chart and the first feature chart to obtain the adjusted classification center of the second feature chart.

Further, the adjusted class center of the first feature map is represented as:

the adjusted class center of the second feature map is represented as:

wherein the content of the first and second substances,

representing the class center of the ith class of the source data, H representing the feature height, W representing the feature width, j representing the pixel sequence number, G_c1(F₁) Represents the first classification score map, [ G ]_c1(F₁)]^i,jWhether the jth pixel in the first classification score map belongs to the ith class is represented, and whether the jth pixel is 1 or 0 is not represented; [ A ]₂]^jRepresenting the characteristic distribution of the jth pixel in the second characteristic diagram;

the class center, G, representing the ith class of the target data_c2(F₂) Represents the second classification score map [ G ]_c2(F₂)]^i,jWhether the jth pixel in the second classification score map belongs to the ith class is represented, and whether the jth pixel is 1 or 0 is not represented; [ A ]₁]^jRepresenting the characteristic distribution of the jth pixel in the first characteristic diagram.

Further, the respectively obtaining a first attention feature map and a second attention feature map by performing distribution adjustment on the classification fuzzy feature points of the first feature map and the second feature map through the classification attention module specifically includes: taking the first classification score map as an attention map, and performing inner product operation on the first classification score map and the class center adjusted by the source data to obtain a first class attention feature; channel addition is carried out on the first class attention feature and the first feature map to obtain the first attention feature map; taking the second classification score map as an attention map, and performing inner product operation on the second classification score map and the class center adjusted by the target data to obtain a second class attention feature; and performing channel addition on the attention feature of the second category and the second feature map to obtain the second attention feature map.

Further, the first attention feature map is represented as:

wherein the content of the first and second substances,

said first attention feature map, C, representing the jth pixel of the kth channel of said source image₁Representing the number of categories of the source image, i representing a category number, G_c1(F₁) Represents the first classification score map, [ G ]_c1(F₁)]^i,jWhether the jth pixel in the first classification score map belongs to the ith class is represented, and whether the jth pixel is 1 or 0 is not represented;

representing the class center of the jth pixel of the kth channel of the source image;

the second attention feature map is represented as:

wherein the content of the first and second substances,

said second attention feature map, C, representing the jth pixel of the kth channel of said target image₂Indicating the number of categories of the target image, i indicating a category number, G_c2(F₂) Represents the second classification score map [ G ]_c2(F₂)]^i，jWhether the jth pixel in the second classification score map belongs to the ith class is represented, and whether the jth pixel is 1 or 0 is not represented;

the class center representing the jth pixel of the kth channel of the target image.

Further, the method further comprises: fine-tuning the first and second attention feature maps using a 1 × 1 convolutional layer.

Further, before the sequentially processing the source adapted image through the first feature extraction network and the first classifier, the method further comprises: performing channel compression on the source adapted image; before the target image is processed by the second feature extraction network and the second classifier in sequence, the method further comprises: and carrying out channel compression on the target image.

In a second aspect, an embodiment of the present invention provides a domain adaptive semantic segmentation apparatus based on cross domain category perception, where the apparatus includes: a pre-processing module to: converting the style of a source image in a source data set into the style of a target image in a target data set through a style migration network to obtain a source adaptive image; wherein the source adapted image and the source image have tag data that are consistent; a feature classification module to: processing the source adaptive image by a first feature extraction network and a first classifier in sequence to obtain a first feature map and a first classification score map; processing the target image by a second feature extraction network and a second classifier in sequence to obtain a second feature map and a second classification score map; a feature map adjustment module to: inputting the first feature map, the first classification score map, the second feature map, and the second classification score map to a cross-domain category awareness module; the cross domain category perception module comprises two cross domain category perceptrons, each cross domain category perceptron comprises a cross domain category center generator and a category attention module which are sequentially connected, and the category centers of the first feature map and the second feature map are adjusted through the cross domain category center generators of the two cross domain category perceptrons respectively, so that the category centers of the first feature map and the second feature map are close to each other; respectively carrying out distribution adjustment on the classification fuzzy feature points of the first feature map and the second feature map through the classification attention module to respectively obtain a first attention feature map and a second attention feature map; a semantic segmentation module to: and performing semantic segmentation on the source image according to the first attention feature map and performing semantic segmentation on the target image according to the second attention feature map.

In a third aspect, an embodiment of the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the method provided in the first aspect when executing the computer program.

In a fourth aspect, an embodiment of the present invention provides a non-transitory computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of the method as provided in the first aspect.

According to the method and the device for segmenting the domain adaptive semantics based on the cross domain class perception, provided by the embodiment of the invention, the cross domain class perception module comprising the cross domain class center generator and the class attention module is arranged, so that when a model extracts the characteristics of one domain, the class center of the data characteristics of the other domain is concerned, and the attention mechanism is combined, so that the fuzzy classification pixel point characteristics in the two domains are mainly adjusted, the class centers of the same type characteristics in the different domains are consistent, the difference of characteristic distribution is reduced, and the domain adaptation is realized.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 is a flowchart of a domain adaptive semantic segmentation method based on cross-domain class perception according to an embodiment of the present invention;

FIG. 2 is a schematic frame diagram of a domain adaptive semantic segmentation method based on cross-domain class perception according to an embodiment of the present invention;

FIG. 3 is a schematic processing flow diagram of a domain adaptive semantic segmentation method based on cross-domain class perception according to an embodiment of the present invention;

FIG. 4 is a schematic processing diagram of a cross-domain category perception module in the cross-domain category perception-based domain adaptive semantic segmentation method according to an embodiment of the present invention;

fig. 5 is a schematic structural diagram of a domain adaptive semantic segmentation apparatus based on cross-domain category perception according to an embodiment of the present invention;

fig. 6 illustrates a physical structure diagram of an electronic device.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Fig. 1 is a flowchart of a domain adaptive semantic segmentation method based on cross-domain class perception according to an embodiment of the present invention. As shown in fig. 1, the method includes:

step 101, converting the style of a source image in a source data set into the style of a target image in a target data set through a style migration network to obtain a source adaptive image; wherein the source adapted image and the source image have tag data that are consistent.

Semantic segmentation is a typical computer vision problem that involves taking some raw data (e.g., flat images) as input and converting them into masks with highlighted regions of interest. Many people use the term full-pixel semantic segmentation (full-pixel semantic segmentation), in which each pixel in an image is assigned a category ID according to the object of interest to which it belongs.

The domain adaptive semantic segmentation method based on cross domain class perception (also called a domain adaptive semantic segmentation method based on cross domain class perception counterstudy) provided by the embodiment of the invention is mainly used for the domain adaptation of a model between a computer synthesis data set and a real scene data set, and aims to solve the segmentation problem of a target data set under the condition of no label data.

In the embodiment of the invention, the source data set is a data set formed by images with labels, wherein the images are called source images; the target data set is a data set composed of images without labels, wherein the images are referred to as target images. On one hand, the embodiment of the invention can realize accurate semantic segmentation of the data set without labels by using the data set with labels, and can also transfer the knowledge learned from the target data set into model training of the source data set, so that the class centers of the source data set and the target data set are close to each other, and more accurate semantic segmentation is realized on the images in the source data set and the target data set.

In the embodiment of the invention, the processing processes of the source image in the source data set and the target image in the target data set are similar, so that the source image and the target image need to be in a uniform style in order to realize the proximity of the feature image category centers of the source image and the target image. Moreover, one of the purposes is to train a semantic segmentation model of a target image by using a source image, so that the style of a source image in a source data set is converted into the style of the target image in a target data set by a style migration network to obtain a source adaptive image, and the style conversion by using the style migration network can adopt the prior art; wherein the source adapted image and the source image have tag data that are consistent.

Step 102, processing the source adaptive image by a first feature extraction network and a first classifier in sequence to obtain a first feature map and a first classification score map; and processing the target image by a second feature extraction network and a second classifier in sequence to obtain a second feature map and a second classification score map.

After unifying the style, performing feature extraction on the source adaptive image through a first feature extraction network to obtain a first feature map; and then, inputting the first feature map into a first classifier for processing to obtain a first classification score map. The first classification score map contains classification scores for individual pixels of the source image. Performing feature extraction on the target image through a second feature extraction network to obtain a second feature map; and then, inputting the second feature map into a second classifier for processing to obtain a second classification score map. The second classification score map includes classification scores for respective pixels of the target image.

Step 103, inputting the first feature map, the first classification score map, the second feature map and the second classification score map into a cross domain category perception module; the cross domain category perception module comprises two cross domain category perceptrons, each cross domain category perceptron comprises a cross domain category center generator and a category attention module which are sequentially connected, and the category centers of the first feature map and the second feature map are adjusted through the cross domain category center generators of the two cross domain category perceptrons respectively, so that the category centers of the first feature map and the second feature map are close to each other; and respectively carrying out distribution adjustment on the classified fuzzy feature points of the first feature map and the second feature map through the category attention module to respectively obtain a first attention feature map and a second attention feature map.

The method provided by the embodiment of the invention trains the model by adopting the source data set with the label and the target data set without the label. Because the source data set has label data, a model which can be used for classifying the characteristics of the source data can be trained, and because the classification characteristics of the target data of the same kind are different from the classification characteristics of the source data, the model trained by the source data has poor characteristic classification capability on the target data. Therefore, the embodiment of the present invention provides a cross-domain category-aware module (cross-domain category-aware module), which enables the category center of target data of the same category to be approximately the same as the category center of source data, so that a model learned from supervised source data is suitable for classification of target data. The feature distributions of the same kind in different fields have some differences, so that the embodiment of the invention enables the model to cross sense the feature distribution of the other field when extracting the features, thereby enabling the class centers of the different fields to be close to each other and finally enabling the feature distributions of the same class to be consistent.

Specifically, the first feature map, the first classification score map, the second feature map and the second classification score map are input to a cross-domain category awareness module; the cross domain category perception module comprises two cross domain category perceptrons, each cross domain category perceptron comprises a cross domain category center generator and a category attention module which are sequentially connected, and the category centers of the first feature map and the second feature map are adjusted through the cross domain category center generators of the two cross domain category perceptrons respectively, so that the category centers of the first feature map and the second feature map are close to each other; the adjusting of the category centers of the first feature map and the second feature map by the cross domain category center generators of the two cross domain category sensors respectively means that the category center of the first feature map is adjusted by one of the cross domain category center generators of the two cross domain category sensors, and the category center of the second feature map is adjusted by the other one of the cross domain category center generators of the two cross domain category sensors. In the category score map, the category scores of some pixels are closer, that is, the uncertainty of which category the pixels should be classified into by the classifier is higher, and thus the classification error is more likely to be caused. These points, which are more likely to be misclassified, need to be given more attention and should be heavily adjusted in the following process. And respectively carrying out distribution adjustment on the classification fuzzy feature points of the first feature map and the second feature map through the classification attention module according to the adjusted class center by utilizing an attention mechanism to respectively obtain a first attention feature map and a second attention feature map. The respectively performing distribution adjustment on the classification fuzzy feature points of the first feature map and the second feature map through the classification attention module means performing distribution adjustment on the classification fuzzy feature points of the first feature map through one of the two cross-domain classification sensors, and performing distribution adjustment on the classification fuzzy feature points of the second feature map through the other one of the two cross-domain classification sensors.

And 104, performing semantic segmentation on the source image according to the first attention feature map and performing semantic segmentation on the target image according to the second attention feature map.

And performing semantic segmentation on the source image and the target image according to the first attention feature map and the second attention feature map respectively. The first attention feature map and the second attention feature map are still feature maps in nature, and a semantic segmentation method according to the feature maps in the prior art can be adopted to perform semantic segmentation according to the first attention feature map to obtain a source image segmentation result, and perform semantic segmentation according to the second attention feature map to obtain a target image segmentation result.

According to the embodiment of the invention, the cross domain category perception module comprising the cross domain category center generator and the category attention module is arranged, so that when the model extracts the features of one field, the category center of the data features of the other field is concerned, and the attention mechanism is combined to mainly adjust the fuzzy pixel point features in the two fields, so that the category centers of the same kind of features in different fields are consistent, the difference of feature distribution is reduced, and the field adaptation is realized.

Fig. 2 is a schematic frame diagram of a domain adaptive semantic segmentation method based on cross-domain class perception according to an embodiment of the present invention. As shown in fig. 2, the style of the source data set image is first converted to the style of the target data set image by the style migration network. And the label data of the image after the style migration is consistent with the label data of the source image, and the image is called a source adaptive image. The resulting source adapted image a is then used_s→tInput to the feature extraction network G_f1(first feature extraction network) to obtain a first feature map F_s→t(indicated by F in FIGS. 3 and 4)₁Show), and then through classifier G_c1(first classifier) obtaining a first classification score map G_c1(F_s→t) (indicated by G in FIGS. 3 and 4)_c1(F₁) Represents); the target image a_tInput to the feature extraction network G_f2(second feature extraction network) to obtain a second feature map F_t(indicated by F in FIGS. 3 and 4)₂Show), and then through classifier G_c2(second classifier) obtaining a second classification score map G_c2(F_t) (indicated by G in FIGS. 3 and 4)_c2(F₂) Representation). And respectively inputting the finally obtained feature map and the classification score map into the constructed cross domain category perception module CDCAM. Inputting the first feature map, the first classification score map, the second feature map and the second classification score map into a cross domain type perception module to respectively obtain a first attention feature map Z_s→t(using Z in FIGS. 3 and 4)₁Representation) and a second attention profile Z_t(by Z in FIG. 3)₂Representation).

The cross domain category perception module has the main task of mutually perceiving according to the category score graphs of the source data and the target data and the characteristics extracted by the opposite side, and promoting the mutual adaptation of category centers of the two domains. Specifically, the module can make the feature of the two domains extracted by the model sense the class center of the data feature of the other domain, so that the class centers of the two data domain features are close to each other. It can be seen that the embodiment of the present invention also migrates the knowledge learned from the target data set to the model training of the source data set, so that the model can pay attention to the class distribution of the target data when extracting the source data features, thereby improving the robustness of the model. And finally, inputting the source data set image characteristics and the target data set image characteristics processed by the cross domain category sensing module into a discriminator D for discrimination. The discriminator is used for discriminating the classification rationality of the source data set image characteristics and the target data set image characteristics processed by the cross domain category sensing module. The feature extraction network and the cross domain category perception module serve as generators, and the generated feature maps need to be consistent in spatial distribution, so that the discriminator cannot identify the difference between the two. Of course, the arbiter is not a necessary module.

Fig. 3 is a processing flow diagram of a domain adaptive semantic segmentation method based on cross-domain class perception according to an embodiment of the present invention. Fig. 4 is a schematic processing procedure diagram of a cross-domain category sensing module in the cross-domain category sensing-based domain adaptive semantic segmentation method according to an embodiment of the present invention. As shown in FIG. 3, the upper and lower features represent the feature F output by the feature extraction network and the class score map G output by the classifier in the two data fields respectively_c(F) In that respect To reduce the amount of computation, the features may be first channel compressed, which may be performed using 1 × 1 convolutional layers. And then inputting the compressed features and the score map of the same field and the features of the sensed field into a cross domain class sensor (cross domain class aware Block, CDCAB for short). It can be seen from fig. 3 that the outputs of the two domains of the CDCAM module need to focus on the characteristic information of the data of the other domain, which is the origin of the names of the cross-domain class sensing modules. While the CDCAB is mainly composed of two parts, a cross domain Class Center generator (cross domain Class Center Block) and a Class attention Block (Class attention Block), which can be represented by GCDCCB () and GCAB () functions, respectively, as shown in fig. 3. The outputs of the CDCAB modules of the last two domains can be represented by the following equations, respectively.

In fig. 3, N, H, W represents the number of channels, feature height, and feature width of the feature map, respectively, a₁Represents a pair F₁Feature maps after channel compression, A₂Represents a pair F₂And (5) carrying out a characteristic diagram after channel compression. C represents the number of categories. In fig. 4, N' represents the number of compressed channels.

Further, based on the above embodiment, the adjusting the category centers of the first feature map and the second feature map by the cross domain category center generators of the two cross domain category perceptrons respectively specifically includes: performing inner product operation on the first classification score graph and the second feature graph to obtain a classification center of the adjusted first feature graph; and performing inner product operation on the second classification score chart and the first feature chart to obtain the adjusted classification center of the second feature chart.

As shown in fig. 3, an inner product operation is performed on the first classification score map and the second feature map to obtain the adjusted class center of the first feature map. Similarly, an inner product operation is performed on the second classification score map and the first feature map to obtain an adjusted class center of the second feature map (not shown in fig. 3).

Among the plurality of classes in the semantic segmentation final prediction result graph, the class center of the ith class can be represented by the following formula:

wherein F_j∈R^C×H×WFeature map, y, representing the jth pixel_j∈R^1×HWIs the true prediction result, [ y_j＝i]The method determines whether the true prediction result of the jth pixel is of the ith class, if so, the value is 1, and if not, the value is 0. Therefore, the temperature of the molten metal is controlled,the cross domain class sensing module adjusts the class center of the current domain feature according to the formula and the feature information of the sensed domain, so that the class center can be close to the class center of the sensed domain.

Since the label information is not directly provided in the target data set, but the feature information of another domain is intended to adjust the class center of the feature of the current domain, the initial classification score map of the perceptual domain will be used as the coarse label information, F_jAnd taking the feature map of the pixel points in the feature map compressed by the sensing field. And, the source data set and the target data set are actually perceived as each other, so the following expression is given:

the adjusted class center of the first feature map is represented as:

the adjusted class center of the second feature map is represented as:

wherein the content of the first and second substances,

the class center, G, representing the ith class of the target data_c2(F₂) Representing the second classification scoreFIG. G_c2(F₂)]^i,jWhether the jth pixel in the second classification score map belongs to the ith class is represented, and whether the jth pixel is 1 or 0 is not represented; [ A ]₁]^jRepresenting the characteristic distribution of the jth pixel in the first characteristic diagram. G_c1(F₁)∈R^c1×HW，G_c2(F₂)∈R^c2×HW，A₁、A₂∈R^HW×N′Center of each category

The construction of the cross-domain class-centric generator has two advantages: first, a feature map of the data of the other field is added to the module, so that each feature center can understand the global information of the data of the other field. Secondly, the category center obtained by the module can coordinate the consistency between each pixel point and the category information, so that the category center of the current field can be finely adjusted through the operation, and the obtained category center is more compatible with the category center of another field. After cross domain perception, some characteristic points which are more classified and fuzzy are more separated, so that the method is more discriminative; meanwhile, the centers of the same category in different fields are closer, so that the segmentation of the images in different fields can be completed by the same model.

Further, based on the above embodiment, the respectively obtaining a first attention feature map and a second attention feature map by performing distribution adjustment on the classification fuzzy feature points of the first feature map and the second feature map through the classification attention module specifically includes: taking the first classification score map as an attention map, and performing inner product operation on the first classification score map and the class center adjusted by the source data to obtain a first class attention feature; channel addition is carried out on the first class attention feature and the first feature map to obtain the first attention feature map; taking the second classification score map as an attention map, and performing inner product operation on the second classification score map and the class center adjusted by the target data to obtain a second class attention feature; and performing channel addition on the attention feature of the second category and the second feature map to obtain the second attention feature map.

In different fields, the feature distribution of some categories is similar, so that not all feature points in different fields need to be adjusted. But rather, the fuzzy characteristic points of the belonged categories are emphasized and adjusted, so that the classified categories can be clearly classified. Inspired by the Attention mechanism, a Class Attention Block (Class Attention Block) is constructed in the embodiment of the present invention. In the final class score map in the current domain, the class scores of some pixels are closer, that is, the uncertainty of the classifier on which class the pixels should be classified is higher, and thus the classification error is more likely to result. These points, which are more likely to be misclassified, need to be given more attention and should be heavily adjusted in the following process. Therefore, with the idea of the attention mechanism, the class score map of the current domain is used as an attention map to obtain the class attention feature according to the adjusted class center, and then the class attention feature is channel-added with the input.

As shown in fig. 3, the respectively obtaining a first attention feature map and a second attention feature map by performing distribution adjustment on the classification fuzzy feature points of the first feature map and the second feature map through the classification attention module specifically includes: taking the first classification score map as an attention map, and performing inner product operation on the first classification score map and the class center adjusted by the source data to obtain a first class attention feature; channel addition is carried out on the first class attention feature and the first feature map to obtain the first attention feature map; taking the second classification score map as an attention map, and performing inner product operation on the second classification score map and the class center adjusted by the target data to obtain a second class attention feature; and performing channel addition on the attention feature of the second category and the second feature map to obtain the second attention feature map.

In the above description of the embodiments, cross-reference has been made toFork field class center F_class∈R^c×NC represents the number of categories, and a category score map G_F∈R^C×H×WFirstly, the category score map of the current domain is deformed, so that the dimension of the category score map is changed into C × HW. Finally obtaining a category attention feature map Z_a∈R^N′×HWThen deforming it again to obtain Z_a∈R^N′×H×W. Wherein:

the first attention feature map is represented as:

wherein the content of the first and second substances,

the second attention feature map is represented as:

wherein the content of the first and second substances,

said second attention feature map, C, representing the jth pixel of the kth channel of said target image₂Indicating the number of categories of the target image, i indicating a category number, G_c2(F₂) Representing the second classificationScore map, [ G ]_c2(F₂)]^i，jWhether the jth pixel in the second classification score map belongs to the ith class is represented, and whether the jth pixel is 1 or 0 is not represented;

After the first attention feature map and the second attention feature map are obtained, the output attention feature map can be finely adjusted by adopting a convolution layer of 1 x 1, so that the result is more accurate.

Therefore, the domain adaptive semantic segmentation method based on cross domain class perception, which is designed by the embodiment of the invention, can adjust the class center of the current domain according to the feature content of the perceived domain, so that the trained model can adjust the feature information of the pixel points with fuzzy classes in the classification score map. And finally, the class center of the perceived feature is close to the class center of the perceived field, so that the field adaptation task is completed more excellently.

Fig. 5 is a schematic structural diagram of a domain adaptive semantic segmentation apparatus based on cross-domain class perception according to an embodiment of the present invention. As shown in fig. 5, the apparatus includes a preprocessing module 10, a feature classification module 20, a feature map adjusting module 30, and a semantic segmentation module 40, wherein:

the pre-processing module 10 is configured to: converting the style of a source image in a source data set into the style of a target image in a target data set through a style migration network to obtain a source adaptive image; wherein the source adapted image and the source image have tag data that are consistent; the feature classification module 20 is configured to: processing the source adaptive image by a first feature extraction network and a first classifier in sequence to obtain a first feature map and a first classification score map; processing the target image by a second feature extraction network and a second classifier in sequence to obtain a second feature map and a second classification score map; the feature map adjusting module 30 is configured to: inputting the first feature map, the first classification score map, the second feature map, and the second classification score map to a cross-domain category awareness module; the cross domain category perception module comprises two cross domain category perceptrons, each cross domain category perceptron comprises a cross domain category center generator and a category attention module which are sequentially connected, and the category centers of the first feature map and the second feature map are adjusted through the cross domain category center generators of the two cross domain category perceptrons respectively, so that the category centers of the first feature map and the second feature map are close to each other; respectively carrying out distribution adjustment on the classification fuzzy feature points of the first feature map and the second feature map through the classification attention module to respectively obtain a first attention feature map and a second attention feature map; the semantic segmentation module 40 is configured to: and performing semantic segmentation on the source image according to the first attention feature map and performing semantic segmentation on the target image according to the second attention feature map.

The device provided by the embodiment of the present invention is used for the method, and specific functions may refer to the above method flow, which is not described herein again.

Fig. 6 illustrates a physical structure diagram of an electronic device, which may include, as shown in fig. 6: a processor (processor)610, a communication Interface (Communications Interface)620, a memory (memory)630 and a communication bus 640, wherein the processor 610, the communication Interface 620 and the memory 630 communicate with each other via the communication bus 640. The processor 610 may invoke logic instructions in the memory 630 to perform a cross-domain class perception based domain-adaptive semantic segmentation method comprising: converting the style of a source image in a source data set into the style of a target image in a target data set through a style migration network to obtain a source adaptive image; wherein the source adapted image and the source image have tag data that are consistent; processing the source adaptive image by a first feature extraction network and a first classifier in sequence to obtain a first feature map and a first classification score map; processing the target image by a second feature extraction network and a second classifier in sequence to obtain a second feature map and a second classification score map; inputting the first feature map, the first classification score map, the second feature map, and the second classification score map to a cross-domain category awareness module; the cross domain category perception module comprises two cross domain category perceptrons, each cross domain category perceptron comprises a cross domain category center generator and a category attention module which are sequentially connected, and the category centers of the first feature map and the second feature map are adjusted through the cross domain category center generators of the two cross domain category perceptrons respectively, so that the category centers of the first feature map and the second feature map are close to each other; respectively carrying out distribution adjustment on the classification fuzzy feature points of the first feature map and the second feature map through the classification attention module to respectively obtain a first attention feature map and a second attention feature map; and performing semantic segmentation on the source image according to the first attention feature map and performing semantic segmentation on the target image according to the second attention feature map.

In addition, the logic instructions in the memory 630 may be implemented in software functional units and stored in a computer readable storage medium when the logic instructions are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

In another aspect, an embodiment of the present invention further provides a computer program product, where the computer program product includes a computer program stored on a non-transitory computer-readable storage medium, where the computer program includes program instructions, and when the program instructions are executed by a computer, the computer is capable of executing the method for domain-adaptive semantic segmentation based on cross-domain class awareness provided by the above-mentioned method embodiments, where the method includes: converting the style of a source image in a source data set into the style of a target image in a target data set through a style migration network to obtain a source adaptive image; wherein the source adapted image and the source image have tag data that are consistent; processing the source adaptive image by a first feature extraction network and a first classifier in sequence to obtain a first feature map and a first classification score map; processing the target image by a second feature extraction network and a second classifier in sequence to obtain a second feature map and a second classification score map; inputting the first feature map, the first classification score map, the second feature map, and the second classification score map to a cross-domain category awareness module; the cross domain category perception module comprises two cross domain category perceptrons, each cross domain category perceptron comprises a cross domain category center generator and a category attention module which are sequentially connected, and the category centers of the first feature map and the second feature map are adjusted through the cross domain category center generators of the two cross domain category perceptrons respectively, so that the category centers of the first feature map and the second feature map are close to each other; respectively carrying out distribution adjustment on the classification fuzzy feature points of the first feature map and the second feature map through the classification attention module to respectively obtain a first attention feature map and a second attention feature map; and performing semantic segmentation on the source image according to the first attention feature map and performing semantic segmentation on the target image according to the second attention feature map.

In yet another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program is implemented by a processor to perform the method for domain-adaptive semantic segmentation based on cross-domain category perception provided in the foregoing embodiments, where the method includes: converting the style of a source image in a source data set into the style of a target image in a target data set through a style migration network to obtain a source adaptive image; wherein the source adapted image and the source image have tag data that are consistent; processing the source adaptive image by a first feature extraction network and a first classifier in sequence to obtain a first feature map and a first classification score map; processing the target image by a second feature extraction network and a second classifier in sequence to obtain a second feature map and a second classification score map; inputting the first feature map, the first classification score map, the second feature map, and the second classification score map to a cross-domain category awareness module; the cross domain category perception module comprises two cross domain category perceptrons, each cross domain category perceptron comprises a cross domain category center generator and a category attention module which are sequentially connected, and the category centers of the first feature map and the second feature map are adjusted through the cross domain category center generators of the two cross domain category perceptrons respectively, so that the category centers of the first feature map and the second feature map are close to each other; respectively carrying out distribution adjustment on the classification fuzzy feature points of the first feature map and the second feature map through the classification attention module to respectively obtain a first attention feature map and a second attention feature map; and performing semantic segmentation on the source image according to the first attention feature map and performing semantic segmentation on the target image according to the second attention feature map.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A domain adaptive semantic segmentation method based on cross domain category perception is characterized by comprising the following steps:

converting the style of a source image in a source data set into the style of a target image in a target data set through a style migration network to obtain a source adaptive image; wherein the source adapted image and the source image have tag data that are consistent;

processing the source adaptive image by a first feature extraction network and a first classifier in sequence to obtain a first feature map and a first classification score map; processing the target image by a second feature extraction network and a second classifier in sequence to obtain a second feature map and a second classification score map;

inputting the first feature map, the first classification score map, the second feature map, and the second classification score map to a cross-domain category awareness module; the cross domain category perception module comprises two cross domain category perceptrons, each cross domain category perceptron comprises a cross domain category center generator and a category attention module which are sequentially connected, and the category centers of the first feature map and the second feature map are adjusted through the cross domain category center generators of the two cross domain category perceptrons respectively, so that the category centers of the first feature map and the second feature map are close to each other; respectively carrying out distribution adjustment on the classification fuzzy feature points of the first feature map and the second feature map through the classification attention module to respectively obtain a first attention feature map and a second attention feature map;

and performing semantic segmentation on the source image according to the first attention feature map and performing semantic segmentation on the target image according to the second attention feature map.

2. The method according to claim 1, wherein the adjusting the class centers of the first feature map and the second feature map by the cross domain class center generators of the two cross domain class perceptrons respectively comprises:

performing inner product operation on the first classification score graph and the second feature graph to obtain a classification center of the adjusted first feature graph;

and performing inner product operation on the second classification score chart and the first feature chart to obtain the adjusted classification center of the second feature chart.

3. The method of claim 2, wherein the adjusted class center of the first feature map is expressed as:

the adjusted class center of the second feature map is represented as:

wherein the content of the first and second substances,

4. The method according to claim 1, wherein the distribution adjustment of the fuzzy classification feature points of the first feature map and the second feature map is performed by the classification attention module, so as to obtain a first attention feature map and a second attention feature map, respectively, specifically comprising:

taking the first classification score map as an attention map, and performing inner product operation on the first classification score map and the class center adjusted by the source data to obtain a first class attention feature; channel addition is carried out on the first class attention feature and the first feature map to obtain the first attention feature map;

taking the second classification score map as an attention map, and performing inner product operation on the second classification score map and the class center adjusted by the target data to obtain a second class attention feature; and performing channel addition on the attention feature of the second category and the second feature map to obtain the second attention feature map.

5. The method of cross-domain class perception based domain-adaptive semantic segmentation according to claim 4, wherein the first attention feature map is represented as:

wherein the content of the first and second substances,

the second attention feature map is represented as:

wherein the content of the first and second substances,

6. The method of cross-domain class perception based domain-adaptive semantic segmentation according to claim 4, further comprising:

fine-tuning the first and second attention feature maps using a 1 × 1 convolutional layer.

7. The method of claim 1, wherein before the source-adapted image is processed sequentially through a first feature extraction network and a first classifier, the method further comprises: performing channel compression on the source adapted image;

before the target image is processed by the second feature extraction network and the second classifier in sequence, the method further comprises: and carrying out channel compression on the target image.

8. A domain adaptive semantic segmentation device based on cross domain category perception is characterized by comprising:

a pre-processing module to: converting the style of a source image in a source data set into the style of a target image in a target data set through a style migration network to obtain a source adaptive image; wherein the source adapted image and the source image have tag data that are consistent;

a feature classification module to: processing the source adaptive image by a first feature extraction network and a first classifier in sequence to obtain a first feature map and a first classification score map; processing the target image by a second feature extraction network and a second classifier in sequence to obtain a second feature map and a second classification score map;

a feature map adjustment module to: inputting the first feature map, the first classification score map, the second feature map, and the second classification score map to a cross-domain category awareness module; the cross domain category perception module comprises two cross domain category perceptrons, each cross domain category perceptron comprises a cross domain category center generator and a category attention module which are sequentially connected, and the category centers of the first feature map and the second feature map are adjusted through the cross domain category center generators of the two cross domain category perceptrons respectively, so that the category centers of the first feature map and the second feature map are close to each other; respectively carrying out distribution adjustment on the classification fuzzy feature points of the first feature map and the second feature map through the classification attention module to respectively obtain a first attention feature map and a second attention feature map;

a semantic segmentation module to: and performing semantic segmentation on the source image according to the first attention feature map and performing semantic segmentation on the target image according to the second attention feature map.

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method for domain-adaptive semantic segmentation based on cross-domain class perception according to any one of claims 1 to 7 when executing the computer program.

10. A non-transitory computer readable storage medium, having stored thereon a computer program, which, when being executed by a processor, performs the steps of the method for domain-adaptive semantic segmentation based on cross-domain class perception according to any one of claims 1 to 7.