CN113902613A

CN113902613A - Image style migration system and method based on three-branch clustering semantic segmentation

Info

Publication number: CN113902613A
Application number: CN202111399319.2A
Authority: CN
Inventors: 程柳; 祁云嵩; 姜元昊; 吴婷凤; 赵呈祥
Original assignee: Jiangsu University of Science and Technology
Current assignee: Jiangsu University of Science and Technology
Priority date: 2021-11-19
Filing date: 2021-11-19
Publication date: 2022-01-07

Abstract

The invention discloses an image style migration system and method based on three-branch clustering semantic segmentation, comprising the following steps: image preprocessing, image semantic segmentation, extraction of image content and style features, style matching, and image similarity measurement. Applying semantic segmentation to image style migration effectively solves the style overflow problem that can arise during style migration. The MUNIT model used is an unsupervised deep learning model: it needs no paired data set and can produce images in multiple styles, largely meeting users' diversity requirements. The image similarity measurement step, based on the SSIM index, suppresses the generation of images with similar styles, which meets the diversity requirement while ensuring the stability and effectiveness of the whole system.

Description

Image style migration system and method based on three-branch clustering semantic segmentation

Technical Field

The invention belongs to the technical field of image processing and pattern recognition, and particularly relates to an image style migration system and method based on three-branch clustering semantic segmentation.

Background

Image style migration is a notable application of deep neural networks. It was developed by Gatys, Johnson and others, and stylization can achieve satisfactory results under specific conditions. Popular image style migration algorithms currently fall into two types: slow style migration based on image iteration, and fast style migration based on model iteration. Model-iteration methods include feed-forward stylization models and GAN-based methods. The representative feed-forward work is mainly that of Johnson et al. and Ulyanov et al., while GAN-based methods are more varied, each with advantages and disadvantages in different scenes. These methods perform well in scenes where semantic information is not prominent, but semantic mismatching easily occurs in semantically sensitive scenes; applying semantic segmentation to image style migration is therefore of great significance.

Semantic segmentation combines image classification, object detection and image segmentation: the image is divided into region blocks with specific semantic meanings, the semantic category of each block is identified, and a segmented image with semantic annotations is finally obtained. Few applications combine semantic segmentation with image style migration; most current research focuses separately on improving the accuracy of semantic segmentation and the speed of image style migration.

Chinese patent publication No. CN112950454A, titled "Image style migration method based on multi-scale semantic matching", discloses a method mainly characterized by extracting multi-scale deep features from the content image and the style image.

Although the above method achieves a good style migration effect, the following problems remain: 1) paired data sets are difficult or even impossible to collect, which greatly limits image style migration; 2) after training, results in only one style can be obtained, which cannot meet users' diversity requirements; 3) only the similarity of the overall image style is considered, so the specific style of a specific object cannot be preserved; 4) style overflow occurs, damaging the harmony and aesthetic appeal of the whole image; 5) in most models that use clustering during image style migration, the clustering effect is poor, which degrades the style migration effect; 6) among the images in various styles output by other technical schemes, some are highly similar to each other.

Disclosure of Invention

Referring to Fig. 1, the present invention aims to overcome the defects of the prior art by providing an image style migration system and method based on three-branch clustering semantic segmentation, which effectively solves several problems in the image style migration process and makes up for the shortcomings of existing techniques. Before style migration, the semantic information in the image is first extracted; during style migration, the result of semantic segmentation is matched with the semantic information in the target image, so that the overall style migration is achieved.

In order to solve the technical problems, the invention adopts the following technical scheme.

The image style migration system based on three-branch clustering semantic segmentation according to the invention comprises the following modules:

the image preprocessing module is used for adding Gaussian noise to the sample images and augmenting the image data, so as to solve the problems of uneven texture during image style migration and of poor style migration caused by insufficient sample data;

the semantic segmentation module is used for segmenting each semantic block in the content image and in the style image, providing basic semantic information for the subsequent style matching. Its process comprises: normalizing the pixel values, obtaining cluster centers with the K-means algorithm, and assigning core-domain labels and boundary-domain labels. Pixel value normalization converts the image into a standard form so that it resists subsequent affine transformations. The cluster centers obtained by K-means serve as the initial input of the subsequent improved k-nearest-neighbor algorithm. The improved k-nearest-neighbor algorithm introduces the concept of three-branch clustering: it sets different discrimination rules for the core domain and the boundary domain and assigns labels to the sample points in two steps. The points to be assigned in the boundary domain are those still unlabeled after core-domain labeling, i.e. points that the core domain cannot discriminate are classified into the boundary domain. These steps complete the clustering of the sample points, from which a semantic segmentation image is obtained.

the feature extraction module is used for simultaneously extracting low-order and high-order features of the content image and the style image and inputting them into a feature synthesis network, so as to obtain an image that fuses the content features and the style features;

the style matching module is used for matching objects of the same class in the content image and the style image, so that style migration is carried out between objects of the same class. It includes a content encoder, a style encoder and a joint decoder. The content encoder consists of several convolutional layers that downsample the input, followed by residual blocks for further processing; every layer is followed by instance normalization, which removes the original feature mean and variance that represent style information. The style encoder consists of several convolutional layers, an average pooling layer and a fully connected layer. The joint decoder processes the content code through a set of residual blocks and then generates the reconstructed image through upsampling layers and a convolutional layer.

the image similarity measurement module is used for measuring the similarity between every pair of images generated by the system and screening out the images with lower similarity as the final output of the system. It uses the SSIM index to calculate the similarity between every pair of style images generated by the system: the luminance, contrast and structure of the two images are compared to compute a similarity value, and the images with low similarity are screened out as the final output of the system.

The Gaussian noise added by the image preprocessing module avoids the uneven texture that might otherwise appear in the content and style extraction module, and the data augmentation it applies effectively solves the under-fitting problem in image style migration. The semantic features of the content and style images obtained by the semantic segmentation module, together with the content and style features obtained by the content and style feature extraction module, provide the inputs of the style matching module, and the image similarity measurement module optimizes the output of the whole system.

The invention discloses an image style migration method based on three-branch clustering semantic segmentation, which comprises the following steps of:

step 1, image preprocessing: adding Gaussian noise to the original image; expanding the sample set by using a data augmentation method;

step 2, semantic segmentation: performing semantic segmentation on the image with a three-branch clustering method that combines K-means with an improved k-nearest-neighbor algorithm, to obtain semantic images of the different objects in the image;

step 3, feature extraction: extracting the content and style characteristics of the image by using a MUNIT model;

step 4, style matching: in order to fully integrate semantic information, the style matching network is divided into a semantic matching sub-network and a style integration sub-network; the two sub-networks can fully utilize the semantic information image obtained in the step 2;

step 5, image similarity measurement: calculating pairwise similarity values between the generated images with the SSIM similarity measure, so that the generated images of different styles are optimized, and several images with low mutual similarity are screened out and finally displayed to the user as output.

Further, the step 1 image preprocessing process includes:

step 1.1, adding Gaussian noise: the content image I_c is preprocessed by constructing a Gaussian noise matrix with the same size and number of channels as I_c; the noise matrix is added to the original image to obtain an image containing Gaussian noise, which is used as the content input image. For any point (x_i, y_i) of a channel in the content image, let its pixel value be z; the probability density function of the Gaussian noise is:

$$P(z) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(z-\mu)^2}{2\sigma^2}\right)$$

where z is the pixel value, P(z) is the probability density, σ is the standard deviation, and μ is the mean of the pixel values of all points;
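A minimal NumPy sketch of step 1.1 follows. It assumes an 8-bit content image stored as an (H, W) or (H, W, C) array and treats μ and σ as the parameters of the noise distribution. It uses zero-mean noise with σ = 10 by default; these values are illustrative assumptions rather than values from the patent, and the function name is hypothetical.

```python
import numpy as np

def add_gaussian_noise(content_img, mu=0.0, sigma=10.0, seed=None):
    """Return content_img plus a Gaussian noise matrix of the same shape.

    mu and sigma are illustrative defaults (assumptions, not patent values).
    content_img is a uint8 array of shape (H, W) or (H, W, C).
    """
    rng = np.random.default_rng(seed)
    # Noise matrix with the same size and channel count as I_c
    noise = rng.normal(loc=mu, scale=sigma, size=content_img.shape)
    noisy = content_img.astype(np.float64) + noise
    # Round and clip back into the valid 8-bit range before converting
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)
```

For example, `add_gaussian_noise(img, sigma=5.0, seed=0)` returns a noisy copy with the same shape and dtype as `img`.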

step 1.2, data augmentation; by adopting any one or more of scaling transformation, clipping, color transformation, rotation and translation, a series of random changes are made on the training images to generate similar but different training samples, so that the scale of the training data set is enlarged, the dependence of the model on certain attributes is reduced, and the generalization capability of the model is improved.
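The augmentation in step 1.2 can be sketched with NumPy alone. The sketch assumes an (H, W, C) uint8 image and covers clipping (random crop), rotation restricted to 0 or 180 degrees so that every output has the same shape, horizontal flipping, and a simple colour transformation (brightness scaling). Scaling, translation and arbitrary-angle rotation are left out because they need an interpolation library. The crop fraction, brightness range and function name are assumptions.

```python
import numpy as np

def augment(img, n_samples=4, crop_frac=0.9, brightness=(0.8, 1.2), seed=None):
    """Generate n_samples randomly transformed copies of an (H, W, C) uint8 image.

    Transformations: random crop, 0/180-degree rotation, horizontal flip and
    brightness scaling. All ranges are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    h, w = img.shape[:2]
    ch, cw = max(1, int(h * crop_frac)), max(1, int(w * crop_frac))
    samples = []
    for _ in range(n_samples):
        # Clipping: take a random ch x cw window
        top = rng.integers(0, h - ch + 1)
        left = rng.integers(0, w - cw + 1)
        x = img[top:top + ch, left:left + cw].astype(np.float64)
        # Rotation by 180 degrees with probability 0.5 (keeps the crop shape)
        if rng.random() < 0.5:
            x = np.rot90(x, k=2)
        # Horizontal flip with probability 0.5
        if rng.random() < 0.5:
            x = x[:, ::-1]
        # Colour transformation: global brightness scaling
        x = x * rng.uniform(*brightness)
        samples.append(np.clip(np.rint(x), 0, 255).astype(np.uint8))
    return samples
```

Each call returns similar but different samples that can be appended to the training set.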

Further, the step 2 semantic segmentation process includes:

step 2.1, pixel value normalization: moment invariants of the image are used to find parameters that eliminate the influence of other transformation functions on the image, so that the image can resist subsequent geometric transformations;

for ease of processing, the pixel values of all points are mapped to the range 0-1 by the formula:

$$x' = \frac{\mathrm{data} - \min(\mathrm{data})}{\max(\mathrm{data}) - \min(\mathrm{data})}$$

where data is the original pixel value, x' is the normalized value, min(data) is the minimum of the original pixel values, and max(data) is the maximum of the original pixel values;
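The min-max mapping can be sketched in NumPy as follows. The guard for a constant image, where max(data) = min(data) and the formula would divide by zero, is an added assumption. The function name is hypothetical.

```python
import numpy as np

def minmax_normalize(data):
    """Map all values of `data` linearly into [0, 1]."""
    data = np.asarray(data, dtype=np.float64)
    lo, hi = data.min(), data.max()
    if hi == lo:
        # Constant input: the formula is undefined, return zeros instead
        return np.zeros_like(data)
    return (data - lo) / (hi - lo)
```

For an 8-bit image whose values span 0 to 255, this is equivalent to dividing every pixel by 255.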

step 2.2, obtaining cluster centers with the K-means algorithm: K points are selected as the initial centers of the clusters according to a certain strategy, and each data point is assigned to the cluster whose center is nearest, i.e. the data are divided into K clusters, completing one partition. Since the initial partition is not necessarily the best one, the center of each newly generated cluster is recalculated and the data are partitioned again, until the partition no longer changes. In practice a maximum number of iterations is usually preset, and the computation terminates when it is reached;

this yields relatively reasonable cluster centers, in preparation for the subsequent division into the core domain and the boundary domain;
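The following is a compact NumPy sketch of the K-means loop in step 2.2. It assumes the normalized samples are given as an (m, d) array, for example pixel values, possibly with coordinates appended. The patent only says the initial centers are chosen "according to a certain strategy"; random selection of k distinct samples is an assumption here, as are the function and parameter names.

```python
import numpy as np

def kmeans(points, k, max_iter=100, seed=None):
    """Lloyd's K-means on an (m, d) array of normalized samples.

    Initial centres: k distinct samples drawn at random (assumed strategy).
    Returns (centers, labels).
    """
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=np.float64)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.full(len(points), -1)
    for _ in range(max_iter):
        # d_ij = ||point_i - mu_j||^2 for every sample i and centre j, shape (m, k)
        d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # partition unchanged, so the centres are final
        labels = new_labels
        # Recompute each centre as the mean of its members (keep it if the cluster is empty)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return centers, labels
```

The returned centers are what the subsequent neighbor-based label assignment would start from.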

step 2.3, core domain category label assignment: the idea of three-branch clustering is introduced to assist decision making. Three-branch clustering divides the sample data into three regions: for a category C, Co(C), Fγ(C) and Tγ(C) denote the core domain, the boundary domain and the outer region respectively. The core domain is the set of sample points that certainly belong to class C, the boundary domain is the set of sample points that may belong to class C, and the outer region is the set of sample points that certainly do not belong to class C;

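the relationship of the three regions is as follows:

$$\mathrm{Co}(C) \cup F\gamma(C) \cup T\gamma(C) = U,\quad \mathrm{Co}(C) \cap F\gamma(C) = \varnothing,\quad \mathrm{Co}(C) \cap T\gamma(C) = \varnothing,\quad F\gamma(C) \cap T\gamma(C) = \varnothing$$

where U is the universe (the set of all samples), Co(C) is the core domain, Fγ(C) is the boundary domain, Tγ(C) is the outer region, and ∅ is the empty set;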

namely, the three areas are mutually exclusive and have no intersection;

an improved k-nearest-neighbor algorithm incorporating the idea of three-branch clustering is used to assign labels to the sample points other than the cluster centers, thereby achieving the clustering;

the k-nearest-neighbor algorithm computes the distance between a point and all other points, takes the k points closest to it, and assigns the point to the class that accounts for the largest proportion among those k points. The distance between points is generally the Euclidean distance:

$$\rho = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$

where ρ is the Euclidean distance between two points, and (x₁, y₁) and (x₂, y₂) are any two points;

the k points closest to a given point are thus obtained and called the neighborhood points of that point; the shared neighborhood of two points is then obtained from their two neighborhoods, in preparation for assigning labels to the core-domain points and boundary-domain points;
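The neighborhood and shared-neighborhood computations can be sketched as follows. This brute-force version builds the full m × m distance matrix, so it is only practical for small sample sets; for a whole image a spatial index such as SciPy's `cKDTree` would be needed. The function names are hypothetical.

```python
import numpy as np

def knn_indices(points, k):
    """Indices of the k nearest neighbours (Euclidean) of every point, excluding itself."""
    points = np.asarray(points, dtype=np.float64)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))   # rho for every pair of points
    np.fill_diagonal(dist, np.inf)             # a point is not its own neighbour
    return np.argsort(dist, axis=1, kind="stable")[:, :k]

def shared_neighbors(nbrs, a, b):
    """|SNN(a, b)|: number of points that lie in both neighbourhoods."""
    return len(set(nbrs[a].tolist()) & set(nbrs[b].tolist()))
```

For four points on a line at x = 0, 1, 2 and 10 with k = 2, the first and last points both have neighbourhood {1, 2}, so their shared neighbourhood contains 2 points.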

ignoring the outer region, core-domain points and boundary-domain points are assigned category labels in different ways;

label assignment of core domain points: the discrimination formula of the core domain point is as follows

where |SNN(this, next)| is the number of neighborhood points shared by the two points, this is the current point, and next is the point to be judged; when the number of points in the shared neighborhood of next and this, i.e. |SNN(this, next)|, satisfies the formula, next is assigned to the class of this;

label assignment of boundary-domain points: this step reassigns the points left unlabeled by core-domain assignment. An assignment matrix M records the classes of all neighborhood points of a given point, and the unlabeled point is assigned the label of the cluster to which most of its neighborhood points belong.
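A sketch of the two-step assignment follows, under stated assumptions. The patent's core-domain discrimination formula is not reproduced in this text, so a hypothetical threshold `min_shared` on |SNN(this, next)| stands in for it. Core labels are grown outward from one seed sample per cluster, for example the sample nearest each K-means center. The input is a precomputed (m, k) array of nearest-neighbour indices. All names are hypothetical.

```python
import numpy as np
from collections import deque

def assign_labels(nbrs, seeds, min_shared):
    """Two-step label assignment: core domain first, then boundary domain.

    nbrs       : (m, k) int array; row i holds the k nearest neighbours of point i
    seeds      : one sample index per cluster; seed j receives label j
    min_shared : hypothetical threshold on |SNN(this, next)| (assumption)
    Returns an (m,) label array; -1 marks points that stayed unlabelled.
    """
    nbrs = np.asarray(nbrs)
    sets = [set(row.tolist()) for row in nbrs]
    labels = np.full(len(nbrs), -1, dtype=int)

    # Step 1, core domain: spread each seed's label to neighbours that share
    # at least min_shared neighbourhood points with an already labelled point.
    queue = deque()
    for j, s in enumerate(seeds):
        labels[s] = j
        queue.append(s)
    while queue:
        this = queue.popleft()
        for nxt in nbrs[this]:
            if labels[nxt] == -1 and len(sets[this] & sets[nxt]) >= min_shared:
                labels[nxt] = labels[this]
                queue.append(nxt)

    # Step 2, boundary domain: each unlabelled point takes the label held by
    # most of its labelled neighbours (a majority vote over one row of M).
    changed = True
    while changed:
        changed = False
        for i in np.flatnonzero(labels == -1):
            votes = labels[nbrs[i]]
            votes = votes[votes >= 0]
            if votes.size:
                labels[i] = np.bincount(votes).argmax()
                changed = True
    return labels
```

In this design, points reached in step 1 form the core domain. Points labeled only in step 2 are the boundary domain.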

Further, in step 3, the process of extracting the content and style features includes:

the MUNIT (Multimodal Unsupervised Image-to-image Translation) model extends the UNIT model to translation between multimodal data. UNIT assumes that different data sets can share the same hidden space; MUNIT further divides the hidden space into a content hidden space and a style hidden space, where the style hidden space measures the difference between the original image and the target image;

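as in the UNIT model, the encoding stage consists of two autoencoders; unlike UNIT, each image is mapped to the hidden space by two sub-networks and decomposed there into a content part and a style part, and in the decoding stage the image is reconstructed from these two parts. The whole process minimizes the content and style losses, and the loss function is defined as follows:

$$\min_{E_1,E_2,G_1,G_2}\ \max_{D_1,D_2}\ \mathcal{L} = \mathcal{L}_{GAN}^{x_1} + \mathcal{L}_{GAN}^{x_2} + \lambda_x\left(\mathcal{L}_{recon}^{x_1} + \mathcal{L}_{recon}^{x_2}\right) + \lambda_c\left(\mathcal{L}_{recon}^{c_1} + \mathcal{L}_{recon}^{c_2}\right) + \lambda_s\left(\mathcal{L}_{recon}^{s_1} + \mathcal{L}_{recon}^{s_2}\right)$$

where E₁, E₂ are the encoders, G₁, G₂ the decoders and D₁, D₂ the discriminators of the two domains, L_GAN^{x₁} and L_GAN^{x₂} are the adversarial losses, L_recon^{x}, L_recon^{c} and L_recon^{s} are the image, content and style reconstruction losses, and λ_x, λ_c, λ_s are weights that control the importance of the reconstruction terms.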
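The following small sketch shows how the weighted objective combines its terms. It assumes the adversarial terms are already computed and uses the L1 (mean absolute) distance for the reconstruction terms, as in the MUNIT paper. The default weights λ_x = 10, λ_c = 1, λ_s = 1 are assumptions based on commonly used MUNIT settings. The function names are hypothetical, and no network is trained here.

```python
import numpy as np

def l1_distance(a, b):
    """Mean absolute difference between an original and its reconstruction."""
    return float(np.mean(np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))))

def munit_total_loss(adv_terms, x_pairs, c_pairs, s_pairs,
                     lam_x=10.0, lam_c=1.0, lam_s=1.0):
    """Adversarial terms plus weighted image (x), content (c) and style (s)
    reconstruction terms.

    adv_terms: precomputed adversarial loss values, one per domain.
    *_pairs:   lists of (original, reconstruction) arrays, one pair per domain.
    The default weights are assumed values, not taken from the patent.
    """
    def recon(pairs):
        return sum(l1_distance(a, b) for a, b in pairs)

    return (sum(adv_terms)
            + lam_x * recon(x_pairs)
            + lam_c * recon(c_pairs)
            + lam_s * recon(s_pairs))
```

Raising λ_x relative to λ_c and λ_s makes faithful image reconstruction dominate the objective.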

Further, in step 4, the style matching process includes:

performing style migration on semantic information obtained based on semantic segmentation, namely migration among objects of the same class;

in order to incorporate semantic information, firstly, the semantic mask of the original image is correspondingly downsampled, and the formula is as follows:

$$m_l = \mathrm{downsampling}(m, \mathrm{scale}(l)) \quad (7)$$

where m_l is the semantic mask at network layer l, and scale(l) is the downsampling ratio applied to m, which is determined by the resolution of the input image and the output resolution of network layer l;

the style features are then concatenated with the semantic features along the feature dimension to form new style features, and a hyperparameter λ is introduced to balance the influence of the conventional features and the semantic information on the style: when λ = 0 only the conventional features are used for style migration, and as λ → +∞ only the semantic information is used;

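$$s_n = \mathrm{norm}(s_l) \,\|\, \lambda \cdot \mathrm{norm}(m_l) \quad (8)$$

where s_l is the style feature of network layer l, s_n is the style feature after fusing the semantic information, m_l is the semantic information from the content image, and ∥ denotes concatenation along the feature dimension;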
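A NumPy sketch of equations (7) and (8) follows. It assumes integer downsampling by striding (nearest-neighbour), a one-hot (H, W, K) semantic mask, and L2 normalization along the channel axis for norm(·). The patent does not specify these choices, and the function names are hypothetical.

```python
import numpy as np

def downsample_mask(m, scale):
    """Nearest-neighbour downsampling of an (H, W, K) mask by an integer factor."""
    return m[::scale, ::scale]

def l2_normalize(x, eps=1e-8):
    """Scale every feature vector (last axis) to unit length; stands in for norm()."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def fuse_style_with_semantics(s_l, m, scale, lam):
    """Equation (8): s_n = norm(s_l) || lam * norm(m_l), joined on the feature axis.

    s_l: (h, w, C) style features of layer l; m: (H, W, K) one-hot semantic mask;
    scale: integer stand-in for scale(l), chosen so that m_l matches s_l spatially.
    """
    m_l = downsample_mask(m, scale).astype(np.float64)    # equation (7)
    if m_l.shape[:2] != s_l.shape[:2]:
        raise ValueError("scale does not map the mask onto layer l's resolution")
    return np.concatenate([l2_normalize(s_l), lam * l2_normalize(m_l)], axis=-1)
```

With `lam=0.0`, the semantic channels are all zero, so only the conventional style features drive the matching, which is consistent with the λ = 0 case described above.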

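in the style matching sub-network, cosine similarity is used as the matching criterion:

$$\mathrm{NN}(i) = \arg\max_{j} \frac{\Phi_i(x) \cdot \Phi_j(x_s)}{\lVert \Phi_i(x) \rVert\,\lVert \Phi_j(x_s) \rVert}$$

where Φ is a function that extracts the features of image blocks, Φ_i(x) is the style feature of the i-th block of the target image, Φ_j(x_s) is the style feature of the j-th block of the style image, and NN(i) is the index of the best-matching style block.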
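The following is a sketch of the cosine-similarity patch matching. It assumes Φ simply flattens each 3 × 3 window of an (H, W, C) feature map with stride 1; in practice Φ would be applied to deep features. The patch size and function names are assumptions.

```python
import numpy as np

def extract_patches(feat, size=3):
    """Phi: flatten every size x size patch of an (H, W, C) feature map (stride 1)."""
    h, w, _ = feat.shape
    patches = [feat[i:i + size, j:j + size].ravel()
               for i in range(h - size + 1) for j in range(w - size + 1)]
    return np.stack(patches)

def match_patches(target_feat, style_feat, size=3, eps=1e-8):
    """For every target patch, return the index of the style patch with the
    highest cosine similarity, together with that similarity."""
    t = extract_patches(target_feat, size)
    s = extract_patches(style_feat, size)
    t = t / (np.linalg.norm(t, axis=1, keepdims=True) + eps)
    s = s / (np.linalg.norm(s, axis=1, keepdims=True) + eps)
    sim = t @ s.T          # cosine similarity matrix, shape (n_target, n_style)
    return sim.argmax(axis=1), sim.max(axis=1)
```

Because the vectors are normalized before the dot product, scaling a feature map does not change which style block is selected.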

Further, in step 5, the process of measuring the similarity of the image includes:

the structural similarity index (SSIM) measures the similarity between two images and is often used to evaluate the quality of an image after restoration;

the SSIM index compares images by extracting three main features: luminance, contrast and structure. In implementation terms, luminance is characterized by the mean, contrast by the standard deviation, and structure by the correlation coefficient, as follows:

$$l(x,y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1},\quad c(x,y) = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2},\quad s(x,y) = \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3}$$

where l(x, y) represents luminance, c(x, y) contrast and s(x, y) structure; μ_x and μ_y are the means of samples x and y, σ_x² and σ_y² are their variances (σ_x and σ_y the standard deviations), σ_xy is the covariance of x and y, and C₃ is a constant, commonly set to C₂/2;

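the similarity function is:

$$\mathrm{SSIM}(x,y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$

where SSIM is the image similarity index, and C₁, C₂ are small constants that stabilize the division.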
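A NumPy sketch of the simplified SSIM formula, computed over whole images (a single window), follows. Standard implementations such as scikit-image's `structural_similarity` instead average SSIM over local windows, so values will differ from those libraries. The constants C₁ = (0.01·L)² and C₂ = (0.03·L)², where L is the dynamic range, are the conventional choices and are assumptions here. The function name is hypothetical.

```python
import numpy as np

def ssim_global(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """SSIM of two equally sized images computed over a single global window.

    Uses the simplified form (C3 = C2 / 2). k1, k2 and data_range give the
    conventional constants C1 = (k1 * L)^2 and C2 = (k2 * L)^2 (assumptions).
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()            # sigma_x^2, sigma_y^2
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()  # sigma_xy
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(num / den)
```

An image compared with itself scores 1. An inverted copy scores below 0 because the covariance term becomes negative.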

Compared with the prior art, the invention has the following advantages and beneficial effects:

1. The invention applies a semantic segmentation technique improved with K-means, KNN and three-branch clustering to the image style migration system, and introduces SSIM-based image similarity measurement into the system to increase the contrast between different results and to suppress the repeated generation of similar images.

2. By using the improved semantic segmentation algorithm, the invention effectively solves the style overflow problem that often occurs in image style migration; it also incorporates the MUNIT model, which overcomes the lack of paired data sets before training and the limitation to a single output style after training.

3. The image similarity measurement introduced at the later stage ensures that the system does not output multiple images whose styles are similar or whose differences cannot be seen with the naked eye, which improves the diversity and stability of style migration while ensuring the style migration effect.

4. The image style migration system and method based on three-branch clustering semantic segmentation can be applied to style migration of traditional clothes, oil paintings and ceramic patterns, and are beneficial to developing traditional culture in China and further promoting vigorous development of the culture industry.

Drawings

Fig. 1 is a system block diagram and a method flowchart of an embodiment of the present invention.

Fig. 2 is a landscape artwork with skrit-sky characteristics produced by the invention; the original figure is a color image that has been converted to grayscale.

Fig. 3 is a result obtained after image preprocessing according to an embodiment of the present invention, in which fig. 3a is an original image; FIG. 3b is an image with Gaussian noise added; FIG. 3c is the image of the original image after being flipped; the original image is a color image, which is now processed to be a grayscale image.

Fig. 4 is a diagram of semantic segmentation input/output according to an embodiment of the present invention, where fig. 4a is an original diagram, and fig. 4b is an image obtained by semantic segmentation.

FIG. 5 is a schematic diagram of the MUNIT model hidden space according to the present invention; the original image is a color image, which is now processed to be a grayscale image.

FIG. 6 is a diagram of a MUNIT model self-encoder structure according to the present invention.

Fig. 7 shows two example images used for calculating image similarity in an embodiment of the present invention, where Fig. 7a is the image before style migration and Fig. 7b is the image after style migration; the original images are color images, which have been converted to grayscale.

Detailed description of the preferred embodiments

The present invention will be described in further detail with reference to the accompanying drawings.

As shown in Fig. 2, by applying the improved K-means clustering method to the MUNIT model, the image style migration system based on three-branch clustering semantic segmentation can finally produce diverse style images rich in artistic interest.

As shown in fig. 1, an image style migration system based on three-branch clustering semantic segmentation of the present invention includes:

the semantic segmentation module is used for segmenting each semantic block in the content image and the style image respectively and providing basic semantic information for the subsequent style matching;

the content and style feature extraction module is used for simultaneously extracting low-order and high-order features of the content image and the style image and inputting the features into a feature synthesis network to obtain an image fused with the content feature and the style feature;

the style matching module is used for matching objects of the same class in the content image and the style image, so that style migration is carried out between objects of the same class;

the image similarity measurement module is used for measuring the similarity between every two images generated by the system and screening out the image with lower similarity as the final output of the system;

The image preprocessing module is used for the following processes:

(1) adding Gaussian noise to sample set images

Gaussian noise is added to all images in the initial sample set; the probability density function of the noise is:

$$P(z) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(z-\mu)^2}{2\sigma^2}\right)$$

where z is the pixel value, P(z) is the probability density, σ is the standard deviation, and μ is the mean of the pixel values of all points;

(2) sample set data augmentation

An insufficient number of samples is usually an important factor limiting the training of the model and the effectiveness of the whole system. Data augmentation applies a series of random changes to the training images to generate similar but different training samples, which enlarges the training data set, reduces the model's dependence on particular attributes and improves its generalization ability.

The semantic segmentation module is used for the following processes, including:

normalizing the pixel values, obtaining cluster centers with the K-means algorithm, and assigning core-domain labels and boundary-domain labels;

(1) the main purpose of the pixel value normalization process is to transform the image into a standard form to resist subsequent affine transformations.

Pixel value normalization: moment invariants of the image are used to find parameters that eliminate the influence of other transformation functions on the image, so that the image can resist subsequent geometric transformations;

(2) The main purpose of the K-means algorithm is to provide cluster centers as the initial input of the subsequent improved KNN algorithm. k points are selected as the initial centers of the clusters according to a certain strategy, and each data point is assigned to the cluster whose center is nearest, i.e. the data are divided into k clusters, completing one partition. Since the initial partition is not necessarily the best one, the center of each newly generated cluster is recalculated and the data are partitioned again, until the partition no longer changes. In practice a maximum number of iterations is usually preset, and the computation terminates when it is reached;

(3) The main idea of the improved KNN algorithm is to introduce the concept of three-branch clustering into the KNN algorithm.

Three-branch clustering divides the sample data into three regions: for a category C, Co(C), Fγ(C) and Tγ(C) denote the core domain, the boundary domain and the outer region respectively. The core domain is the set of sample points that certainly belong to class C, the boundary domain is the set of sample points that may belong to class C, and the outer region is the set of sample points that certainly do not belong to class C;

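the relationship of the three regions is as follows:

$$\mathrm{Co}(C) \cup F\gamma(C) \cup T\gamma(C) = U,\quad \mathrm{Co}(C) \cap F\gamma(C) = \varnothing,\quad \mathrm{Co}(C) \cap T\gamma(C) = \varnothing,\quad F\gamma(C) \cap T\gamma(C) = \varnothing$$

where U is the universe (the set of all samples), Co(C) is the core domain, Fγ(C) is the boundary domain, Tγ(C) is the outer region, and ∅ is the empty set;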

the KNN algorithm computes the distance between a point and all other points, takes the k points closest to it, and assigns the point to the class that accounts for the largest proportion among those k points; the distance between points is generally the Euclidean distance:

$$\rho = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$

where ρ is the Euclidean distance between two points, and (x₁, y₁) and (x₂, y₂) are any two points;

labels are assigned by setting different discrimination rules for the core domain and the boundary domain and labeling the sample points in two steps; the discrimination formula of the core domain is as follows:

the point needing to be distributed in the boundary domain is a point which is not distributed with the label after the label is distributed to the core domain, namely, the point which cannot be distinguished by the core domain is classified into the boundary domain; the clustering of the sample points is completed through the steps, and a semantic segmentation image is further obtained.

The content and style feature extraction module comprises:

a content encoder, a style encoder and a joint decoder. The content encoder consists of several convolutional layers that downsample the input, followed by residual blocks for further processing; every layer is followed by instance normalization, which mainly removes the original feature mean and variance that represent style information. The style encoder consists of several convolutional layers, an average pooling layer and a fully connected layer. The joint decoder processes the content code through a set of residual blocks and then generates the reconstructed image through upsampling layers and a convolutional layer.
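Instance normalization, as applied after the content encoder's layers, can be sketched in NumPy for an (N, C, H, W) feature map. Each channel of each sample is normalized over its spatial positions. The learnable scale and shift parameters are omitted, and `eps` is an assumed stabilizing constant. The function name is hypothetical.

```python
import numpy as np

def instance_norm(feat, eps=1e-5):
    """Instance normalization of an (N, C, H, W) feature map.

    Every (sample, channel) slice is shifted to zero mean and scaled to unit
    variance over its H x W positions, removing the per-channel statistics
    that carry style information. No learnable affine parameters are applied.
    """
    mean = feat.mean(axis=(2, 3), keepdims=True)
    var = feat.var(axis=(2, 3), keepdims=True)
    return (feat - mean) / np.sqrt(var + eps)
```

After this step every channel has approximately zero mean and unit variance, so only the spatial arrangement of the features, the content, remains.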

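The loss function is defined as follows:

$$\min_{E_1,E_2,G_1,G_2}\ \max_{D_1,D_2}\ \mathcal{L} = \mathcal{L}_{GAN}^{x_1} + \mathcal{L}_{GAN}^{x_2} + \lambda_x\left(\mathcal{L}_{recon}^{x_1} + \mathcal{L}_{recon}^{x_2}\right) + \lambda_c\left(\mathcal{L}_{recon}^{c_1} + \mathcal{L}_{recon}^{c_2}\right) + \lambda_s\left(\mathcal{L}_{recon}^{s_1} + \mathcal{L}_{recon}^{s_2}\right)$$

where E₁, E₂ are the encoders, G₁, G₂ the decoders and D₁, D₂ the discriminators of the two domains, L_GAN^{x₁} and L_GAN^{x₂} are the adversarial losses, L_recon^{x}, L_recon^{c} and L_recon^{s} are the image, content and style reconstruction losses, and λ_x, λ_c, λ_s are weights that control the importance of the reconstruction terms.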

The style matching module comprises a semantic matching sub-module and a style fusion sub-module.

(1) The semantic matching sub-module matches the segmented image by semantic class and constructs the image style. To incorporate semantic information, the semantic mask of the original image is first downsampled accordingly, using the following formulas:

$$m_l = \mathrm{downsampling}(m, \mathrm{scale}(l)) \quad (7)$$

$$s_n = \mathrm{norm}(s_l) \,\|\, \lambda \cdot \mathrm{norm}(m_l) \quad (8)$$

(2) The style fusion sub-module performs matching at the granularity of image blocks, using cosine similarity as the criterion:

$$\mathrm{NN}(i) = \arg\max_{j} \frac{\Phi_i(x) \cdot \Phi_j(x_s)}{\lVert \Phi_i(x) \rVert\,\lVert \Phi_j(x_s) \rVert}$$

where Φ is a function that extracts the features of image blocks, Φ_i(x) is the style feature of the i-th block of the target image, Φ_j(x_s) is the style feature of the j-th block of the style image, and NN(i) is the index of the best-matching style block.

The image similarity measurement module is used for the following processes, including:

The SSIM index is used to calculate the similarity between every pair of style images generated by the system: the luminance, contrast and structure of the two images are compared to compute a similarity value, and the images with low similarity are screened out as the final output of the system.

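The three main characteristics of the image are luminance, contrast and structure, given by:

$$l(x,y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1},\quad c(x,y) = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2},\quad s(x,y) = \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3}$$

the similarity function is:

$$\mathrm{SSIM}(x,y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$

where SSIM is the image similarity index, μ_x and μ_y are the means, σ_x² and σ_y² the variances and σ_xy the covariance of x and y, and C₁, C₂, C₃ are constants (C₃ is commonly set to C₂/2, which yields the simplified SSIM form).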
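The final screening can be sketched as a greedy filter: an image is kept only if its similarity to every image already kept is below a threshold. The threshold value, the greedy order and the function name are hypothetical. Any pairwise similarity function, such as an SSIM implementation, can be passed in.

```python
from typing import Callable, List, Sequence

import numpy as np

def screen_diverse(images: Sequence[np.ndarray],
                   similarity: Callable[[np.ndarray, np.ndarray], float],
                   threshold: float = 0.8) -> List[int]:
    """Return indices of images kept by a greedy diversity filter.

    An image is kept only if its similarity to every previously kept image is
    below `threshold`; the threshold and greedy order are assumptions.
    """
    kept: List[int] = []
    for i, img in enumerate(images):
        if all(similarity(img, images[j]) < threshold for j in kept):
            kept.append(i)
    return kept
```

A lower threshold discards more near-duplicates, trading output count for diversity.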

First step, image preprocessing.

As shown in Fig. 3, Figs. 3b and 3c show the results of the Gaussian noise adding operation and the image flipping operation respectively, with Fig. 3a as the original image.

1. Adding Gaussian noise

To produce uniform texture in the background of the output image after style migration, the content image I_c is preprocessed by constructing a Gaussian noise matrix with the same size (feature dimensions) and number of channels as I_c, and adding the noise matrix to the original image to obtain an image containing Gaussian noise.

2. Data augmentation

In order to solve the problem of poor image style migration effect caused by insufficient image data, data needs to be augmented, similar but different training samples are generated by carrying out a series of random changes on training images, so that the scale of a training data set is enlarged, the dependence of a model on certain attributes is greatly reduced, and the generalization capability of the model is improved.

Second step, semantic segmentation

The whole semantic segmentation process is as follows:

1. Pixel value normalization:

As shown in Fig. 4a, the image has 499 × 701 pixels, and its pixel values initially lie in the range 0 to 255, as shown in Table a below. Normalization maps the values of all pixels into the range 0 to 1; Table b below shows the processed data:
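A minimal sketch of the normalization, assuming division by 255 (the source gives only the input and output ranges; min-max scaling would be an alternative). It also assumes, hypothetically, that each pixel's channel values form its feature vector for the subsequent clustering.

```python
import numpy as np


def normalize_pixels(image: np.ndarray) -> np.ndarray:
    """Map 8-bit pixel values from [0, 255] to [0, 1]."""
    return image.astype(np.float64) / 255.0


def to_feature_matrix(normalized: np.ndarray) -> np.ndarray:
    """Flatten an H x W (x C) image into an (H*W) x C matrix, one row per pixel."""
    h, w = normalized.shape[:2]
    return normalized.reshape(h * w, -1)
```

For the 499 × 701 image of Fig. 4a, the resulting matrix has 349,799 rows, one per pixel.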

2. Obtaining the cluster centers with the K-means method:

Basic K-means procedure:

Assume that the input to the algorithm is the normalized data set data = {point₁, point₂, ..., point_m}, the number of classes is k, and the maximum number of iterations is N. The output is a partition of the original sample set, C = {C₁, C₂, ..., C_k}.

(1) Select k objects from data as the initial cluster centers {μ₁, μ₂, ..., μ_k}.

(2) In each iteration n = 1, 2, ..., N, partition the objects according to their distances to the cluster centers, as follows:

a) Initialize the cluster partition as C_t = ∅, t = 1, 2, ..., k;

b) For each sample point point_i (i = 1, 2, ..., m), compute its distance to each cluster center μ_j (j = 1, 2, ..., k) using the following formula:

$$d_{ij} = \lVert \mathrm{point}_i - \mu_j \rVert^2 \tag{12}$$

Mark point_i with the cluster λ_i that corresponds to the minimum d_ij, and update C_λi = C_λi ∪ {point_i};

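c) Recompute each cluster center as the mean of the points assigned to it: μ_j = (1/|C_j|) Σ_{point ∈ C_j} point, j = 1, 2, ..., k.

(3) Repeat step (2) until the cluster centers no longer change or the maximum number of iterations N is reached, then output the partition C = {C₁, C₂, ..., C_k}.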
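Steps (1) to (3) can be sketched in NumPy as follows. This is a plain K-means under stated assumptions: random selection of the initial centers, and keeping an old center unchanged if its cluster becomes empty; neither detail is specified in the source.

```python
import numpy as np
from typing import Optional, Tuple


def kmeans(data: np.ndarray, k: int, max_iter: int = 100,
           seed: Optional[int] = None) -> Tuple[np.ndarray, np.ndarray]:
    """Plain K-means on an (m, d) array; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # (1) Pick k distinct samples as the initial cluster centers mu_1..mu_k.
    centers = data[rng.choice(len(data), size=k, replace=False)].astype(np.float64)
    labels = np.zeros(len(data), dtype=int)
    for _ in range(max_iter):
        # (2b) Squared Euclidean distances d_ij = ||point_i - mu_j||^2, shape (m, k).
        d = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)  # assign each point to its nearest center
        # (2c) Recompute each center as the mean of its members (keep it if the cluster is empty).
        new_centers = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # (3) Stop once the centers no longer change.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```

The returned centers are what the subsequent core-domain step uses as starting points.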

3. Core domain class label assignment

Class labels are assigned differently in the core domain and in the boundary (edge) domain.

The k neighborhood points of each point are obtained with the KNN algorithm, which proceeds as follows:

1) Compute the distance between the test data and each training sample;

2) Sort the samples in increasing order of distance;

3) Select the K points closest to each point as that point's neighborhood points;

4) For every pair of points, compute and store the points shared by their two neighborhoods; these points form the shared domain.

The KNN algorithm is implemented in Python as follows:
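As an illustration of steps 1) to 4), and not necessarily the listing the inventors used, a minimal NumPy sketch follows. The function names are hypothetical. The shared domain is computed on demand for a given pair instead of being stored for every pair, since storing all pairs would need memory quadratic in the number of points.

```python
import numpy as np
from typing import Set


def knn_neighbors(points: np.ndarray, k: int) -> np.ndarray:
    """Steps 1)-3): indices of the k nearest neighbors of every point (excluding itself)."""
    # 1) Pairwise squared Euclidean distances, shape (n, n).
    d = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    # 2)-3) Sort by increasing distance and keep the first k indices per row.
    return np.argsort(d, axis=1, kind="stable")[:, :k]


def shared_domain(neighbors: np.ndarray, a: int, b: int) -> Set[int]:
    """Step 4): the points that lie in the neighborhoods of both a and b."""
    return set(neighbors[a].tolist()) & set(neighbors[b].tolist())
```

The full distance matrix is only practical for small inputs; for all pixels of a real image, a spatial index such as a k-d tree would typically replace it.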

Based on the obtained cluster centers and their k nearest neighbors, labels are assigned to the neighbors according to the following rule:

Label assignment rule for core-domain points:

where |KNNP(this, next)| is the number of shared-domain points of the two points.

If the rule is satisfied, the point receives the same label as the cluster center; otherwise, it is marked as unassigned.

Labels are then assigned to the neighborhood points of the cluster center's neighbors, and so on, until no new neighborhood points remain.
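The propagation can be sketched as a breadth-first search outward from the cluster centers. The exact core-domain rule is not reproduced in the text, so this sketch assumes, hypothetically, that a neighbor `next` of a labeled point `this` inherits its label when |KNNP(this, next)| ≥ τ for a threshold τ. It also assumes that each K-means center is represented by the index of its nearest data point (`center_idx`).

```python
from collections import deque
from typing import Sequence

import numpy as np

UNASSIGNED = -1


def assign_core_labels(neighbors: np.ndarray, center_idx: Sequence[int], tau: int) -> np.ndarray:
    """Propagate labels from cluster centers through k-NN neighborhoods.

    neighbors: (n, k) array of k-NN indices for each point.
    center_idx: index of the data point representing each cluster center (label = position).
    tau: hypothetical threshold on |KNNP(this, next)|, the shared-neighbor count.
    """
    labels = np.full(len(neighbors), UNASSIGNED, dtype=int)
    neigh_sets = [set(row.tolist()) for row in neighbors]
    queue = deque()
    for label, c in enumerate(center_idx):
        labels[c] = label
        queue.append(c)
    while queue:
        this = queue.popleft()
        for nxt in neighbors[this]:
            if labels[nxt] != UNASSIGNED:
                continue
            shared = len(neigh_sets[this] & neigh_sets[nxt])  # |KNNP(this, next)|
            if shared >= tau:
                labels[nxt] = labels[this]  # same label as the center it grew from
                queue.append(nxt)
    return labels  # points still UNASSIGNED go to the boundary domain
```

Raising τ makes the core domain more conservative and leaves more points for the boundary-domain step.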

4. Boundary domain label assignment

Labels are then assigned to the points left unassigned in the previous step, until no unassigned points remain.

The label assignment rule for boundary-domain points is as follows:
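Because the boundary-domain rule itself is not reproduced in the text, the sketch below substitutes an assumed majority-vote rule: each unassigned point takes the most common label among its already-labeled k nearest neighbors. Passes repeat until one pass assigns nothing new. The function name and the rule are illustrative assumptions.

```python
from collections import Counter

import numpy as np

UNASSIGNED = -1


def assign_boundary_labels(neighbors: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Label each unassigned point by majority vote of its labeled k-NN neighbors (assumed rule)."""
    labels = labels.copy()  # do not modify the core-domain result in place
    changed = True
    while changed:
        changed = False
        for i in np.flatnonzero(labels == UNASSIGNED):
            votes = Counter(int(labels[j]) for j in neighbors[i] if labels[j] != UNASSIGNED)
            if votes:
                labels[i] = votes.most_common(1)[0][0]
                changed = True
    return labels
```

Points whose neighborhoods never receive any label stay unassigned, which ends the loop instead of running forever.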

This completes the clustering of the pixels of an image, as shown in Fig. 4b. Different objects in the image can be identified by their categories, which yields the semantic information of the image.

Third step, content and style feature extraction.

The MUNIT model training process is shown in Figs. 5 and 6.

Before the MUNIT model is trained, its parameters must be configured, as listed in the following parameter table:

Training of the MUNIT model consists mainly of three processes: the forward pass of the network, the generation model and its optimization, and the discrimination model and its optimization. The optimization of both the generator and the discriminator is set up in an initialization function.

The forward pass is an encoding process. The two input pictures are first encoded to obtain the content code and style code of each; the codes are then exchanged between the two pictures, and noise following a normal distribution is added, to generate the new pictures x_ab and x_ba. The generation model learns a mapping between the two data sets, and the discrimination model judges whether a generated image matches the distribution of the other data set. The initialization function also defines the parameters of the discriminator and generator, the learning-rate decay strategy, and the corresponding optimizers, and it loads the VGG model used to compute the perceptual loss.

The main loss consists of four parts:

1. The loss between the reconstructed picture and the real picture;

2. The loss between the latent code obtained by encoding the reconstructed picture and the latent code obtained by encoding the real picture;

3. The loss between the original picture and the picture obtained by translating it to the target domain and back;

4. The domain-invariant perceptual loss computed with VGG.
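The four terms can be made concrete with a framework-free sketch. It is not the MUNIT code: `enc_a`, `dec_a`, `enc_b`, `dec_b`, and `features` are hypothetical callables standing in for the two encoder/decoder pairs and the VGG feature extractor. The sketch uses L1 distances for the reconstruction terms and a plain squared feature distance for the perceptual term (MUNIT additionally instance-normalizes the VGG features). Loss weights and the adversarial term are omitted.

```python
import numpy as np
from typing import Callable, Dict, Tuple

Encoder = Callable[[np.ndarray], Tuple[np.ndarray, np.ndarray]]  # image -> (content, style)
Decoder = Callable[[np.ndarray, np.ndarray], np.ndarray]         # (content, style) -> image


def l1(a: np.ndarray, b: np.ndarray) -> float:
    """Mean absolute error."""
    return float(np.mean(np.abs(a - b)))


def munit_losses(x_a: np.ndarray, x_b: np.ndarray,
                 enc_a: Encoder, dec_a: Decoder, enc_b: Encoder, dec_b: Decoder,
                 features: Callable[[np.ndarray], np.ndarray],
                 style_dim: int, rng: np.random.Generator) -> Dict[str, float]:
    """Return the four unweighted loss terms for one pair of training images."""
    # Forward pass: encode both pictures into content and style codes.
    c_a, s_a = enc_a(x_a)
    c_b, s_b = enc_b(x_b)
    # 1. Within-domain image reconstruction.
    loss_image = l1(dec_a(c_a, s_a), x_a) + l1(dec_b(c_b, s_b), x_b)
    # Cross-domain translation: swap content codes, draw style codes from N(0, I).
    s_a_rand = rng.standard_normal(style_dim)
    s_b_rand = rng.standard_normal(style_dim)
    x_ba = dec_a(c_b, s_a_rand)
    x_ab = dec_b(c_a, s_b_rand)
    # 2. Latent reconstruction: re-encode the translated pictures.
    c_b_rec, s_a_rec = enc_a(x_ba)
    c_a_rec, s_b_rec = enc_b(x_ab)
    loss_latent = (l1(c_a_rec, c_a) + l1(c_b_rec, c_b)
                   + l1(s_a_rec, s_a_rand) + l1(s_b_rec, s_b_rand))
    # 3. Cycle: translate back using the original style codes.
    loss_cycle = l1(dec_a(c_a_rec, s_a), x_a) + l1(dec_b(c_b_rec, s_b), x_b)
    # 4. Perceptual loss: squared distance between feature maps (stand-in for VGG).
    loss_perceptual = (float(np.mean((features(x_ab) - features(x_a)) ** 2))
                       + float(np.mean((features(x_ba) - features(x_b)) ** 2)))
    return {"image": loss_image, "latent": loss_latent,
            "cycle": loss_cycle, "perceptual": loss_perceptual}
```

With a toy encoder that splits a vector into halves and a decoder that concatenates them, the first three terms are exactly zero, which is a quick sanity check of the wiring.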

After training reaches a specified number of iterations, inference is run on pre-selected samples through the sample(self, x_a, x_b) function, and the results are saved in outputs.

Fourth step, style matching.

To incorporate semantic information, the semantic mask of the original image is first downsampled accordingly, as described by the following formula:

$$m_l = \mathrm{downsampling}(m, \mathrm{scale}(l)) \tag{7}$$

where l is the index of the network layer, m_l is the semantic mask at layer l, and scale(l) is the downsampling ratio applied to m at layer l, which is determined by the resolution of the input image and the output resolution of layer l.
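Formula (7) can be sketched with nearest-neighbor sampling, which keeps the mask's class labels discrete. The function name and the `layer_hw` argument (the output height and width of layer l) are hypothetical. The per-axis ratio scale(l) is taken to be input resolution divided by layer resolution, as the text describes.

```python
import numpy as np
from typing import Tuple


def downsample_mask(mask: np.ndarray, layer_hw: Tuple[int, int]) -> np.ndarray:
    """Nearest-neighbor downsampling of an H x W (x C) mask to layer l's output resolution."""
    h, w = mask.shape[:2]
    lh, lw = layer_hw
    # scale(l) per axis is h / lh and w / lw; pick one source row/column per output cell.
    rows = (np.arange(lh) * h / lh).astype(int)
    cols = (np.arange(lw) * w / lw).astype(int)
    return mask[np.ix_(rows, cols)]
```

Nearest-neighbor sampling is chosen over averaging so that every downsampled cell still carries one of the original class labels.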

The style features s_l and the mask m_l are then concatenated along the feature dimension to form a new style feature s_n. To balance the influence of the conventional features and the semantic information on the style, a hyperparameter λ is introduced: when λ = 0, only the conventional features are used for style migration, and as λ → +∞, only the semantic information is used. Users can choose the value of λ according to the actual situation.

$$s_n = \mathrm{norm}(s_l) \,\Vert\, \lambda \cdot \mathrm{norm}(m_l) \tag{8}$$

where ∥ denotes concatenation along the feature dimension.
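Formula (8) can be sketched in NumPy as shown below. It assumes that norm() is a per-position L2 normalization and that m_l is stored with its classes along the last axis (for example, one-hot); both are assumptions, since the source does not define norm().

```python
import numpy as np


def fuse_style_and_mask(s_l: np.ndarray, m_l: np.ndarray, lam: float,
                        eps: float = 1e-8) -> np.ndarray:
    """s_n = norm(s_l) || lam * norm(m_l), concatenated along the last (feature) axis.

    s_l: (H, W, C_s) style features at layer l; m_l: (H, W, C_m) downsampled mask.
    """
    def norm(x: np.ndarray) -> np.ndarray:
        # Assumed: L2-normalize the feature vector at each spatial position.
        return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

    return np.concatenate([norm(s_l), lam * norm(m_l)], axis=-1)
```

Because both parts are normalized first, λ directly sets the relative weight of the semantic part in any later similarity computed on s_n; with λ = 0 the mask channels are all zero.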

Matching in the style fusion sub-network is judged by cosine similarity, using the following formula:
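A minimal NumPy sketch of cosine-similarity block matching follows. It assumes Φ simply flattens each 3 × 3 block of a feature map (stride 1) and that the inputs are fused features such as s_n. The function names and the block size are illustrative assumptions, not details given in the source.

```python
import numpy as np


def extract_patches(feat: np.ndarray, size: int = 3) -> np.ndarray:
    """Phi: flatten every size x size block of an (H, W, C) feature map into a row vector."""
    h, w = feat.shape[:2]
    patches = [feat[i:i + size, j:j + size].ravel()
               for i in range(h - size + 1) for j in range(w - size + 1)]
    return np.stack(patches)


def match_patches(target: np.ndarray, style: np.ndarray, size: int = 3,
                  eps: float = 1e-8) -> np.ndarray:
    """For each target block, the index of the style block with the highest cosine similarity."""
    pt = extract_patches(target, size)
    ps = extract_patches(style, size)
    pt = pt / (np.linalg.norm(pt, axis=1, keepdims=True) + eps)
    ps = ps / (np.linalg.norm(ps, axis=1, keepdims=True) + eps)
    sim = pt @ ps.T  # cosine similarities, shape (n_target, n_style)
    return sim.argmax(axis=1)
```

Matching a feature map against itself returns each block's own index, since a vector's cosine similarity with itself is maximal.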

Fifth step, image similarity measurement.

As shown in Fig. 7, taking the images before and after style migration as an example, the SSIM index is used to calculate the similarity between the image before migration and the image after migration. The basic process is as follows:

1) For the input original image x and the style-migrated image y, first compute the brightness representation of each and compare them to obtain the first similarity evaluation;

2) Remove the influence of brightness, compute the contrast representation, and compare to obtain the second evaluation;

3) Remove the influence of brightness and contrast, and compare the structures;

4) Combine the three comparison results into the similarity value of the two images, which is 0.2292 in this example.
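Steps 1) to 4) can be sketched as a single-window SSIM computed over the whole image. The standard index instead averages SSIM over local Gaussian windows; scikit-image's `skimage.metrics.structural_similarity` provides that windowed version. The constants use the customary K₁ = 0.01, K₂ = 0.03 and C₃ = C₂/2, which are assumptions here, so this sketch will not necessarily reproduce the value 0.2292.

```python
import numpy as np


def ssim_global(x: np.ndarray, y: np.ndarray, data_range: float = 255.0,
                k1: float = 0.01, k2: float = 0.03) -> float:
    """Single-window SSIM over two whole images of equal shape."""
    x = x.astype(np.float64).ravel()
    y = y.astype(np.float64).ravel()
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    c3 = c2 / 2
    mu_x, mu_y = x.mean(), y.mean()
    sd_x, sd_y = x.std(), y.std()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    lum = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)  # 1) brightness comparison
    con = (2 * sd_x * sd_y + c2) / (sd_x ** 2 + sd_y ** 2 + c2)  # 2) contrast comparison
    struct = (cov + c3) / (sd_x * sd_y + c3)                      # 3) structure comparison
    return float(lum * con * struct)                              # 4) combined similarity
```

Identical images score 1, while an image and its photographic negative score below 0, because their structure term is close to -1.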

This completes the training process of the model. In the testing stage, a single input image is enough to generate multiple images in different styles.

In this embodiment, the data sets do not need to be paired (manually labeled in advance), the style overflow that may occur during image style migration is effectively addressed, and images in multiple styles can be generated from a single image input by the user, meeting the user's demand for diversity.

Claims

1. An image style migration system based on three-branch clustering semantic segmentation, characterized by comprising:

2. The image style migration system based on three-branch clustering semantic segmentation according to claim 1, wherein the semantic segmentation module performs the following processes:

normalizing the pixel values, computing the cluster centers with the K-means algorithm, assigning core-domain labels, and assigning boundary-domain labels; the pixel value normalization converts the image into a standard form that resists subsequent affine transformations; the cluster centers obtained by the K-means algorithm serve as the initial input of the subsequently improved k-nearest-neighbor algorithm; the improved k-nearest-neighbor algorithm introduces the concept of three-branch clustering into the k-nearest-neighbor algorithm, sets different discrimination rules for the core domain and the boundary domain, and assigns labels to the sample points in two steps; the points to be assigned in the boundary domain are those still unlabeled after core-domain label assignment, that is, points that cannot be discriminated in the core domain are classified into the boundary domain; these steps complete the clustering of the sample points and yield a semantic segmentation image.

3. The image style migration system based on three-branch clustering semantic segmentation according to claim 1, wherein the content and style feature extraction module comprises a content encoder, a style encoder, and a joint decoder; the content encoder consists of several convolutional layers that downsample the input, followed by residual blocks for further processing, and all of these layers are followed by instance normalization, which removes the original feature mean and variance that represent style information; the style encoder comprises several convolutional layers, an average pooling layer, and a fully connected layer; the joint decoder processes the content code through a set of residual blocks and then generates the reconstructed image through upsampling layers and a convolutional layer.

4. The image style migration system based on three-branch clustering semantic segmentation according to claim 1, wherein the image similarity measurement module performs the following process: the SSIM index is used to calculate the similarity between every pair of style images generated by the system, the brightness, contrast, and structural features of the two images are compared to obtain a similarity value, and the images with low similarity are selected as the final output of the system.

5. An image style migration method based on three-branch clustering semantic segmentation, characterized by comprising the following steps:

6. The image style migration method based on three-branch clustering semantic segmentation according to claim 5, wherein the image preprocessing of step 1 comprises:

step 1.1, adding Gaussian noise: the content image is preprocessed by constructing a Gaussian noise matrix with the same size and number of channels as the content image I_c, and the noise matrix is added to the original image to obtain an image containing Gaussian noise, which is used as the content input image; for any point (x_i, y_i) in a channel of the content image, the pixel value is denoted z, and the probability density function of the Gaussian noise is:

7. The image style migration method based on three-branch clustering semantic segmentation according to claim 5, wherein the semantic segmentation of step 2 comprises:

The relationship among the three regions is as follows:

$$Co(C) \cup F_\gamma(C) \cup T_\gamma(C) = U, \quad Co(C) \cap F_\gamma(C) = F_\gamma(C) \cap T_\gamma(C) = Co(C) \cap T_\gamma(C) = \varnothing$$

where U is the universe, Co(C) is the core domain, F_γ(C) is the boundary domain, T_γ(C) is the outer domain, and ∅ is the empty set;

that is, the three regions are mutually exclusive and do not intersect;

8. The image style migration method based on three-branch clustering semantic segmentation according to claim 5, wherein the content and style feature extraction of step 3 comprises:

wherein one of the loss terms is the adversarial loss,

9. The image style migration method based on three-branch clustering semantic segmentation according to claim 5, wherein the style matching of step 4 comprises:

$$m_l = \mathrm{downsampling}(m, \mathrm{scale}(l)) \tag{7}$$

$$s_n = \mathrm{norm}(s_l) \,\Vert\, \lambda \cdot \mathrm{norm}(m_l) \tag{8}$$

where Φ is a function that extracts the features of an image block, and the other two symbols denote, respectively, the style features of the target image and the style features of the style image.

10. The image style migration method based on three-branch clustering semantic segmentation according to claim 5, wherein the image similarity measurement of step 5 comprises:

The similarity function is:

where SSIM is the image similarity index, and C₁ and C₂ are constants.