CN112529772B - Unsupervised image conversion method under zero sample setting - Google Patents


Info

Publication number
CN112529772B
CN112529772B CN202011501620.5A
Authority
CN
China
Prior art keywords
attribute
image
category
space
image conversion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011501620.5A
Other languages
Chinese (zh)
Other versions
CN112529772A (en)
Inventor
陈元祺
余晓铭
刘杉
李革
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Instritute Of Intelligent Video Audio Technology Longgang Shenzhen
Original Assignee
Instritute Of Intelligent Video Audio Technology Longgang Shenzhen
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Instritute Of Intelligent Video Audio Technology Longgang Shenzhen filed Critical Instritute Of Intelligent Video Audio Technology Longgang Shenzhen
Priority to CN202011501620.5A priority Critical patent/CN112529772B/en
Publication of CN112529772A publication Critical patent/CN112529772A/en
Application granted granted Critical
Publication of CN112529772B publication Critical patent/CN112529772B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/04Context-preserving transformations, e.g. by using an importance map
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

An unsupervised image conversion method under a zero-sample setting includes applying an attribute-visual association constraint and expanding the attribute space with unseen attributes, where the two are performed synchronously. By applying the attribute-visual association constraint and expanding the attribute space with unseen attributes, the model is prompted to fully exploit the attribute features of each category, thereby enabling unsupervised image conversion under zero samples.

Description

Unsupervised image conversion method under zero sample setting
Technical Field
The invention relates to the field of image generation and image conversion, and in particular to an unsupervised image conversion method under a zero-sample setting.
Background
In recent years, with the development of generative adversarial networks (GANs), generative models have received more and more attention. On the one hand, GAN-based generative models show surprisingly good results: the generated images are both high-resolution and visually convincing enough to pass for real. On the other hand, as the famous physicist Feynman put it, "What I cannot create, I do not understand." Although machine learning models in recent years excel at tasks such as image classification, the success of these applications does not imply that we truly understand images or have achieved intelligence. Being able to generate images is significant for a deeper understanding of images.
Image-to-image translation is a branch of generative modeling that belongs to the conditional generative models, with the input image serving as the condition. It studies how to transform an image from one domain into a corresponding image in another domain. For example, an image photographed during the daytime is converted into a night scene while the scene content is kept unchanged. This is a challenging task: first, the output of the model should be both realistic and carry the characteristics of the target domain; second, the model should preserve the individual characteristics of the input and not turn the converted result into a completely different picture. The second issue is also called mode collapse, i.e., the output collapses into a few modes, and the network produces the same single result even when given different inputs.
The above problems are well addressed in the supervised setting. With paired datasets (such as daytime and nighttime images of the same scene), the converted image can be constrained to approximate the ground-truth image in the target domain after the transition from the source domain. However, in many real-world scenarios, paired samples cannot be obtained at low cost, if they exist at all. In this case, how to train the image conversion model without supervision is the difficulty. Furthermore, the mode collapse problem in image conversion is particularly acute when some classes have too few samples, or even no samples at all. In summary, unsupervised image conversion under the zero-sample setting is a challenging problem.
Disclosure of Invention
The invention provides an unsupervised image conversion method under a zero-sample setting, which realizes unsupervised image conversion with zero samples.
The technical scheme of the invention is as follows:
The invention relates to an unsupervised image conversion method under a zero-sample setting, comprising the steps of applying an attribute-visual association constraint and expanding the attribute space with unseen attributes, wherein the attribute-visual association constraint is applied and the attribute space is expanded with unseen attributes synchronously.
Preferably, in the above-mentioned unsupervised image conversion method under the zero-sample setting, applying the attribute-visual association constraint includes the following steps: two seen category attributes a_m and a_n are sampled from the attribute space and their relevance s(a_m, a_n) is calculated; following the adaptive instance normalization (AdaIN) method from style transfer, the visual features w_m and w_n of the visual space determined by the two seen category attributes a_m and a_n are computed, and their relevance s(w_m, w_n) is calculated; the association constraint is then applied: the regularization term L_reg = ||s(a_m, a_n) - s(w_m, w_n)||^2 is imposed on the relevance of the two seen category attributes a_m and a_n and the relevance of the visual features w_m and w_n determined by them.
Preferably, in the above-mentioned unsupervised image conversion method under the zero-sample setting, the attribute space is expanded with unseen attributes through the following steps: the unseen category attribute a_u and the input image x_i are sampled, and an image x_t is generated by the generator; a loss function constrains the image x_t to have the features of the unseen category attribute a_u; and the discriminator performs attribute regression, expanding the attribute space.
According to the technical scheme of the invention, the beneficial effects are that:
According to the method, the attribute-visual relevance constraint is applied and the attribute space is expanded with unseen attributes, so that the model fully exploits the attribute features of each category, thereby realizing unsupervised image conversion under the zero-sample setting.
For a better understanding and explanation of the conception, working principle and inventive effect of the present invention, the present invention will be explained in detail below by means of specific embodiments with reference to the drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application.
Fig. 1 is a general block diagram of an unsupervised image conversion method under a zero sample setting of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the present invention.
The image conversion model involved in the unsupervised image conversion method under the zero-sample setting of the present invention is based on a generative adversarial network and comprises a generator and a discriminator (shown in Fig. 1). Training a generative adversarial network is a minimax game, in which the goal of the generator is to generate samples realistic enough to fool the discriminator, while the discriminator attempts to distinguish samples from the real data distribution from generated ones. Once training reaches a stable stage, the generator is able to produce high-quality samples that the discriminator can hardly distinguish from real ones.
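The minimax game described above can be sketched with the standard GAN objective. The losses below are illustrative stand-ins, not the patent's exact formulation:

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    # The discriminator wants real samples scored near 1 and generated
    # samples scored near 0 (binary cross-entropy form of the minimax game).
    return float(-np.mean(np.log(d_real) + np.log(1.0 - d_fake)))

def generator_loss(d_fake):
    # The generator wants the discriminator to score its samples as real.
    return float(-np.mean(np.log(d_fake)))
```

A discriminator that separates real from fake well (scores 0.9 / 0.1) incurs a lower loss than an undecided one (0.5 / 0.5), which is what drives the game toward the stable stage described above.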
In the zero-sample case, a portion of the classes lack image sample data; these are referred to as unseen classes. For unseen classes, the only available knowledge is the attribute features each of them holds. The image conversion model under the zero-sample setting takes the image to be converted and a category attribute as input, and converts the image into the target category. The key to the problem is how to transfer knowledge from the seen categories to the unseen categories, and how to make the model perform the conversion according to the category attributes.
The working principle of the invention is as follows: by applying the attribute-visual association constraint and expanding the attribute space with unseen attributes, the model is prompted to fully exploit the attribute features of each category, thereby enabling unsupervised image conversion under the zero-sample setting.
The attribute-visual association constraint means that the relevance maintained by an attribute pair in the attribute space and the relevance of the image pair converted according to that attribute pair should be consistent. Because no image samples of the unseen categories are available during training, the attribute vectors must be exploited to provide effective guidance for image conversion. Introducing the attribute-visual relevance constraint guides the learned visual space to preserve the structure of the attribute space, thereby facilitating image conversion for the unseen categories.
Expanding the attribute space with unseen attributes means bringing the attributes of the unseen categories into the training process. Although image samples of the unseen categories cannot be obtained, their category attributes can. In the training phase, an unseen attribute is fed into the image conversion model together with the input image, and the model is required to make that attribute recoverable from the converted image. This strategy better avoids the mapping bias in zero-sample image conversion, namely the tendency of the conversion model to map unseen categories onto similar seen categories, thereby narrowing the conversion performance gap between seen and unseen categories.
Fig. 1 is the overall framework diagram of the unsupervised image conversion method under the zero-sample setting of the present invention. As shown, the method comprises two strategies. Strategy 1: apply the attribute-visual association constraint. Strategy 2: expand the attribute space with unseen attributes. The two strategies are carried out synchronously.
Wherein applying the attribute-visual association constraint comprises the steps of:
1) Two seen category attributes a_m and a_n (i.e., attribute 1 and attribute 2 in the attribute space of Fig. 1) are sampled from the attribute space, and their relevance s(a_m, a_n) is calculated;
2) Following the adaptive instance normalization (AdaIN) method from style transfer, the visual features w_m and w_n of the visual space (corresponding to image 1 and image 2 in the visual space of Fig. 1) determined by the two seen category attributes a_m and a_n are computed, and their relevance s(w_m, w_n) is calculated; and
3) The association constraint is applied: the regularization term L_reg = ||s(a_m, a_n) - s(w_m, w_n)||^2 is imposed on the relevance of the two seen category attributes a_m and a_n and the relevance of the visual features w_m and w_n determined by them.
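Taken together, these steps can be sketched as follows. The relevance measure s(·,·) is taken here to be cosine similarity, which is an assumption: the text only names it a relevance function. The AdaIN stand-in operates on toy 1-D feature vectors rather than real feature maps:

```python
import numpy as np

def s(u, v):
    # Relevance of a pair of vectors; cosine similarity is assumed here.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def adain(content, style, eps=1e-5):
    # Adaptive instance normalization (AdaIN): renormalize the content
    # feature statistics to match those of the style (attribute) feature.
    normalized = (content - content.mean()) / (content.std() + eps)
    return normalized * style.std() + style.mean()

def l_reg(a_m, a_n, w_m, w_n):
    # Regularization term L_reg = ||s(a_m, a_n) - s(w_m, w_n)||^2,
    # pulling attribute-space relevance and visual-space relevance together.
    return (s(a_m, a_n) - s(w_m, w_n)) ** 2
```

When the visual-space relevance already matches the attribute-space relevance, L_reg is zero; any mismatch is penalized quadratically.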
Expanding the attribute space with unseen attributes comprises the steps of:
1) The unseen category attribute a_u and the input image x_i are sampled, and an image x_t is generated by the generator;
2) A loss function constrains the generated image x_t to have the features of the unseen category attribute a_u; and
3) The discriminator performs attribute regression, expanding the attribute space.
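A toy sketch of these steps follows. The generator and the discriminator's attribute-regression head are stand-in linear maps, and the L2 penalty between the regressed attributes and a_u is an assumed form, since the exact loss is not reproduced in this text:

```python
import numpy as np

rng = np.random.default_rng(0)
attr_dim, img_dim = 4, 8
W = rng.normal(size=(attr_dim, img_dim))   # toy generator conditioning map
V = rng.normal(size=(img_dim, attr_dim))   # toy attribute-regression head

def generate(x_i, a_u):
    # Toy "generator": injects the unseen attribute into the input
    # image features (a real model would use a conditioned network).
    return x_i + a_u @ W

def attribute_regression_loss(x_t, a_u):
    # Constrain x_t to carry a_u: regress attributes back from x_t with
    # the discriminator head and penalize the distance to a_u.
    return float(np.mean((x_t @ V - a_u) ** 2))
```

Minimizing this loss during training is what forces the unseen attribute to be recoverable from the converted image, as described above.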
Compared with existing image conversion methods, the method of the present invention achieves better conversion accuracy and better generation quality. The two concepts of conversion accuracy and generation quality in image conversion, together with their associated evaluation indices, are explained below.
Conversion accuracy: measures whether an image, after conversion, belongs to the target domain. The probability that the converted image belongs to the target domain is judged by a pre-trained classifier. The evaluation indices include Top-1 and Top-5 classification accuracy: for a picture, if the top one (or top five) predicted classes contain the correct answer, the prediction is counted as correct.
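The Top-1 / Top-5 accuracy just described can be computed as below (the probabilities and labels are hypothetical toy data, not results from the patent):

```python
import numpy as np

def top_k_accuracy(probs, labels, k):
    # probs: (N, C) class probabilities from the pretrained classifier.
    # A sample counts as correct if its true label appears among the
    # k highest-probability classes.
    topk = np.argsort(probs, axis=1)[:, -k:]
    hits = [label in row for row, label in zip(topk, labels)]
    return sum(hits) / len(labels)
```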
Generation quality: measures whether the converted image has high image quality. Evaluation is divided into objective and subjective assessment. The Fréchet Inception Distance (FID) is a commonly used objective measure of generation quality. To calculate the FID of an image conversion model, a batch of converted images is first generated with the model, and a batch of images is sampled from the dataset for comparison. Features are then extracted from both batches, their statistics are computed, and the difference between the distributions of the generated and real images is measured from these statistics as an assessment of generation quality. For subjective evaluation, the conversion results of several models are typically shown to subjects simultaneously, who pick out the highest-quality image. After many trials, the model with the higher selection rate is considered to have higher generation quality.
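The FID computation described above follows the pattern below. This sketch assumes diagonal covariances to avoid the matrix square root of the full formulation, so it is a simplification of the standard FID, not the metric itself:

```python
import numpy as np

def frechet_distance_diag(feats_real, feats_gen):
    # Fréchet distance between two Gaussians fitted to feature batches,
    # simplified with diagonal covariances; the standard FID uses full
    # covariance matrices and a matrix square root term.
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    var_r, var_g = feats_real.var(axis=0), feats_gen.var(axis=0)
    return float(np.sum((mu_r - mu_g) ** 2)
                 + np.sum(var_r + var_g - 2.0 * np.sqrt(var_r * var_g)))
```

Identical feature batches give a distance of zero, and the distance grows as the generated-feature distribution drifts away from the real one, which is why lower FID indicates better generation quality.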
As shown in Table 1, the objective comparison of the present invention with other algorithms includes conversion accuracy for both seen and unseen categories and the generation quality index FID. Compared with existing models (FUNIT-1 and FUNIT-5 are not under the zero-sample setting and thus constitute an unfair comparison; StarGAN is under the zero-sample setting), the present invention achieves better results on both the CUB and FLO datasets, with the improvement being especially significant for the unseen categories.
TABLE 1 comparison of objective indicators of the results of the present invention with other algorithms
As shown in Table 2, for subjective evaluation, when the conversion results of several models are presented to subjects simultaneously, the present invention is chosen far more often than StarGAN, which is also under the zero-sample setting; the present invention also delivers competitive results against FUNIT-1 and FUNIT-5, which operate under a few-sample setting.
Table 2 shows the subjective index comparison of the results of the present invention with other algorithms.
Model                  CUB dataset   FLO dataset
FUNIT-1                27.8%         21.8%
FUNIT-5                34.2%         27.8%
StarGAN                7.8%          14.3%
The present invention  30.2%         36.1%
The foregoing is only illustrative of the present invention and is not intended to limit it; any modifications, equivalent replacements, improvements, and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (2)

1. An unsupervised image conversion method under a zero-sample setting, comprising applying an attribute-visual association constraint and expanding the attribute space with unseen attributes, wherein said applying of the attribute-visual association constraint and said expanding of the attribute space with unseen attributes are performed synchronously;
The applying of the attribute-visual association constraint comprises the steps of:
sampling two seen category attributes a_m and a_n from the attribute space and calculating their relevance s(a_m, a_n); following the adaptive instance normalization (AdaIN) method from style transfer, computing the visual features w_m and w_n of the visual space determined by the two seen category attributes a_m and a_n, and calculating their relevance s(w_m, w_n); and applying the association constraint: imposing the regularization term L_reg = ||s(a_m, a_n) - s(w_m, w_n)||^2 on the relevance of the two seen category attributes a_m and a_n and the relevance of the visual features w_m and w_n determined by them.
2. The unsupervised image conversion method under a zero-sample setting according to claim 1, wherein expanding the attribute space with unseen attributes comprises the steps of:
sampling the unseen category attribute a_u and the input image x_i, and generating an image x_t with the generator; constraining, by a loss function, the image x_t to have the features of the unseen category attribute a_u; and performing attribute regression with the discriminator, expanding the attribute space.
CN202011501620.5A 2020-12-18 2020-12-18 Unsupervised image conversion method under zero sample setting Active CN112529772B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011501620.5A CN112529772B (en) 2020-12-18 2020-12-18 Unsupervised image conversion method under zero sample setting


Publications (2)

Publication Number Publication Date
CN112529772A CN112529772A (en) 2021-03-19
CN112529772B true CN112529772B (en) 2024-05-28

Family

ID=75001328

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011501620.5A Active CN112529772B (en) 2020-12-18 2020-12-18 Unsupervised image conversion method under zero sample setting

Country Status (1)

Country Link
CN (1) CN112529772B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI769820B (en) * 2021-05-19 2022-07-01 鴻海精密工業股份有限公司 Method for optimizing the generative adversarial network and electronic equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105740888A (en) * 2016-01-26 2016-07-06 天津大学 Joint embedded model for zero sample learning
CN109359670A (en) * 2018-09-18 2019-02-19 北京工业大学 A kind of individual strength of association automatic testing method based on traffic big data
CN109582960A (en) * 2018-11-27 2019-04-05 上海交通大学 The zero learn-by-example method based on structured asso- ciation semantic embedding
CN109598279A (en) * 2018-09-27 2019-04-09 天津大学 Based on the zero sample learning method for generating network from coding confrontation
CN110097095A (en) * 2019-04-15 2019-08-06 天津大学 A kind of zero sample classification method generating confrontation network based on multiple view
CN110163796A (en) * 2019-05-29 2019-08-23 北方民族大学 A kind of image generating method and frame that unsupervised multi-modal confrontation encodes certainly
CN110795585A (en) * 2019-11-12 2020-02-14 福州大学 Zero sample image classification model based on generation countermeasure network and method thereof


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on image style transfer methods; Hou Yubing; China New Telecommunications; 2020-09-05 (No. 17); full text *

Also Published As

Publication number Publication date
CN112529772A (en) 2021-03-19

Similar Documents

Publication Publication Date Title
Wang et al. Minegan: effective knowledge transfer from gans to target domains with few images
Bielski et al. Emergence of object segmentation in perturbed generative models
CN110163258B (en) Zero sample learning method and system based on semantic attribute attention redistribution mechanism
CN111754596B (en) Editing model generation method, device, equipment and medium for editing face image
CN110378844B (en) Image blind motion blur removing method based on cyclic multi-scale generation countermeasure network
CN109711426B (en) Pathological image classification device and method based on GAN and transfer learning
US9152926B2 (en) Systems, methods, and media for updating a classifier
CN112784929A (en) Small sample image classification method and device based on double-element group expansion
CN110543832A (en) Electroencephalogram data classification method based on random forest and convolutional neural network
Teo et al. Fair generative models via transfer learning
CN111027610B (en) Image feature fusion method, apparatus, and medium
CN112529772B (en) Unsupervised image conversion method under zero sample setting
CN115952493A (en) Reverse attack method and attack device for black box model and storage medium
KR20200058295A (en) Method and Device of High Magnetic Field Magnetic Resonance Image Synthesis
CN112995433B (en) Time sequence video generation method and device, computing equipment and storage medium
CN112488238B (en) Hybrid anomaly detection method based on countermeasure self-encoder
US20220101145A1 (en) Training energy-based variational autoencoders
Tayyub et al. Explaining deep neural networks for point clouds using gradient-based visualisations
WO2021171384A1 (en) Clustering device, clustering method, and clustering program
JP7148078B2 (en) Attribute estimation device, attribute estimation method, attribute estimator learning device, and program
Lyu et al. DeCapsGAN: generative adversarial capsule network for image denoising
KR20210107261A (en) Method and apparatus for clustering data by using latent vector
CN113486925B (en) Model training method, fundus image generation method, model evaluation method and device
Khalesi et al. Synthetic Data Augmentation to Aid Small Training Datasets
EP4386657A1 (en) Image optimization method and apparatus, electronic device, medium, and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant