CN115659852A

CN115659852A - Layout generation method and device based on discrete latent representation

Info

Publication number: CN115659852A
Application number: CN202211671875.5A
Authority: CN
Inventors: 陈柳青; 景千芝; 孙凌云; 甄焱鲲; 周婷婷
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2022-12-26
Filing date: 2022-12-26
Publication date: 2023-01-31
Anticipated expiration: 2042-12-26
Also published as: CN115659852B

Abstract

The invention discloses a layout generation method based on discrete latent representation, comprising the following steps: step 1, constructing a training set comprising element bounding-box sequences and the constraint conditions used to generate a layout; step 2, constructing an element-constrained layout generation network comprising a feature extraction module, a discrete latent variable generation module and a reconstruction module; step 3, training the layout generation network on the training set to obtain a layout generation model; step 4, constructing a unidirectional Transformer model that takes the constraint conditions as input and outputs the corresponding discrete latent representation of the layout; and step 5, using the trained unidirectional Transformer model as the entry point for the constraint conditions and feeding its output to the layout generation model to obtain an element bounding-box sequence that satisfies the input constraints. The invention also provides a layout generation device. Given graphic design requirements, the method outputs a high-quality design layout that meets those requirements.

Description

Layout generation method and device based on discrete latent representation

Technical Field

The invention relates to the field of image generation, and in particular to a layout generation method and device based on discrete latent representation.

Background

Graphic design is an important visual communication tool: it combines colorful images with concise, readable text into a visual expression with a particular aesthetic, in order to attract attention and convey information. Layout design is the foundation of graphic design. Its core task is to arrange the design elements to be displayed sensibly within a given canvas, and a designer usually does so by adjusting the size (width and height) and position (horizontal and vertical coordinates) of each element. In addition, so that the design conveys information quickly and accurately and attracts the user's attention, the designer usually takes the application scenario of the layout and the types of the design elements into account when arranging them. For example, the layout of a fashion magazine is flexible and varied, with images occupying a large area, whereas the layout of a science and technology magazine is neater and more rigorous and consists mainly of text.

The academic paper "Layout Generation and Completion with Self-attention" (2020) first discretizes the category information and geometric parameters of the design elements in a layout, then concatenates all element information into a sequence, learns the relations between the element information with the self-attention mechanism of a Transformer model, predicts the remaining element information step by step according to those relations, and finally predicts the whole element-information sequence to obtain a new layout. The scheme can generate a new layout from an empty sequence or from a sequence containing partial element information, and can be extended to various layout generation tasks such as UI layout, document layout and spatial layout. However, the scheme lacks specified constraints during generation and depends heavily on the quality and size of the training set. The resulting model therefore relies heavily on heuristic rules, and the diversity of its outputs cannot be guaranteed.

Patent document CN110706315A discloses a layout generation method, apparatus, electronic device and storage medium for graphic design. The method comprises: acquiring the element types in the graphic design and the number of elements of each type, and randomly generating a plurality of initial layouts from the element types and element counts; scoring each initial layout with a preset scoring rule, and classifying each initial layout as high-quality or low-quality according to its score; and training a preset generative adversarial network (GAN) on the high-quality layouts among the initial layouts, then obtaining new high-quality layouts from the trained GAN. The method cannot generate layouts for a given scenario and so lacks practical applicability; it also suffers from posterior collapse, so that model training fails to converge.

Patent document CN1584930A discloses an image element layout apparatus, layout program and layout method, which calculate the arrangement intervals between the image elements to be laid out from the time differences acquired between them, and arrange the selected image elements along the path given by selected path information. The method derives the spacing between image elements from their relation in time, but arranging elements by time order alone leads to stacked layers or duplicated placement.

Disclosure of Invention

To solve the above problems, the present invention provides a layout generation method based on discrete latent representation, which can output a high-quality design layout that meets given graphic design requirements.

A layout generation method based on discrete latent representation, comprising:

step 1, constructing a training set, wherein the training set comprises element bounding-box sequences used for generating layouts and the corresponding constraint conditions, the constraint conditions comprising an element category sequence and an application scenario;

step 2, constructing an element-constrained layout generation network comprising a feature extraction module, a discrete latent variable generation module and a reconstruction module, wherein the feature extraction module comprises a self-attention encoder that projects the input element category sequence, element bounding-box sequence and application scenario into a d-dimensional hidden space to generate the corresponding layout latent representation; the discrete latent variable generation module discretizes the generated layout latent representation to obtain the corresponding layout discrete latent representation; and the reconstruction module outputs the element bounding-box sequence of the real layout from the input element category sequence, the application scenario and the layout discrete latent representation;

step 3, training the layout generation network constructed in step 2 on the training set to obtain a layout generation model;

step 4, constructing a unidirectional Transformer model that takes the element category sequence and the application scenario as input constraints, and training it with the training set and the layout discrete latent representations from step 2, so that it outputs a layout discrete latent representation satisfying the input constraints;

step 5, using the unidirectional Transformer model trained in step 4 as the entry point for the constraint conditions, feeding the layout discrete latent representation it outputs to the reconstruction module of the layout generation model, and decoding that representation to obtain an element bounding-box sequence satisfying the input constraints.

The invention provides a new LayoutVQ-VAE model, which generates layouts by learning a discrete latent representation of the layout and reconstructs the element bounding-box sequence with a non-autoregressive decoder, thereby obtaining the element bounding-box sequence corresponding to the input constraints; this sequence is then used to produce a high-quality design layout.

Specifically, in step 2, the self-attention encoder is defined in terms of the following quantities: two multi-layer perceptrons; the bounding-box parameters of the i-th element; the category of the i-th element; the application scenario of the layout; the hidden representation of each input item; the positional embedding; the k-th learnable embedding; the hidden output corresponding to the k-th learnable embedding; the number of layout heads; the parameters of the self-attention encoder; and the multi-head self-attention mechanism of the Transformer model.

Specifically, in step 2, the layout latent representation is expressed in terms of the number of layout heads, the element bounding-box sequence, the element category sequence and the application scenario.

Specifically, in step 2, the discrete latent variable generation module follows VQ-VAE theory and uses a mapping function to convert the layout latent representation into the closest element of the mapping space. The output of the mapping function is the layout discrete latent representation, and the mapping itself is the discretization operation.

Preferably, in step 2, the reconstruction module reconstructs the element bounding-box sequence with a non-autoregressive decoder, so that the model can better capture the relations between preceding and following elements and the reconstructed layout is closer to the real layout.

Specifically, the non-autoregressive decoder is defined in terms of the following quantities: the reconstructed bounding-box parameters of the i-th element; the parameters of the non-autoregressive decoder; the hidden representation of each input item; the corresponding hidden output; the category of the i-th element; and the application scenario of the layout.

Specifically, in step 3, a cross-entropy function and a commitment loss are used to adjust the parameters of the layout generation network during training. The loss involves the reconstruction loss of the model, computed with cross-entropy; the weight coefficient of the commitment loss; the stop-gradient operator; and the reconstructed element bounding-box sequence.

Specifically, the stop-gradient operator acts as the identity in the forward pass and has zero gradient in the backward pass.
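As a hedged illustration, the training objective described above matches the standard VQ-VAE form. The symbols below are assumptions chosen for this sketch rather than the patent's own notation: $z_e$ is the encoder output, $z_q$ its quantized counterpart, $b$ and $\hat{b}$ are the ground-truth and reconstructed bounding-box sequences, $\beta$ is the commitment weight and $\mathrm{sg}$ is the stop-gradient operator.

```latex
\mathcal{L} = \mathcal{L}_{\mathrm{rec}}(b, \hat{b})
            + \beta \,\bigl\lVert z_e - \mathrm{sg}(z_q) \bigr\rVert_2^2,
\qquad
\mathrm{sg}(x) =
\begin{cases}
x, & \text{forward pass} \\
0, & \text{backward pass (gradient)}
\end{cases}
```

Here $\mathcal{L}_{\mathrm{rec}}$ is the cross-entropy between the quantized ground-truth parameters and the decoder's predictions. Because $\mathrm{sg}$ blocks gradients into the quantized code, the commitment term only pulls the encoder output towards its assigned code.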

The invention also provides a layout generation device comprising a computer memory, a computer processor and a computer program stored in the computer memory and executable on the computer processor, wherein the computer memory employs the layout generation model and the unidirectional Transformer model described above;

the computer processor, when executing the computer program, performs the following steps: inputting the element category sequence and the application scenario requirement of the graphic design into the unidirectional Transformer model, and feeding the layout discrete latent representation output by the unidirectional Transformer model to the reconstruction module of the layout generation model, to obtain a high-quality design layout that satisfies the element category and scenario constraints.

Compared with the prior art, the invention has the beneficial effects that:

(1) A novel generative model is proposed that can generate layouts satisfying user constraints, including the design element labels inside the layout (element types and quantities) and the application scenario outside the layout.

(2) A pre-constructed unidirectional Transformer model supplies the generation stage with an accurate and comprehensive data distribution. This ensures the diversity of the generated layouts and also the quality of the layouts the model produces directly, which avoids complex post-processing optimization and reduces the time and space complexity of the algorithm.

Drawings

FIG. 1 shows the overall architecture of the layout generation model proposed by the present invention;

FIG. 2 compares the layout generation results of the layout generation model and existing models under the element category sequence constraint;

FIG. 3 compares the layout reconstruction results of the layout generation model and an existing model under the element category sequence constraint;

FIG. 4 compares the layout reconstruction results of the layout generation model under different application scenario constraints;

FIG. 5 shows the layout reconstruction results of the layout generation model under the element category sequence and application scenario constraints.

Detailed Description

A graphic design layout consists of a series of design elements. To generate a new layout, the geometric parameters of these elements, namely their position coordinates, width and height, must be predicted from the given constraints (the application scenario of the layout and the element labels). A layout can thus be defined as $L = \{s, e_1, \dots, e_n\}$, where $s$ denotes the application scenario of the layout, $e_i$ denotes the $i$-th element of the layout and $n$ denotes the number of elements in the layout. Each element is written $e_i = (c_i, b_i)$, where $c_i$ denotes the category of the element (for example, image or title) and $b_i$ denotes the center coordinates, width and height of the element's bounding box. In actual training, the geometric parameters of all element bounding boxes are concatenated into a sequence, and the parameter values are discretized by 7-bit uniform quantization. The layout constraints are padded with repeated values into two sequences of the same length as the geometric-parameter sequence.

For brevity, a single symbol is used to denote each item in these sequences.
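The discretization and padding described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes coordinates are normalized to [0, 1] by the canvas size and that each constraint is repeated once per geometric parameter (four times per element). All function names are hypothetical.

```python
from typing import List, Tuple

NUM_BITS = 7
NUM_BINS = 2 ** NUM_BITS  # 128 discrete levels per geometric parameter

# Assumed box format: (center_x, center_y, width, height), each in [0, 1].
Box = Tuple[float, float, float, float]


def quantize(value: float, num_bins: int = NUM_BINS) -> int:
    """Map a normalized value in [0, 1] to an integer bin in [0, num_bins - 1]."""
    clipped = min(max(value, 0.0), 1.0)
    return min(int(clipped * num_bins), num_bins - 1)


def dequantize(index: int, num_bins: int = NUM_BINS) -> float:
    """Return the center of a bin as the approximate continuous value."""
    return (index + 0.5) / num_bins


def boxes_to_sequence(boxes: List[Box]) -> List[int]:
    """Concatenate the quantized parameters of all boxes into one flat sequence."""
    return [quantize(v) for box in boxes for v in box]


def pad_constraints(categories: List[int], scenario: int) -> Tuple[List[int], List[int]]:
    """Repeat each category and the scenario so both align with the box sequence."""
    cat_seq = [c for c in categories for _ in range(4)]
    scen_seq = [scenario] * len(cat_seq)
    return cat_seq, scen_seq
```

With 128 bins, the round trip `dequantize(quantize(v))` recovers `v` to within half a bin width, which is the precision the model works at.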

To address the problem of graphic design generation faced by the prior art, this embodiment provides a layout generation method based on discrete latent representation.

As shown in FIG. 1, step 1, constructing a training set comprising element bounding-box sequences used for generating layouts and the corresponding constraint conditions, the constraint conditions comprising an element category sequence and an application scenario;

step 2, constructing an element-constrained layout generation network comprising a feature extraction module, a discrete latent variable generation module and a reconstruction module, wherein the feature extraction module comprises a self-attention encoder that projects the input element category sequence, element bounding-box sequence and application scenario into a d-dimensional hidden space to generate the corresponding layout latent representation; the discrete latent variable generation module discretizes the generated layout latent representation to obtain the corresponding layout discrete latent representation; and finally the reconstruction module outputs the corresponding element bounding-box sequence of the real layout from the input element category sequence, the application scenario and the layout discrete latent representation;

The self-attention encoder first uses a multi-layer perceptron to project each input item into a d-dimensional hidden space and adds the positional embedding to obtain the corresponding hidden input sequence. The final output of the encoder is restricted to the output vectors corresponding to the learnable embeddings. These vectors are the multi-head latent representation of the layout and contain the feature information of the whole layout. The encoder is defined in terms of the following quantities: two multi-layer perceptrons; the bounding-box parameters of the i-th element; the category of the i-th element; the application scenario of the layout; the hidden representation of each input item; the positional embedding; the k-th learnable embedding; the hidden output corresponding to the k-th learnable embedding; and the number of layout heads.

The layout latent representation output by the encoder is expressed in terms of the number of layout heads, the element bounding-box sequence, the element category sequence and the application scenario.
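One plausible reading of the encoder description can be written out as a sketch. Every symbol here is an assumption introduced for illustration: $x_i$ is the $i$-th input item (a bounding-box parameter together with its constraints), $p_i$ its positional embedding, $e_1, \dots, e_K$ the learnable embeddings, $K$ the number of layout heads and $\theta$ the encoder parameters. Prepending the learnable embeddings to the input sequence is also an assumption.

```latex
\begin{aligned}
h_i &= \mathrm{MLP}(x_i) + p_i, \qquad i = 1, \dots, N, \\
[\, z_1, \dots, z_K, \;\cdot\; ] &= \mathrm{MSA}_{\theta}\bigl([\, e_1, \dots, e_K, h_1, \dots, h_N \,]\bigr), \\
z &= (z_1, \dots, z_K) \in \mathbb{R}^{K \times d}.
\end{aligned}
```

Only the outputs at the $K$ learnable-embedding positions are kept, so $z$ summarizes the whole layout in a fixed number of $d$-dimensional vectors regardless of how many elements the layout contains.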

The discrete latent variable generation module follows VQ-VAE theory and uses a mapping function to convert the layout latent representation into the closest element of the mapping space. The output of the mapping function is the layout discrete latent representation, and the mapping itself is the discretization operation;
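The nearest-element mapping used in VQ-VAE can be sketched in a few lines. This is an illustrative version in plain Python: it assumes squared Euclidean distance and represents the mapping space as a list of code vectors; the function names are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_code(z: Vector, codebook: List[Vector]) -> int:
    """Index of the codebook entry closest to z (first one wins on ties)."""
    return min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))


def quantize_latents(latents: List[Vector], codebook: List[Vector]) -> List[int]:
    """Map each latent head to the index of its nearest code."""
    return [nearest_code(z, codebook) for z in latents]
```

The resulting list of indices is the discrete form of the layout latent representation; looking the indices up in the codebook gives the quantized vectors passed on to the decoder.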

The reconstruction module uses a non-autoregressive decoder to reconstruct the element bounding-box sequence, which brings the reconstruction closer to the real sequence. It outputs the reconstructed element bounding-box sequence from the input layout discrete latent representation and the corresponding constraint conditions. The decoder is defined in terms of the following quantities: the reconstructed bounding-box parameters of the i-th element; the parameters of the non-autoregressive decoder; the hidden representation of each input item; the corresponding hidden output; the category of the i-th element; and the application scenario of the layout.
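To make the term concrete: a non-autoregressive decoder produces scores for every sequence position in a single pass, and each position is decoded without waiting for the predictions at other positions. The sketch below shows only that final decoding step. The `logits` input is a hypothetical stand-in for the decoder output (one row of bin scores per geometric parameter), and greedy argmax is an assumption of this sketch.

```python
from typing import List, Tuple


def argmax(scores: List[float]) -> int:
    """Index of the largest score (first one wins on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def decode_parallel(logits: List[List[float]]) -> List[Tuple[int, ...]]:
    """Decode all positions independently, then group them four per element.

    logits[t] holds the scores over quantization bins for sequence position t.
    """
    if len(logits) % 4 != 0:
        raise ValueError("expected four geometric parameters per element")
    bins = [argmax(row) for row in logits]
    return [tuple(bins[i:i + 4]) for i in range(0, len(bins), 4)]
```

An autoregressive decoder would instead feed each predicted bin back in before predicting the next one, which is the behaviour the reconstruction module avoids.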

Step 3, training the layout generation network constructed in step 2 on the training set to obtain a layout generation model. During training, a cross-entropy function and a commitment loss are used to adjust the parameters of the layout generation network. The loss involves the reconstruction loss of the model, computed with cross-entropy; the weight coefficient of the commitment loss; the stop-gradient operator; and the reconstructed element bounding-box sequence. The stop-gradient operator acts as the identity in the forward pass and has zero gradient in the backward pass.

Therefore, the decoder is optimized only by the reconstruction loss, the encoder is optimized by both the reconstruction loss and the commitment loss, and the mapping space is optimized by an exponential moving average (EMA) algorithm.
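The EMA update of the mapping space can be sketched as below, following the usual VQ-VAE recipe of tracking a running count and a running sum per code. The decay value, the `eps` guard against empty codes and all names are assumptions of this sketch; the patent does not give these details.

```python
from typing import List


def ema_update(
    codebook: List[List[float]],
    cluster_size: List[float],
    embed_sum: List[List[float]],
    latents: List[List[float]],
    assignments: List[int],
    decay: float = 0.99,
    eps: float = 1e-5,
) -> None:
    """Update the codebook in place from one batch of encoder outputs.

    assignments[n] is the index of the code nearest to latents[n].
    """
    num_codes, dim = len(codebook), len(codebook[0])
    batch_count = [0.0] * num_codes
    batch_sum = [[0.0] * dim for _ in range(num_codes)]
    for z, k in zip(latents, assignments):
        batch_count[k] += 1.0
        for j in range(dim):
            batch_sum[k][j] += z[j]
    for k in range(num_codes):
        # Running estimate of how many latents each code attracts.
        cluster_size[k] = decay * cluster_size[k] + (1.0 - decay) * batch_count[k]
        for j in range(dim):
            # Running sum of the latents assigned to this code.
            embed_sum[k][j] = decay * embed_sum[k][j] + (1.0 - decay) * batch_sum[k][j]
            # Each code becomes the running mean of its assigned latents.
            codebook[k][j] = embed_sum[k][j] / (cluster_size[k] + eps)
```

Because the codes are moved by this running average rather than by a gradient, no codebook term is needed in the loss, which is consistent with the loss containing only the reconstruction and commitment terms.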

Step 4, during the training in step 3 the prior distribution over the discrete latent representation is defined as a uniform categorical distribution. After that training is completed, a unidirectional Transformer model is therefore attached to predict the discrete latent representation of the layout autoregressively. When the unidirectional Transformer model is trained, only its predictions of the discrete latent tokens need to be optimized; the outputs at the positions corresponding to the condition constraints are ignored. A layout discrete latent representation conforming to the input constraints is then sampled autoregressively and fed, together with the condition constraints, into the reconstruction module to generate an element bounding-box sequence that satisfies the constraints, from which a high-quality design layout is obtained.
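The autoregressive sampling of discrete latent tokens can be sketched as a simple loop. The trained unidirectional Transformer is replaced here by a hypothetical callable, `next_code_probs`, which returns a probability for every code given the constraints and the codes sampled so far; `toy_prior` is a made-up stand-in used only to make the example runnable.

```python
import random
from typing import Callable, List, Sequence

PriorFn = Callable[[Sequence[int], Sequence[int]], Sequence[float]]


def sample_codes(
    constraints: Sequence[int],
    num_heads: int,
    next_code_probs: PriorFn,
    rng: random.Random,
) -> List[int]:
    """Sample one code per layout head, each conditioned on the earlier ones."""
    codes: List[int] = []
    for _ in range(num_heads):
        probs = next_code_probs(constraints, codes)
        codes.append(rng.choices(range(len(probs)), weights=probs, k=1)[0])
    return codes


def toy_prior(constraints: Sequence[int], prefix: Sequence[int]) -> List[float]:
    """Hypothetical prior over 4 codes that favours one code per context."""
    codebook_size = 4
    favoured = (sum(constraints) + sum(prefix)) % codebook_size
    return [0.7 if k == favoured else 0.1 for k in range(codebook_size)]
```

For example, `sample_codes([1, 2], 3, toy_prior, random.Random(0))` returns three code indices in the range 0 to 3. In the method described above, these codes and the constraints would then be passed to the reconstruction module.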

Step 5, using the unidirectional Transformer model trained in step 4 as the entry point for the constraint conditions, feeding the layout discrete latent representation it outputs to the reconstruction module of the layout generation model, and decoding that representation to obtain an element bounding-box sequence satisfying the input constraints.

The process of generating a design layout from a specified element bounding-box sequence is known in the prior art and is therefore not described in detail.

This embodiment also provides a layout generation device comprising a computer memory, a computer processor and a computer program stored in the computer memory and executable on the computer processor, wherein the computer memory employs the layout generation model and the unidirectional Transformer model proposed in the above embodiment, and the computer processor, when executing the computer program, performs the following steps:

inputting the element category sequence and the application scenario requirement of the graphic design into the unidirectional Transformer model, and feeding the layout discrete latent representation output by the unidirectional Transformer model to the reconstruction module of the layout generation model, to obtain a high-quality design layout that satisfies the element category and scenario constraints.

To illustrate the difference between this model and existing models, this embodiment also provides a comparative evaluation in practical application.

In the first case, the LayoutTransformer and LayoutGAN++ models are used as baselines to evaluate the performance of the layout generation model proposed in this embodiment on the layout generation task under the element category sequence constraint; the results are shown in Table 1.

As Table 1 shows, the layout generation model provided by this embodiment achieves the best results on both the FID and MaxIoU metrics, which demonstrates that the discrete layout representation summarizes layout features better than the conventional continuous representation. In terms of aesthetic quality, the discretization of the element geometric parameters gives the final model better alignment, so it also obtains the best alignment score.
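MaxIoU, as commonly defined in the layout generation literature, builds on the pairwise intersection-over-union (IoU) between matched boxes of the same category. The sketch below computes only that pairwise IoU, assuming boxes in the (center x, center y, width, height) format used in this description; the matching step is omitted and the function names are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (center_x, center_y, width, height)


def to_corners(box: Box) -> Tuple[float, float, float, float]:
    """Convert a center-format box to (left, top, right, bottom)."""
    cx, cy, w, h = box
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2


def iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area; 0.0 for empty boxes."""
    ax1, ay1, ax2, ay2 = to_corners(a)
    bx1, by1, bx2, by2 = to_corners(b)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0
```

An IoU of 1 means the generated box coincides with the reference box, and 0 means they do not overlap at all.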

FIG. 2 shows the layout generation results of the layout generation model and the two baseline models. As the figure shows, the layouts generated by the LayoutGAN++ model suffer from misaligned and overlapping elements because they lack post-processing optimization. The LayoutTransformer model uses a unidirectional Transformer and therefore learns only the relations between already-predicted elements; it cannot take into account the types and number of elements that have not yet appeared, so the elements in its layouts are unevenly distributed, with overlapping elements and large blank areas.

The output of the layout generation model provided by this embodiment is therefore closer to real layouts: elements of all types are arranged sensibly and are well aligned. Moreover, unlike in the LayoutTransformer model, the unidirectional Transformer in this embodiment is used only to generate the discrete latent representation of the layout, and a bidirectional Transformer is used to decode that representation, so the relations among all elements can be modeled and the geometric parameters of all element bounding boxes predicted simultaneously, which effectively resolves the problem of the LayoutTransformer model.
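The difference between a unidirectional and a bidirectional Transformer comes down to which positions each position may attend to. A minimal sketch of the two attention masks, where `True` means attention is allowed (the function names are illustrative):

```python
from typing import List


def causal_mask(n: int) -> List[List[bool]]:
    """Unidirectional: position i attends only to positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def full_mask(n: int) -> List[List[bool]]:
    """Bidirectional: every position attends to every position."""
    return [[True] * n for _ in range(n)]
```

Under the causal mask the first element never sees the ones that follow it, which is why a purely unidirectional generator cannot account for elements that have not yet appeared; under the full mask every element is predicted with all the others in view.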

Since the LayoutTransformer model cannot perform layout reconstruction, the layout generation model is compared only with the LayoutGAN++ model from the first case on this task; the comparison results are shown in Table 2.

As Table 2 shows, the layouts generated by the layout generation model are closer to the real layouts both in feature distribution (evaluated with FID) and in element bounding-box distance (evaluated with MaxIoU and a further metric).

As FIG. 3 shows, the proposed model not only reconstructs the structure of the real layout but also accurately restores the detailed position and size of each element. The LayoutGAN++ model captures the layout structure roughly, but its bounding-box predictions are not accurate enough, and it shows serious misalignment and overlap problems.

In the second case, the prior art does not consider layout generation under an application scenario constraint, and the influence of the application scenario on a layout is difficult to evaluate quantitatively. In this embodiment, the performance of the layout generation model is therefore evaluated qualitatively, by comparing the layouts it generates under the same element category sequence constraint but different scenarios (based on the PDCard and Magazine datasets).

FIG. 4 shows layout reconstruction results generated on the PDCard dataset, where scenario one is a commodity recommendation scenario, scenario two is a commodity category display scenario and scenario three is a commodity search scenario. Comparing the layouts in the same row makes it clear that, even with the same element category sequence, a layout matching each scenario can be generated:

in the layout for the commodity recommendation scenario, the image elements occupy a larger area, because pictures convey commodity information and attract consumers more quickly;

in the layout for the commodity category display scenario, the proportion of description elements accompanying the image elements is increased, so that the corresponding description is visible whenever an image element is seen;

in the layout for the commodity search scenario, the image elements typically occupy a smaller area and the information is presented mainly as text, which helps consumers explore commodity details further.

FIG. 5 shows layout reconstruction results generated on the Magazine dataset. The layouts of science magazines and news magazines focus on the structure and regularity of the text and need to appear serious and rigorous, whereas fashion magazines and food magazines lean towards entertainment and leisure and need more creative imagery and unconventional layouts to catch the reader's eye.

Claims

1. A layout generation method based on discrete latent representation, comprising:

step 1, constructing a training set, wherein the training set comprises element bounding-box sequences used for generating layouts and the corresponding constraint conditions, the constraint conditions comprising an element category sequence and an application scenario;

step 2, constructing an element-constrained layout generation network comprising a feature extraction module, a discrete latent variable generation module and a reconstruction module, wherein the feature extraction module comprises a self-attention encoder that projects the input element category sequence, element bounding-box sequence and application scenario into a d-dimensional hidden space to generate the corresponding layout latent representation; the discrete latent variable generation module discretizes the generated layout latent representation to obtain the corresponding layout discrete latent representation; and the reconstruction module outputs the corresponding element bounding-box sequence of the real layout from the input element category sequence, the application scenario and the layout discrete latent representation;

2. The layout generation method based on discrete latent representation according to claim 1, wherein in step 2 the self-attention encoder is defined in terms of the following quantities: two multi-layer perceptrons; the bounding-box parameters of the i-th element; the category of the i-th element; the application scenario of the layout; the hidden representation of each input item; the positional embedding; the k-th learnable embedding; the hidden output corresponding to the k-th learnable embedding; the number of layout heads; the parameters of the self-attention encoder; and the multi-head self-attention mechanism of the Transformer model.

3. The layout generation method based on discrete latent representation according to claim 1, wherein in step 2 the layout latent representation is expressed in terms of the number of layout heads, the element bounding-box sequence, the element category sequence and the application scenario.

4. The layout generation method based on discrete latent representation according to claim 1, wherein in step 2 the discrete latent variable generation module follows VQ-VAE theory and uses a mapping function to convert the layout latent representation into the closest element of the mapping space, the output of the mapping function being the layout discrete latent representation and the mapping itself being the discretization operation.

5. The layout generation method based on discrete latent representation according to claim 1, wherein in step 2 the reconstruction module reconstructs the element bounding-box sequence with a non-autoregressive decoder.

6. The layout generation method based on discrete latent representation according to claim 5, wherein the non-autoregressive decoder is defined in terms of the following quantities: the reconstructed bounding-box parameters of the i-th element; the parameters of the non-autoregressive decoder; the hidden representation of each input item; the corresponding hidden output; the category of the i-th element; and the application scenario of the layout.

7. The layout generation method based on discrete latent representation according to claim 1, wherein in step 3 a cross-entropy function and a commitment loss are used to adjust the parameters of the layout generation network during training, the loss involving the weight coefficient of the commitment loss, the stop-gradient operator and the reconstructed element bounding-box sequence.

8. A layout generation device comprising a computer memory, a computer processor and a computer program stored in the computer memory and executable on the computer processor, wherein the computer memory employs the layout generation model and the unidirectional Transformer model of claim 1; the computer processor, when executing the computer program, performs the steps of: