CN117952171A - Model generation method, image generation device and electronic equipment - Google Patents

Model generation method, image generation device and electronic equipment

Info

Publication number
CN117952171A
CN117952171A CN202410054935.1A
Authority
CN
China
Prior art keywords
diffusion
model
sample
sub
phases
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410054935.1A
Other languages
Chinese (zh)
Inventor
钱生
纳拉米利·克里希纳·萨加尔·雷迪
闵安游
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202410054935.1A priority Critical patent/CN117952171A/en
Publication of CN117952171A publication Critical patent/CN117952171A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0475 Generative networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application discloses a model generation method, an image generation device and electronic equipment. The method comprises the following steps: acquiring a forward diffusion sample sequence corresponding to each of a plurality of diffusion stages, wherein the diffusion stages are sequentially arranged; based on the sub-model to be trained corresponding to each of the diffusion phases, obtaining a reverse diffusion sample sequence corresponding to each of the diffusion phases; training the sub-model to be trained corresponding to each of the diffusion phases based on the forward diffusion sample sequence and the reverse diffusion sample sequence corresponding to each of the diffusion phases, so as to obtain the target sub-model corresponding to each of the diffusion phases; and obtaining a target diffusion model based on the target sub-model corresponding to each of the diffusion stages. Therefore, under the condition that the final target diffusion model is obtained based on the target sub-model corresponding to each of the diffusion stages, the target diffusion model has better generation performance.

Description

Model generation method, image generation device and electronic equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to a model generating method, an image generating device, and an electronic device.
Background
In recent years, with the development of deep learning technology, diffusion models based on deep learning have become a research hotspot. Deep learning can learn complex features and distributions of data by constructing multi-layer neural networks, further improving the performance of diffusion models. Diffusion models based on deep learning have remarkable advantages in processing large-scale, high-dimensional data, and are widely applied in fields such as image generation and natural language processing. However, the generation performance of related diffusion models still needs to be improved.
Disclosure of Invention
In view of the above, the present application proposes a model generation method, an image generation device, and an electronic device to address the above problems.
In a first aspect, the present application provides a model generating method, the method comprising: acquiring a forward diffusion sample sequence corresponding to each of a plurality of diffusion stages, wherein the diffusion stages are sequentially arranged; obtaining a reverse diffusion sample sequence corresponding to each of the diffusion phases based on the sub-model to be trained corresponding to each of the diffusion phases; training the sub-model to be trained corresponding to each of the diffusion phases based on the forward diffusion sample sequence and the reverse diffusion sample sequence corresponding to each of the diffusion phases, so as to obtain the target sub-model corresponding to each of the diffusion phases; and obtaining a target diffusion model based on the target sub-model corresponding to each of the diffusion stages.
In a second aspect, the present application provides an image generation method, the method comprising: acquiring input data; and inputting the input data into a target diffusion model to obtain a target image output by the target diffusion model, wherein the target diffusion model is obtained based on the model generation method.
In a third aspect, the present application provides a model generating apparatus, the apparatus comprising: a forward sequence obtaining unit, configured to obtain forward diffusion sample sequences corresponding to a plurality of diffusion phases, where the plurality of diffusion phases are sequentially arranged; the reverse sequence acquisition unit is used for acquiring a reverse diffusion sample sequence corresponding to each of the diffusion phases based on the sub-model to be trained corresponding to each of the diffusion phases; the model training unit is used for training the sub-model to be trained corresponding to each of the diffusion phases based on the forward diffusion sample sequence and the reverse diffusion sample sequence corresponding to each of the diffusion phases so as to obtain the target sub-model corresponding to each of the diffusion phases; and the model generating unit is used for obtaining a target diffusion model based on the target sub-models corresponding to the diffusion stages.
In a fourth aspect, the present application provides an image generation apparatus comprising: an input data acquisition unit configured to acquire input data; and the image generation unit is used for inputting the acquired input data into a target diffusion model to obtain a target image output by the target diffusion model, wherein the target diffusion model is obtained based on the model generation method.
In a fifth aspect, the present application provides an electronic device comprising at least a processor, and a memory; one or more programs are stored in the memory and configured to be executed by the processor to implement the methods described above.
In a sixth aspect, the present application provides a computer readable storage medium having program code stored therein, wherein the program code, when executed by a processor, performs the above-described method.
According to the model generation method, the image generation device and the electronic equipment, after the forward diffusion sample sequences corresponding to the diffusion phases are obtained, and the reverse diffusion sample sequences corresponding to the diffusion phases are obtained based on the to-be-trained submodels corresponding to the diffusion phases, the to-be-trained submodels corresponding to the diffusion phases can be trained based on the forward diffusion sample sequences and the reverse diffusion sample sequences corresponding to the diffusion phases, so that the target submodels corresponding to the diffusion phases are obtained, and the target diffusion model is obtained based on the target submodels corresponding to the diffusion phases. Therefore, under the condition that a plurality of diffusion phases are obtained through dividing, one sub-model to be trained can be configured for each diffusion phase independently, then the corresponding sub-model to be trained is trained according to the forward diffusion sample sequence and the reverse diffusion sample sequence corresponding to each diffusion phase, so that the target sub-model corresponding to each diffusion phase is obtained, the noise distribution condition of each diffusion phase can be better learned by the target sub-models, and further, the target diffusion model has better generation performance under the condition that the final target diffusion model is obtained based on the target sub-model corresponding to each diffusion phase.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 shows a schematic diagram of an application scenario of a model generation method according to an embodiment of the present application;
fig. 2 is a schematic diagram of another application scenario of the model generating method according to the embodiment of the present application;
FIG. 3 is a flowchart of a model generation method according to an embodiment of the present application;
FIG. 4 shows a schematic of a plurality of diffusion stages in the practice of the present application;
FIG. 5 shows a schematic diagram of a forward diffusion process in the practice of the present application;
FIG. 6 is a schematic diagram showing the diffusion effect of a forward diffusion process in the practice of the present application;
FIG. 7 is a schematic diagram of a back diffusion process in the practice of the present application;
FIG. 8 is a schematic diagram of a target diffusion model based on multiple target sub-models in the practice of the application;
FIG. 9 is a flow chart of a model generation method according to another embodiment of the present application;
FIG. 10 is a schematic diagram of an initial forward diffusion sample sequence in the practice of the present application;
FIG. 11 is a flow chart illustrating a method of generating a model according to still another embodiment of the present application;
FIG. 12 is a schematic diagram of determining a reverse diffusion initiation sample in an embodiment of the application;
FIG. 13 is a schematic diagram of yet another embodiment of the present application for determining a back-diffusion initiation sample;
FIG. 14 is a flowchart of an image generation method according to still another embodiment of the present application;
fig. 15 is a block diagram showing a structure of a model generating apparatus according to an embodiment of the present application;
fig. 16 is a block diagram showing a configuration of an image generating apparatus according to another embodiment of the present application;
FIG. 17 shows a schematic diagram of a scoring function-based generative model;
FIG. 18 shows a schematic diagram of a comparison of the scores estimated by the model generated by the present application and the scores of the data;
FIG. 19 shows a schematic diagram of evaluating a method provided by an embodiment of the present application based on a time dimension;
FIG. 20 shows a schematic diagram of evaluating a method provided by an embodiment of the present application based on spatial dimensions;
FIG. 21 is a schematic diagram of a method of verifying the present application;
FIG. 22 is a schematic diagram of another embodiment of the present application for verifying a method provided by an embodiment of the present application;
Fig. 23 shows a block diagram of an electronic device of the present application for executing a model generation method or an image generation method according to an embodiment of the present application;
fig. 24 is a storage unit for storing or carrying program code for implementing a model generation method or an image generation method according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
A generative model is a deep learning model for generating new, similar data from existing data. The goal of a generative model is to generate new samples from given samples (training data): by learning the distribution of the training data, it samples new data from that distribution. Generative models have applications in many fields, such as computer vision, natural language processing, and audio generation; recently, diffusion models have become popular generative models by virtue of their strong generative capability. Beyond existing applications in the fields of computer vision, speech generation, bioinformatics, and natural language processing, more applications will continue to be explored. Diffusion models based on deep learning have significant advantages when processing large-scale, high-dimensional data.
However, the inventors found in the study that the generation performance of the relevant diffusion model is still to be improved.
Accordingly, having found the above problems in their research, the inventors propose in the present application a model generation method, an image generation device, and an electronic device that can mitigate the above problems. In the method, after a forward diffusion sample sequence corresponding to each of a plurality of diffusion phases is obtained and a reverse diffusion sample sequence corresponding to each of the plurality of diffusion phases is obtained based on the to-be-trained sub-model corresponding to each of the plurality of diffusion phases, the to-be-trained sub-model corresponding to each of the plurality of diffusion phases can be trained based on the forward diffusion sample sequence and the reverse diffusion sample sequence corresponding to each of the plurality of diffusion phases, so as to obtain a target sub-model corresponding to each of the plurality of diffusion phases, and further, a target diffusion model is obtained based on the target sub-models corresponding to the plurality of diffusion phases.
Therefore, in the case where a plurality of diffusion phases are obtained through division, one sub-model to be trained can be configured for each diffusion phase independently, and then each corresponding sub-model to be trained is trained according to the forward diffusion sample sequence and the reverse diffusion sample sequence of its diffusion phase, so as to obtain the target sub-model corresponding to each diffusion phase. In this way, each target sub-model can better learn the noise distribution of its own diffusion phase, and further, the target diffusion model obtained based on the target sub-models corresponding to the diffusion phases has better generation performance. Here, better generation performance of the target diffusion model can be understood as better quality of the generated content (e.g., an image), where better quality means the generated content may have more detail; alternatively, efficiency may be higher when content of the same quality is generated. For example, if the generated content is an image, the target diffusion model obtained by the method provided by the embodiment of the application may be able to generate more image detail.
Before further elaborating on the embodiments of the present application, an application environment related to the embodiments of the present application will be described.
The application scenario according to the embodiment of the present application is described first.
In the embodiment of the application, the provided model generation method or image generation method can be executed by the electronic equipment. In this manner performed by the electronic device, all steps in the model generation method or the image generation method provided by the embodiment of the present application may be performed by the electronic device. For example, as shown in fig. 1, in the case where all steps in the model generation method or the image generation method provided in the embodiment of the present application may be performed by an electronic device, all steps may be performed by a processor of the electronic device 100.
Furthermore, the model generation method or the image generation method provided by the embodiment of the application can also be executed by the server. Correspondingly, in this manner executed by the server, the server may start executing the steps in the model generation method or the image generation method provided by the embodiment of the present application in response to the trigger instruction. The triggering instruction may be sent by an electronic device used by a user, or may be triggered locally by a server in response to some automation event.
In addition, the model generation method or the image generation method provided by the embodiment of the application can be cooperatively executed by the electronic equipment and the server. In such a manner that the electronic device and the server cooperatively execute, part of the steps in the model generation method or the image generation method provided by the embodiment of the present application are executed by the electronic device, and the other part of the steps are executed by the server. For example, taking the model generation method in the present application as shown in fig. 2, the electronic device 100 may perform the model generation method including: and acquiring forward diffusion sample sequences corresponding to each of the diffusion stages. Then, the electronic device 100 may transmit the forward diffusion sample sequences corresponding to the diffusion phases to the server 200, and then the server 200 performs the training based on the to-be-trained sub-models corresponding to the diffusion phases to obtain the reverse diffusion sample sequences corresponding to the diffusion phases, and trains the to-be-trained sub-models corresponding to the diffusion phases based on the forward diffusion sample sequences and the reverse diffusion sample sequences corresponding to the diffusion phases to obtain the target sub-models corresponding to the diffusion phases, and obtains the target diffusion model based on the target sub-models corresponding to the diffusion phases. The server 200 may then store the target diffusion model locally or may return it to the electronic device 100.
It should be noted that the steps performed by the electronic device and the server are not limited to those described in the above examples; in practical applications, the steps performed by the electronic device and the server may be dynamically adjusted according to the actual situation.
It should be noted that, the electronic device 100 may be a tablet computer, a smart watch, a smart voice assistant, or other devices besides the smart phone shown in fig. 1 and 2. The server 200 may be a stand-alone physical server, or may be a server cluster or a distributed system formed by a plurality of physical servers. In the case where the model generating method provided by the embodiment of the present application is executed by a server cluster or a distributed system formed by a plurality of physical servers, different steps in the model generating method may be executed by different physical servers, or may be executed by a server built based on the distributed system in a distributed manner.
Embodiments of the present application will be described in detail below with reference to the accompanying drawings.
Referring to fig. 3, a method for generating a model according to an embodiment of the present application includes:
S110: and acquiring a forward diffusion sample sequence corresponding to each of a plurality of diffusion stages, wherein the diffusion stages are sequentially arranged.
In the embodiments of the present application, the diffusion phase may be understood as a phase in which model training is performed separately. Wherein, as a way, multiple diffusion stages can be obtained by dividing the time length. For example, the global time interval may be as shown in fig. 4, in which case the global time interval may be divided into a plurality of phases, and the plurality of phases may be further referred to as a plurality of diffusion phases. For example, the diffusion stage S1, the diffusion stage S2, and the diffusion stage SN divided in fig. 4.
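As an illustrative sketch (not part of the patent text), dividing a global time interval of T time steps into S contiguous diffusion stages, as in the description of fig. 4, might look like the following; the function name and the even-split policy are assumptions:

```python
def split_into_stages(T, S):
    """Split time steps 1..T into S contiguous diffusion stages (hypothetical even split)."""
    base, rem = divmod(T, S)
    stages, start = [], 1
    for s in range(S):
        length = base + (1 if s < rem else 0)  # spread any remainder over the early stages
        stages.append(list(range(start, start + length)))
        start += length
    return stages
```

Any other contiguous partition of the time steps would serve equally well; the patent does not fix a particular split policy.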
In the embodiment of the present application, for each diffusion stage, there is a corresponding forward diffusion sample sequence and reverse diffusion sample sequence.
The forward diffusion sample sequence comprises a plurality of samples obtained by forward diffusion based on an initial sample. In one case, the initial sample may be understood as a sample to which no random noise (e.g., Gaussian random noise) has been added, and the forward diffusion process may be understood as the process of adding random noise. During forward diffusion, noise may be gradually added to the initial image, gradually converting it into a featureless noise image. For example, as shown in fig. 5, sample X0 may be understood as the initial sample, sample X1 as the sample obtained by adding random noise once on the basis of sample X0, and sample X2 as the sample obtained by adding random noise once more on the basis of sample X1. Illustratively, as shown in fig. 6, the leftmost image may be understood as the initial sample, the middle image as the image after adding random noise once, and the rightmost image as the image after adding random noise once more on the basis of the middle image. From the comparison, it can be seen that as random noise is added more times, the noise in the image becomes increasingly prominent.
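The noise-adding process just described can be sketched as follows. This is a minimal illustration assuming the standard DDPM-style update x_t = sqrt(1-β_t)·x_{t-1} + sqrt(β_t)·ε; the function names are hypothetical and samples are represented as plain lists of floats:

```python
import math
import random

def forward_step(x_prev, beta_t, rng=random):
    """One forward-diffusion step: scale the previous sample and add Gaussian noise."""
    keep = math.sqrt(1.0 - beta_t)   # how much of the previous sample survives
    add = math.sqrt(beta_t)          # scale of the injected random noise
    return [keep * v + add * rng.gauss(0.0, 1.0) for v in x_prev]

def forward_sequence(x0, betas, rng=random):
    """Sample X0 -> X1 -> X2 -> ... by repeatedly adding random noise (as in fig. 5)."""
    seq = [list(x0)]
    for beta_t in betas:
        seq.append(forward_step(seq[-1], beta_t, rng))
    return seq
```

Each call to `forward_step` corresponds to one time step of the forward diffusion process.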
In embodiments of the present application, there are a number of ways to obtain a sequence of forward diffusion samples for each of a plurality of diffusion phases.
As one approach, each diffusion stage may have its own forward diffusion initial sample. The forward diffusion initial samples corresponding to the diffusion stages differ: for the diffusion stage ordered first, the corresponding forward diffusion initial sample may be an initial sample to which no random noise has been added; for the other diffusion stages, the corresponding forward diffusion initial sample may be the last forward diffusion sample of the immediately preceding diffusion stage. It should be noted that, in the forward diffusion process, an operation of adding random noise is performed once in each time step, and the time steps are ordered chronologically, so the last forward diffusion sample may be understood as the forward diffusion sample obtained by adding random noise in the last time step. In addition, in the forward diffusion process, the random noise to be added each time is predictable; in this case, the plurality of forward diffusion samples in a forward diffusion sample sequence can be obtained simultaneously, but the scale of random noise added to each of the plurality of forward diffusion samples differs, and thus the last forward diffusion sample can be understood as the forward diffusion sample with the largest scale of added random noise.
Alternatively, the initial forward diffusion sample sequence may be obtained by performing forward diffusion directly based on the initial sample, and then dividing the initial forward diffusion sample sequence into a plurality of sub-sequences, so that a time period corresponding to each sub-sequence is used as one diffusion stage, and a forward diffusion sample sequence included in each sub-sequence is used as a forward diffusion sample sequence of a corresponding diffusion stage.
Illustratively, without dividing the diffusion phases, the resulting initial forward diffusion sample sequence may be expressed as:

X_seq = {x_0, x_1, ..., x_t, ..., x_T}

where t ∈ {0, 1, ..., T} and T is the maximum time-series length. According to the properties of the Markov chain, the transition probability is set to x_t ~ q(x_t | x_{t-1}), so the joint probability is:

q(x_0, x_1, ..., x_T) = q(x_0) ∏_{t=1}^{T} q(x_t | x_{t-1})

where q(x_t | x_{t-1}) can be set as follows:

q(x_t | x_{t-1}) = N(x_t; √(1 - β_t) x_{t-1}, β_t I)

where β_t ∈ (0, 1) is the variance of the noise added in the forward process, α_t = 1 - β_t, and ᾱ_t = ∏_{i=1}^{t} α_i, so that x_t = √(ᾱ_t) x_0 + √(1 - ᾱ_t) ε with ε ~ N(0, I). When t = T, ᾱ_T ≈ 0, and x_T closely follows a Gaussian distribution as follows:

q(x_T) = ∫ q(x_T | x_0) q(x_0) dx_0 ≈ N(x_T; 0, I).
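The approximation above holds because the cumulative product ᾱ_t shrinks toward zero as t grows, so almost none of the original sample survives at t = T. A small numeric check (illustrative only, with an assumed constant noise schedule):

```python
def alpha_bar(betas, t):
    """Cumulative product alpha_bar_t = prod over i=1..t of (1 - beta_i)."""
    prod = 1.0
    for beta_i in betas[:t]:
        prod *= (1.0 - beta_i)
    return prod

# With a typical small, constant noise variance over many steps,
# alpha_bar_T is vanishingly small, so x_T is essentially pure Gaussian noise.
betas = [0.02] * 1000
```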
Then, as described in the embodiment of the present application, in the case that the diffusion phases are to be divided, it may be set that the maximum time-series length T is divided into S phases (diffusion phases) in total, and the resulting forward diffusion sample sequences of the diffusion phases may be expressed as:

X_seq = {X^1, X^2, ..., X^s, ..., X^S}

where X^s is the forward diffusion sample sequence corresponding to the s-th diffusion stage. In the global time interval t ∈ [0, T], the time step of the p-th sample x^(s,p) in X^s is t^(s,p), and the time interval of X^s is [t^(s,1), t^(s,P_s)], where P_s denotes the number of samples in X^s.
S120: and obtaining a reverse diffusion sample sequence corresponding to each of the diffusion phases based on the sub-model to be trained corresponding to each of the diffusion phases.
In the embodiment of the application, model training is carried out independently for each diffusion stage, and thus a corresponding sub-model to be trained is independently configured for each diffusion stage.
For each diffusion stage, a corresponding reverse diffusion sample sequence is obtained. A reverse diffusion sample sequence is understood to mean a sample sequence obtained by reverse diffusion. The reverse diffusion process is a part of a diffusion model and corresponds to the forward diffusion process. During reverse diffusion, the model (the sub-model to be trained) may, based on a prior distribution, gradually recover (remove noise from) samples of the original complex distribution (e.g., the initial samples described above). This process runs in reverse order to the forward diffusion process, and is an evolution from nothing to something, from random to ordered. Illustratively, the reverse diffusion process may be understood as sampling a sample from a normal distribution, then calculating the prior distribution of the sample based on the model, and gradually recovering a sample of the original complex distribution. This process differs from the forward diffusion process in that each step samples from a Gaussian distribution, but the overall goal is similar: to simulate an evolution from nothing to something, from random to ordered.
In the process of reverse diffusion, without dividing the diffusion stages, the transition probability can be set to x_{t-1} ~ p_θ(x_{t-1} | x_t) according to the properties of the Markov chain, and the corresponding joint probability is:

p_θ(x_0, x_1, ..., x_T) = p(x_T) ∏_{t=1}^{T} p_θ(x_{t-1} | x_t)

where p_θ(x_{t-1} | x_t) can be set as follows:

p_θ(x_{t-1} | x_t) = N(x_{t-1}; μ_θ(x_t, t), Σ_θ(x_t, t))

where μ_θ(x_t, t) and Σ_θ(x_t, t) are the mean and variance of the Gaussian model, respectively, both parameterized by a deep neural network whose parameters are denoted θ = (μ_θ, Σ_θ).
In the embodiment of the present application, in the case where a plurality of diffusion phases are divided, p_θ(x_0, x_1, ..., x_T) may be updated as follows based on the S diffusion stages:

p_{θ_seq}(x_0, x_1, ..., x_T) = p(x_T) ∏_{s=1}^{S} p_{θ_s}(X^s)

where p_{θ_s} is the transition probability of the s-th stage, i.e., the product of the per-step transition probabilities p_{θ_s}(x_{t-1} | x_t) over the time steps belonging to the s-th diffusion stage.
As one way, in obtaining the reverse diffusion sequence of each diffusion stage, reverse diffusion may be performed based on a base sample corresponding to each diffusion stage, so as to obtain the reverse diffusion sequence of each diffusion stage. In the embodiment of the present application, the inverse diffusion process may be understood as a sampling process, and then a base sample corresponding to each diffusion stage may be understood as a sample with the largest noise scale in each diffusion stage, so as to sample from the sample with the largest noise scale to obtain a corresponding inverse diffusion sequence.
It should be noted that, in the embodiment of the present application, for the same diffusion stage, the number of reverse diffusion samples in the corresponding reverse diffusion sample sequence is the same as the number of forward diffusion samples in the corresponding forward diffusion sample sequence. Equivalently, for the same diffusion phase, the number of time steps involved in the forward diffusion process is the same as the number of time steps involved in the reverse diffusion process. For example, based on the forward diffusion process shown in fig. 5, the corresponding reverse diffusion process may be as shown in fig. 7. In the example shown in fig. 7, the sub-model to be trained in the diffusion stage may sample based on sample X2 (which may be understood as denoising) to obtain sample X3, and then sample based on sample X3 to obtain sample X4, thereby obtaining the reverse diffusion sample sequence.
It can be understood that in each sampling process, the sub-model to be trained actually predicts the noise distribution in the denoising process, and then samples based on the predicted noise distribution to obtain the corresponding inverse diffusion sample.
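The sampling step just described, in which the sub-model predicts the noise and a new sample is drawn from the resulting Gaussian, can be sketched as follows. This is a hedged illustration assuming the common DDPM posterior-mean parameterization; `eps_model` is a stand-in for the sub-model to be trained, and all names are assumptions:

```python
import math
import random

def reverse_step(x_t, t, betas, alpha_bars, eps_model, rng=random):
    """One reverse-diffusion (denoising) step: predict the noise, compute the
    posterior mean, then sample x_{t-1} from the resulting Gaussian."""
    beta_t = betas[t]
    alpha_t = 1.0 - beta_t
    eps = eps_model(x_t, t)  # the sub-model's noise prediction
    coef = beta_t / math.sqrt(1.0 - alpha_bars[t])
    mean = [(v - coef * e) / math.sqrt(alpha_t) for v, e in zip(x_t, eps)]
    if t == 0:
        return mean  # no noise is added at the final step
    sigma = math.sqrt(beta_t)  # a common (simplified) choice of variance
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
```

Iterating `reverse_step` from the base sample of a stage down through that stage's time steps yields the stage's reverse diffusion sample sequence.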
S130: training the sub-model to be trained corresponding to each of the diffusion phases based on the forward diffusion sample sequence and the reverse diffusion sample sequence corresponding to each of the diffusion phases, so as to obtain the target sub-model corresponding to each of the diffusion phases.
It should be noted that, without performing diffusion phase division, model optimization may be performed by minimizing the distribution difference between the forward process and the reverse process, that is, the KL divergence of q(x_0, x_1, ..., x_T) and p_θ(x_0, x_1, ..., x_T), so as to obtain the optimal model parameter θ*, where θ* may be expressed as:

θ* = argmin_θ D_KL( q(x_0, x_1, ..., x_T) ‖ p_θ(x_0, x_1, ..., x_T) )
Wherein, the KL divergence can be expressed as:

D_KL( q ‖ p_θ ) = E_q [ log ( q(x_0, x_1, ..., x_T) / p_θ(x_0, x_1, ..., x_T) ) ]
In the embodiment of the present application, under the condition that a plurality of diffusion phases are divided, the network parameters of the sub-model to be trained may be updated as follows:

θ_s* = argmin_{θ_s} D_KL( q(X_s) ‖ p_{θ_s}(X_s) ), s = 1, 2, ..., S
Wherein θ_s denotes the network parameters of the sub-model to be trained of the s-th diffusion stage, and θ_seq = {θ_1, θ_2, ..., θ_S} is the set of network parameters of the sub-models to be trained across the S diffusion stages.
Correspondingly, based on the distribution p_{θ_seq}(x_0, x_1, ..., x_T) of the reverse diffusion process and the network parameters θ_seq of the sub-models to be trained, the loss function of the sub-models to be trained can be updated as:

L(θ_seq) = Σ_{s=1}^{S} D_KL( q(X_s) ‖ p_{θ_s}(X_s) )
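A minimal sketch of the per-stage training objective follows, assuming the usual noise-prediction MSE surrogate for the KL objective is applied within each stage's own time interval, so that only that stage's sub-model parameters are updated. The function names and the simplified loss form are illustrative assumptions, not the patent's exact formulation.

```python
import numpy as np

def stage_loss(predict_noise, x0_batch, stage_timesteps, alpha_bars, rng):
    """Simplified per-stage training objective (noise-prediction MSE).

    Time steps t are drawn only from the diffusion stage's own interval,
    so the gradient of this loss touches only that stage's sub-model.
    """
    losses = []
    for x0 in x0_batch:
        t = int(rng.choice(stage_timesteps))   # sample t from this stage only
        eps = rng.standard_normal(x0.shape)
        # forward-diffused sample at step t (closed-form forward process)
        x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
        eps_hat = predict_noise(x_t, t)        # sub-model's noise prediction
        losses.append(np.mean((eps - eps_hat) ** 2))
    return float(np.mean(losses))
```

Summing such per-stage losses over s = 1, ..., S corresponds to optimizing the stages independently.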
s140: and obtaining a target diffusion model based on the target sub-model corresponding to each of the diffusion stages.
The target diffusion model is obtained based on the target sub-models corresponding to the diffusion phases, which can be understood as connecting the target sub-models corresponding to the diffusion phases in series to obtain the target diffusion model. Connecting the multiple target sub-models in series can be understood as taking the output of one target sub-model as the input of another target sub-model. In the target diffusion model, the series order of the plurality of target sub-models is opposite to the order of the diffusion stages corresponding to the plurality of target sub-models. That is, the earlier a target sub-model's corresponding diffusion stage is ordered, the later its position in the series order. A later position in the series order means that the sub-model performs its data processing (sampling) later.
For example, as shown in fig. 8, there is a target sub-model M1 for the diffusion stage S1, a target sub-model M2 for the diffusion stage S2, and a target sub-model MN for the diffusion stage SN. Since the diffusion stage S1 is ordered first among the plurality of diffusion stages, the target sub-model M1 corresponding to the diffusion stage S1 is ordered last in the series order; since the diffusion stage S2 is ordered second, the target sub-model M2 corresponding to the diffusion stage S2 is ordered second to last in the series order; and so on, so as to obtain the respective series order of the plurality of target sub-models, thereby obtaining the target diffusion model.
In the application process of the target diffusion model shown in fig. 8, input data may first be input into the target sub-model MN, which performs sampling based on the input data; the result of its last sampling step is then output and input into the target sub-model following it in the series, and so on, until the output data of the target sub-model M1 is obtained. The output data can be understood as the data that needs to be generated by the target diffusion model. The output data may be an image.
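The serial application order described above can be sketched as follows; `generate_with_chained_submodels` and the callable interface of each sub-model are illustrative assumptions.

```python
def generate_with_chained_submodels(input_data, submodels):
    """Run the target diffusion model built from serially connected sub-models.

    `submodels` is ordered by diffusion stage [M1, M2, ..., MN]; at inference
    the sub-model of the *last* (noisiest) stage samples first and the
    sub-model of the first stage samples last, so the list is traversed in
    reverse. Each entry is a callable mapping its input to the final
    reverse-diffusion sample of its stage.
    """
    x = input_data
    for submodel in reversed(submodels):   # MN first, ..., M1 last
        x = submodel(x)                    # one stage's output feeds the next
    return x
```

With three toy stage callables, the input flows through M3, then M2, then M1, mirroring the fig. 8 description.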
According to the model generation method provided by the embodiment, under the condition that a plurality of diffusion phases are obtained through dividing, one sub-model to be trained can be configured for each diffusion phase independently, then the corresponding sub-model to be trained is trained according to the forward diffusion sample sequence and the reverse diffusion sample sequence corresponding to the diffusion phases, so that the target sub-model corresponding to the diffusion phases can be obtained, the noise distribution condition of each diffusion phase can be better learned by the target sub-models, and further, the target diffusion model has better generation performance under the condition that the target diffusion model is obtained based on the target sub-model corresponding to the diffusion phases.
Referring to fig. 9, a method for generating a model according to an embodiment of the present application includes:
s210: forward diffusion is performed for a plurality of time steps based on the initial samples to obtain a sequence of initial forward-diffused samples.
Illustratively, as shown in fig. 10, sample X0 may be understood as an initial sample. Sample X0, sample X1, sample X2, ..., sample Xn-2, sample Xn-1, and sample Xn shown in fig. 10 make up the initial forward diffusion sample sequence.
S220: based on the plurality of time steps and the initial forward diffusion sample sequence, a forward diffusion sample sequence corresponding to each of the plurality of diffusion phases is obtained.
As one way, obtaining forward diffusion sample sequences corresponding to each of the plurality of diffusion phases based on the plurality of time steps and the initial forward diffusion sample sequence may include: based on a plurality of time steps, splitting the initial forward diffusion sample sequence into a plurality of subsequences which are sequenced in sequence, obtaining a corresponding diffusion stage based on each subsequence to obtain a plurality of diffusion stages, and taking each subsequence as a forward diffusion sample sequence corresponding to the corresponding diffusion stage.
Optionally, in this manner, the global time intervals t ∈ [t_(s,1), t_(s,i)] occupied by the different forward diffusion sample sequences X_s do not overlap. Specifically, for any two forward diffusion sample sequences X_s and X_s', there is no overlap of any time regions in the global time interval. Adjacent forward diffusion sample sequences X_s and X_s+1 are connected end to end in time: the time step t_(s,i) of the last sample of X_s and the time step t_(s+1,1) of the first sample of X_s+1 satisfy the relationship t_(s+1,1) = t_(s,i) + 1.
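The non-overlapping split described above can be sketched as follows, assuming the samples are simply partitioned into contiguous runs of time steps; the function name and the balanced-length policy are illustrative assumptions.

```python
def split_into_stages(sample_sequence, num_stages):
    """Split an initial forward diffusion sample sequence into `num_stages`
    contiguous, non-overlapping subsequences ordered by time step.

    Adjacent subsequences satisfy t_(s+1,1) == t_(s,i) + 1: the first time
    step of stage s+1 immediately follows the last time step of stage s.
    """
    n = len(sample_sequence)
    base, extra = divmod(n, num_stages)    # spread the remainder over early stages
    stages, start = [], 0
    for s in range(num_stages):
        length = base + (1 if s < extra else 0)
        stages.append(sample_sequence[start:start + length])
        start += length
    return stages
```

Each subsequence then serves directly as the forward diffusion sample sequence of its diffusion stage.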
Alternatively, the initial forward diffusion sample sequence may be split into a plurality of sub-sequences ordered in sequence based on the plurality of time steps, and a corresponding diffusion stage is obtained based on each sub-sequence, so as to obtain a plurality of diffusion stages; the forward diffusion sample sequence of the diffusion stage corresponding to each sub-sequence is then obtained based on each sub-sequence and a specified number of forward diffusion samples in the adjacently ordered sub-sequences of each sub-sequence. Optionally, the specified number is determined based on an overlap ratio and the average time length corresponding to the plurality of sub-sequences. In this case, the forward diffusion samples of each diffusion stage may include the original forward diffusion samples of its sub-sequence and a portion of the forward diffusion samples in the adjacently ordered sub-sequences, that portion being the specified number of forward diffusion samples in the adjacently ordered sub-sequences. Since the specified number of forward diffusion samples are present in two adjacent sub-sequences at the same time, this portion can be understood as the forward diffusion samples of the overlapping portion.
For example, the forward diffusion sample sequences corresponding to the sequentially ordered sub-sequences may be expressed as X_{s-1}, X_s, and X_{s+1}, where X_{s-1} can be understood as the forward diffusion sample sequence corresponding to the sub-sequence ordered before X_s, and X_{s+1} can be understood as the forward diffusion sample sequence corresponding to the sub-sequence ordered after X_s.
Alternatively, the average time length corresponding to the multiple sub-sequences may be denoted as T_seq_ave, and the overlap ratio of the time intervals is R_seq_overlap ∈ (0, 1), where a time interval is the time interval corresponding to a sub-sequence. In this case, the overlapping portion of X_s with X_{s-1} and the overlapping portion of X_s with X_{s+1} are each taken as part of X_s. The final X_s (the forward diffusion sample sequence of the diffusion stage corresponding to each sub-sequence) is thus the original sub-sequence extended by the two overlapping portions, one before and one after it.
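A sketch of the overlapping split under the stated ratio R_seq_overlap: each stage borrows T_r = T_seq_ave * R_seq_overlap samples from each of its neighbours. The boundary-and-borrow construction and all names are illustrative assumptions.

```python
def split_with_overlap(sample_sequence, num_stages, overlap_ratio):
    """Extend each contiguous subsequence with samples borrowed from its
    neighbours, so adjacent diffusion stages share an overlapping portion.

    The number of borrowed samples per side is
    T_r = T_seq_ave * R_seq_overlap, with R_seq_overlap in (0, 1).
    """
    n = len(sample_sequence)
    # boundaries of the underlying non-overlapping split
    bounds = [round(s * n / num_stages) for s in range(num_stages + 1)]
    t_r = max(1, int(round((n / num_stages) * overlap_ratio)))
    stages = []
    for s in range(num_stages):
        lo = max(0, bounds[s] - t_r)        # borrow t_r samples from the previous stage
        hi = min(n, bounds[s + 1] + t_r)    # and t_r samples from the next stage
        stages.append(sample_sequence[lo:hi])
    return stages
```

The borrowed samples appear in two adjacent stages at once, i.e. they form the overlapping portion described above.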
As a further way, the initial forward diffusion sample sequence may be split into a plurality of sub-sequences ordered in sequence based on the plurality of time steps, and a corresponding diffusion stage is obtained based on each sub-sequence, so as to obtain a plurality of diffusion stages; a corresponding random sub-sequence is obtained based on each sub-sequence so as to obtain a plurality of random sub-sequences, where the forward diffusion samples in each random sub-sequence are randomly obtained from the corresponding sub-sequence; and the forward diffusion sample sequence of the diffusion stage corresponding to each sub-sequence is then obtained based on each sub-sequence and forward diffusion samples randomly selected from the plurality of random sub-sequences.
Alternatively, for the s-th sub-sequence, K values S_subset = {y_1, y_2, ..., y_K} may be randomly sampled from the Gaussian distribution y ~ N(s, δ) (with mean s and variance δ), from which a set of K sample sub-sequences (the plurality of random sub-sequences) X_subset is selected. For each sub-sequence in X_subset (each random sub-sequence), T_r = T_seq_ave * R_seq_overlap samples are randomly selected (the randomly selected forward diffusion samples).
The final X_s (the forward diffusion sample sequence of the diffusion stage corresponding to each sub-sequence) is the union of the original sub-sequence and the forward diffusion samples randomly selected from the random sub-sequences.
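A toy sketch of the random-sub-sequence variant, following the description above: K stage indices are drawn from a Gaussian around s, T_r samples are drawn from each selected sub-sequence, and the result is a union with the original sub-sequence. The defaults `k`, `delta`, and `seed`, and all names, are illustrative assumptions.

```python
import numpy as np

def split_with_random_union(sample_sequence, num_stages, overlap_ratio,
                            k=2, delta=1.0, seed=0):
    """Build each stage's sample set as its own subsequence plus random
    samples drawn from neighbouring subsequences selected via y ~ N(s, delta)."""
    rng = np.random.default_rng(seed)
    n = len(sample_sequence)
    bounds = [round(s * n / num_stages) for s in range(num_stages + 1)]
    subsequences = [sample_sequence[bounds[s]:bounds[s + 1]]
                    for s in range(num_stages)]
    t_r = max(1, int(round((n / num_stages) * overlap_ratio)))
    stages = []
    for s in range(num_stages):
        union = list(subsequences[s])
        # K stage indices drawn around s, rounded and clipped to a valid range
        picks = np.clip(np.rint(rng.normal(s, delta, size=k)),
                        0, num_stages - 1).astype(int)
        for j in picks:
            if j == s:
                continue
            chosen = rng.choice(len(subsequences[j]),
                                size=min(t_r, len(subsequences[j])),
                                replace=False)
            union.extend(subsequences[j][i] for i in chosen)
        stages.append(union)
    return stages
```

Each stage's final set always contains its original subsequence, plus randomly selected samples from the randomly chosen neighbouring subsequences.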
Since the forward diffusion sample sequences corresponding to the plurality of diffusion phases can be obtained in multiple ways, the specific manner currently adopted for determining them may be selected according to a predetermined configuration, or according to the currently available processing resources.
S230: and obtaining a reverse diffusion sample sequence corresponding to each of the diffusion phases based on the sub-model to be trained corresponding to each of the diffusion phases.
The corresponding features may be different for different diffusion phases. For example, the time steps involved in each diffusion phase may be different, or the number of samples (forward diffusion samples and reverse diffusion samples) of each diffusion stage may be different. In this case, in order to better adapt to the characteristics of each diffusion stage, as one way, the number of sample sequences in the forward diffusion sample sequence and the reverse diffusion sample sequence of each diffusion stage, and/or the model capacity of the sub-model to be trained corresponding to each diffusion stage, are determined according to the characteristics of each diffusion stage.
The capacity of a model, also referred to as expression capability, refers to the capability of the model to fit complex functions. It is an index describing the fitting capability of the whole model. If the capacity of the model is low, it may be difficult to fit the objective function on the training set, resulting in underfitting. Conversely, if the capacity of the model is high, it may fit the objective function well on the training set but lack generalization capability, resulting in overfitting. Therefore, in the embodiment of the present application, the model capacity of the sub-model to be trained corresponding to each diffusion stage is determined based on the characteristics of each diffusion stage, which is favorable for improving the accuracy of model capacity determination and further ensures the performance of the finally determined target diffusion model.
As a way, in the embodiment of the present application, a first correspondence between the features of the diffusion stage and the model capacity of the sub-model to be trained may be established in advance, and a second correspondence between the features of the diffusion stage and the numbers of sample sequences in the forward diffusion sample sequence and the reverse diffusion sample sequence may also be established. The model capacity of the sub-model to be trained, and the number of sample sequences in the forward diffusion sample sequence and the reverse diffusion sample sequence in each diffusion stage can be determined by the first corresponding relation and the second corresponding relation.
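The first and second correspondences above can be sketched as simple lookup tables keyed by a stage feature. The feature key `noise_band` and the table contents below are hypothetical, purely to illustrate the lookup.

```python
def configure_stage(stage_features, capacity_map, sequence_count_map):
    """Look up a diffusion stage's sub-model capacity (first correspondence)
    and its sample-sequence count (second correspondence) from its features.

    `capacity_map` and `sequence_count_map` stand in for the pre-established
    correspondence tables; the key used here is illustrative.
    """
    key = stage_features["noise_band"]
    return {
        "model_capacity": capacity_map[key],
        "num_sequences": sequence_count_map[key],
    }
```

A stage whose features map to a larger capacity would then be assigned a larger sub-model and, independently, its own sequence count.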
S240: training the sub-model to be trained corresponding to each of the diffusion phases based on the forward diffusion sample sequence and the reverse diffusion sample sequence corresponding to each of the diffusion phases, so as to obtain the target sub-model corresponding to each of the diffusion phases.
S250: and obtaining a target diffusion model based on the target sub-model corresponding to each of the diffusion stages.
According to the model generation method, the plurality of target sub-models can learn the noise distribution condition of each diffusion stage better, and therefore the target diffusion model has better generation performance under the condition that the target diffusion model is obtained based on the target sub-model corresponding to each of the plurality of diffusion stages. In addition, in this embodiment, there may be multiple ways of obtaining the forward diffusion sample sequences corresponding to each of the multiple diffusion stages, so as to further improve flexibility in the model generating process.
Referring to fig. 11, a method for generating a model according to an embodiment of the present application includes:
s310: and acquiring a forward diffusion sample sequence corresponding to each of a plurality of diffusion stages, wherein the diffusion stages are sequentially arranged.
S320: and obtaining a reverse diffusion sample sequence corresponding to each of the diffusion stages based on the sub-model to be trained corresponding to each of the diffusion stages, wherein, in the process of determining the reverse diffusion sample sequences, the reverse diffusion sample sequence corresponding to the last-ordered diffusion stage is obtained by performing reverse diffusion based on the last forward diffusion sample, the reverse diffusion sample sequences corresponding to the other diffusion stages are obtained by performing reverse diffusion based on a reverse diffusion initial sample, and the reverse diffusion initial sample is obtained based on the reverse diffusion samples of the adjacently ordered diffusion stage.
As one way, the reverse diffusion sample determined last in the diffusion stage ordered after the current diffusion stage is used as the reverse diffusion initial sample of the current diffusion stage, the current diffusion stage being the diffusion stage for which the reverse diffusion sample sequence is currently to be determined. Illustratively, as shown in fig. 12, taking the current diffusion stage as the diffusion stage S1, the diffusion stage ordered after the diffusion stage S1 is the diffusion stage S2, and the reverse diffusion sequence obtained for the diffusion stage S2 may include a reverse diffusion sample Y1, a reverse diffusion sample Y2, a reverse diffusion sample Y3, and a reverse diffusion sample Y4. The reverse diffusion sample Y1 may be denoised by the sub-model to be trained corresponding to the diffusion stage S2 to obtain the reverse diffusion sample Y2, the reverse diffusion sample Y2 may be denoised by the sub-model to be trained corresponding to the diffusion stage S2 to obtain the reverse diffusion sample Y3, and the reverse diffusion sample Y3 may be denoised by the sub-model to be trained corresponding to the diffusion stage S2 to obtain the reverse diffusion sample Y4. In this case, the reverse diffusion sample Y4 may be used as the reverse diffusion initial sample of the diffusion stage S1.
Alternatively, a plurality of reverse diffusion samples determined last in the diffusion stage ordered after the current diffusion stage are used as a reference sample set, and one reverse diffusion sample is selected from the reference sample set as the reverse diffusion initial sample of the current diffusion stage, the current diffusion stage being the diffusion stage for which the reverse diffusion sample sequence is currently to be determined. Illustratively, as shown in fig. 13, taking the current diffusion stage as the diffusion stage S1, the diffusion stage ordered after the diffusion stage S1 is the diffusion stage S2, and the reverse diffusion sequence obtained for the diffusion stage S2 may include a reverse diffusion sample Y1, a reverse diffusion sample Y2, a reverse diffusion sample Y3, and a reverse diffusion sample Y4. The reverse diffusion samples Y3 and Y4 may be used as the reference sample set, and in this case the reverse diffusion initial sample of the diffusion stage S1 may be randomly selected from the reverse diffusion samples Y3 and Y4.
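Both ways of choosing the reverse diffusion initial sample for the current stage, from the reverse sequence of the stage ordered after it, can be sketched as one small helper: take the final sample (e.g. Y4), or draw one from a reference set of the last few samples (e.g. {Y3, Y4}). The function name, `mode` flag, and default reference-set size are illustrative assumptions.

```python
import random

def pick_initial_sample(next_stage_sequence, mode="last",
                        reference_size=2, rng=None):
    """Choose the reverse-diffusion initial sample of the current stage
    from the reverse diffusion sequence of the adjacently ordered stage.

    mode="last":   use the last determined reverse diffusion sample.
    mode="random": draw one sample from a reference set formed by the
                   last `reference_size` samples of that sequence.
    """
    if mode == "last":
        return next_stage_sequence[-1]
    if mode == "random":
        reference_set = next_stage_sequence[-reference_size:]
        return (rng or random).choice(reference_set)
    raise ValueError(mode)
```

With the fig. 12/13 example sequence [Y1, Y2, Y3, Y4], the "last" mode yields Y4 and the "random" mode yields Y3 or Y4.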
S330: training the sub-model to be trained corresponding to each of the diffusion phases based on the forward diffusion sample sequence and the reverse diffusion sample sequence corresponding to each of the diffusion phases, so as to obtain the target sub-model corresponding to each of the diffusion phases.
S340: and obtaining a target diffusion model based on the target sub-model corresponding to each of the diffusion stages.
According to the model generation method, the plurality of target sub-models can learn the noise distribution condition of each diffusion stage better, and therefore the target diffusion model has better generation performance under the condition that the target diffusion model is obtained based on the target sub-model corresponding to each of the plurality of diffusion stages. In addition, in this embodiment, there may be multiple ways of obtaining the reverse diffusion sample sequences corresponding to each of the multiple diffusion stages, so as to further improve flexibility in the model generating process.
Referring to fig. 14, an image generating method provided in an embodiment of the present application includes:
S410: input data is acquired.
The input data may be various kinds of data, for example, the input data may be text, or the input data may be an image.
S420: and inputting the input data into a target diffusion model to obtain a target image output by the target diffusion model, wherein the target diffusion model is obtained based on the model generation method.
Referring to fig. 15, in an embodiment of the present application, a model generating apparatus 500 is provided, where the apparatus 500 includes:
a forward sequence obtaining unit 510, configured to obtain a forward diffusion sample sequence corresponding to each of a plurality of diffusion phases, where the plurality of diffusion phases are sequentially arranged.
The reverse sequence obtaining unit 520 is configured to obtain a reverse diffusion sample sequence corresponding to each of the plurality of diffusion phases based on the sub-model to be trained corresponding to each of the plurality of diffusion phases.
The model training unit 530 is configured to train the sub-model to be trained corresponding to each of the plurality of diffusion phases based on the forward diffusion sample sequence and the reverse diffusion sample sequence corresponding to each of the plurality of diffusion phases, so as to obtain the target sub-model corresponding to each of the plurality of diffusion phases.
The model generating unit 540 is configured to obtain a target diffusion model based on the target sub-models corresponding to the diffusion phases.
As one way, the forward sequence obtaining unit 510 is specifically configured to perform forward diffusion for a plurality of time steps based on the initial sample, so as to obtain an initial forward diffusion sample sequence; based on the plurality of time steps and the initial forward diffusion sample sequence, a forward diffusion sample sequence corresponding to each of the plurality of diffusion phases is obtained.
Optionally, the forward sequence obtaining unit 510 is specifically configured to split the initial forward diffusion sample sequence into a plurality of subsequences ordered in sequence based on a plurality of time steps, and obtain a corresponding diffusion stage based on each subsequence, so as to obtain a plurality of diffusion stages; and taking the forward diffusion sample sequence corresponding to each subsequence as the forward diffusion sample sequence corresponding to the corresponding diffusion stage.
Optionally, the forward sequence obtaining unit 510 is specifically configured to split the initial forward diffusion sample sequence into a plurality of subsequences ordered in sequence based on a plurality of time steps, and obtain a corresponding diffusion stage based on each subsequence, so as to obtain a plurality of diffusion stages; and obtaining a forward diffusion sample sequence of a diffusion stage corresponding to each sub-sequence based on each sub-sequence and the designated number of forward diffusion samples in the adjacent ordered sub-sequences of each sub-sequence. Wherein the specified number may be determined based on the overlap ratio and the average length of time corresponding to the plurality of sub-sequences.
Optionally, the forward sequence obtaining unit 510 is specifically configured to split the initial forward diffusion sample sequence into a plurality of subsequences ordered in sequence based on a plurality of time steps, and obtain a corresponding diffusion stage based on each subsequence, so as to obtain a plurality of diffusion stages; based on each subsequence, obtaining a corresponding random subsequence to obtain a plurality of random subsequences, wherein forward diffusion samples in the random subsequences are obtained from the corresponding subsequences randomly; and obtaining a forward diffusion sample sequence of a diffusion stage corresponding to each subsequence based on each subsequence and a forward diffusion sample randomly selected from a plurality of random subsequences.
As one way, the reverse sequence obtaining unit 520 is specifically configured to, in the process of determining the reverse diffusion sample sequences, obtain the reverse diffusion sample sequence corresponding to the last-ordered diffusion stage by performing reverse diffusion based on the last forward diffusion sample, and obtain the reverse diffusion sample sequences corresponding to the other diffusion stages by performing reverse diffusion based on a reverse diffusion initial sample, where the reverse diffusion initial sample is obtained based on the reverse diffusion samples of the adjacently ordered diffusion stage.
Optionally, the reverse sequence obtaining unit 520 is specifically configured to use the reverse diffusion sample determined last in the diffusion stage ordered after the current diffusion stage as the reverse diffusion initial sample of the current diffusion stage, where the current diffusion stage is the diffusion stage for which the reverse diffusion sample sequence is currently to be determined.
Optionally, the reverse sequence obtaining unit 520 is specifically configured to use a plurality of reverse diffusion samples determined last in the diffusion stage ordered after the current diffusion stage as a reference sample set; and select one reverse diffusion sample from the reference sample set as the reverse diffusion initial sample of the current diffusion stage, where the current diffusion stage is the diffusion stage for which the reverse diffusion sample sequence is currently to be determined.
As a way, the model training unit 530 is further configured to determine, according to the characteristics of each diffusion stage, the number of sample sequences in the forward diffusion sample sequence and the reverse diffusion sample sequence in each diffusion stage, and/or the model capacity of the sub-model to be trained corresponding to each diffusion stage.
Referring to fig. 16, an image generating apparatus according to an embodiment of the present application includes:
an input data acquisition unit 610 for acquiring input data.
The image generating unit 620 is configured to input the acquired input data into a target diffusion model to obtain a target image output by the target diffusion model, where the target diffusion model is obtained based on the foregoing model generating method.
It should be noted that, in the present application, the device embodiment and the foregoing method embodiment correspond to each other, and specific principles in the device embodiment may refer to the content in the foregoing method embodiment, which is not described herein again.
The model generation method provided by the embodiment of the present application is further explained below, taking a generative model based on a scoring function as a basis.
The generative model based on a scoring function is shown in fig. 17. The scoring function ∇_x log p_θ(x) is the gradient, with respect to the data sample x, of the log likelihood of the data distribution p_θ(x). In fig. 18, taking samples of a Gaussian mixture model containing 2 Gaussian distributions as an example (leftmost image in fig. 18), there is a certain difference between the scoring function of the real data (middle image in fig. 18) and the scoring function estimated by the model (rightmost image in fig. 18). In low-density regions of the data, the accuracy of the scoring function estimated by the model is insufficient; in high-density regions of the data, the scoring function estimated by the model is more accurate. The reasons for this phenomenon are explained below from two perspectives: the method of generating samples and the modeling of the noise distribution.
In view of the above problems, the embodiment of the present application proposes a model generation method based on a plurality of diffusion phases. Taking the generative model based on a scoring function as an example, an intuitive explanation of the method provided by the embodiment of the present application is given below from the time dimension and the space dimension respectively.
Time dimension: as shown in fig. 19, the method provided by the embodiment of the present application can model different time-series phases (diffusion phases) with different sub-models, similar to partitioning the data region. On the one hand, each sub-model is able to better model the data distribution of its sub-phase (diffusion phase). On the other hand, the optimal sub-model matched with the sampling step can be selected in different sub-phases.
Spatial dimension: as shown in fig. 20, the method provided by the embodiment of the present application models different regions with different sub-models, similar to partitioning the data region. On the one hand, each sub-model is able to better model the data distribution of its local region. On the other hand, the optimal sub-model whose capacity matches the sampling step can be selected in different sub-regions.
A specific example is given below to verify the effectiveness of the model generation method provided by the embodiment of the present application.
Based on the TinySD/SmallSD models involved in the method proposed in the reference work, the model generation method proposed by the embodiment of the present application performs transfer learning on TinySD/SmallSD, and a Multi-Stage Model (MSM) is proposed. Setting the number of stages S = 2, the MSM contains 2 target sub-models, namely MSM_1 and MSM_2.
Wherein MSM_1 is trained based on a sample sequence (forward diffusion sample sequence) whose time interval is t ∈ [t_(s=1,1), t_(s=1,i)] = [0, 500].
MSM_2 is trained based on a sample sequence (forward diffusion sample sequence) whose time interval is t ∈ [t_(s=2,1), t_(s=2,j)] = [500, 1000].
Wherein experiments were conducted in the following configuration.
First experimental mode: the same model was used for both phases: tinySD + TinySD.
Experimental verification was performed on TinySD and MSM using the COCO 2017 validation set (5000 images) as the test set. When evaluation metrics such as FID and ClipScore are adopted to evaluate performance on the text-to-image task based on the DDIM sampler, the experimental results are shown in the following table and fig. 21.
The number of sampling steps is understood to be the number of time steps involved.
As shown in the table above, under the ClipScore metric the three configurations show little difference, and their effects are comparable. Under the FID metric, when both TinySD and MSM use 20 sampling steps (time steps), the effect of MSM is significantly better than that of TinySD, with comparable sampling time. Furthermore, when TinySD and MSM use 20 and 10 sampling steps respectively, MSM sampling takes only half the time of TinySD while the two are comparably effective.
As shown in fig. 21, the MSM (20 steps) is significantly clearer than TinySD (20 steps) in terms of the generated texture details, with more details generated. The MSM (10 steps) produces quality comparable to TinySD (20 steps).
Conclusion: from the above analysis, when the two stages use the same model, the model generation method provided by the embodiment of the present application combines the target sub-models of the plurality of diffusion stages in a serial manner, so that the overall end-to-end modeling capability is improved, and the quality of sampling generation is better for the same number of sampling steps.
Second experimental mode: two phases use different models: tinySD + SmallSD. Experimental test results are shown in the following table and fig. 22.
As shown in the table above, under the ClipScore metric the three configurations show little difference, and their effects are comparable. Under the FID metric, when both SmallSD and MSM use 20 sampling steps (time steps), the effect of MSM is significantly better than that of SmallSD, with comparable sampling time. Furthermore, when SmallSD and MSM use 20 and 10 sampling steps respectively, MSM sampling takes only half the time of SmallSD while the two are comparably effective. Moreover, in both cases, the capacity of the MSM sub-model TinySD_1 is smaller than that of SmallSD.
As shown in fig. 22, the MSM (20 steps) is significantly clearer than SmallSD (20 steps) in terms of the generated texture details, with more details generated. The MSM (10 steps) produces quality comparable to SmallSD (20 steps).
Conclusion: from the above analysis, when the two stages use different models, models with different capacities and different numbers of sampling steps can be set. By optimizing the model capacity and the number of sampling steps of different diffusion stages, an optimal configuration of end-to-end generation quality and sampling efficiency is achieved.
An electronic device according to the present application will be described with reference to fig. 23.
Referring to fig. 23, based on the above model generating method, image generating method, and apparatus, an embodiment of the present application further provides an electronic device 2000 capable of executing the above model generating method or image generating method. The electronic device 2000 includes one or more (only one is shown in the figure) processors 202, a memory 204, a network module 206, a sensor module 208, and an audio acquisition device 210 that are coupled to each other. The memory 204 stores a program capable of executing the contents of the foregoing embodiments, and the processor 202 can execute the program stored in the memory 204.
Wherein the processor 202 may include one or more processing cores. The processor 202 uses various interfaces and lines to connect various parts of the overall electronic device 2000, and performs various functions of the electronic device 2000 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 204 and invoking data stored in the memory 204. Alternatively, the processor 202 may be implemented in at least one hardware form of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 202 may integrate one or a combination of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, user interface, application programs, and the like; the GPU is responsible for rendering and drawing display content; and the modem is used to handle wireless communications. It can be appreciated that the modem may also not be integrated into the processor 202 and may instead be implemented solely by a single communication chip.
The memory 204 may include random access memory (RAM) or read-only memory (ROM). The memory 204 may be used to store instructions, programs, code sets, or instruction sets. The memory 204 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (e.g., a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the foregoing method embodiments, and the like.
The network module 206 is configured to implement information interaction between the electronic device 2000 and other devices, for example, to transmit device control commands, manipulation request commands, and status information acquisition commands. When the electronic device 2000 is embodied as different kinds of devices, the corresponding network module 206 may differ accordingly.
The sensor module 208 may include at least one sensor. Specifically, the sensor module 208 may include, but is not limited to: light sensors, motion sensors, pressure sensors, infrared thermal sensors, distance sensors, acceleration sensors, and other sensors.
The pressure sensor may detect pressure generated by pressing against the electronic device 2000. That is, the pressure sensor detects pressure generated by contact or pressing between the user and the electronic device, for example, between the user's ear and the mobile terminal. Thus, the pressure sensor may be used to determine whether contact or pressing has occurred between the user and the electronic device 2000, as well as the magnitude of the pressure.
The acceleration sensor may detect the magnitude of acceleration in each direction (typically three axes), and may detect the magnitude and direction of gravity when stationary. It may be used in applications for recognizing the posture of the electronic device 2000 (such as landscape/portrait switching, related games, and magnetometer posture calibration), vibration-recognition-related functions (such as pedometer and tapping), and so on. In addition, the electronic device 2000 may further be configured with other sensors such as a gyroscope, a barometer, a hygrometer, and a thermometer, which will not be described herein.
The audio acquisition device 210 is used for acquiring audio signals. Optionally, the audio acquisition device 210 includes a plurality of audio acquisition devices, each of which may be a microphone. For example, in one approach, the audio acquisition device 210 may include two microphones, where one microphone corresponds to one analog-to-digital converter and the other microphone corresponds to two analog-to-digital converters of different analog gains. In another approach, the audio acquisition device 210 may include three microphones. In this approach, two of the microphones (e.g., the primary microphone and the secondary microphone) may each correspond to one analog-to-digital converter, and the remaining microphone (e.g., the camera microphone) may correspond to two analog-to-digital converters of different analog gains.
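As an illustrative sketch only (the `Mic`/`Adc` classes, field names, and gain values are hypothetical and not part of the application), the three-microphone mapping described above can be written down as a small data structure:

```python
# Hypothetical sketch of the microphone-to-ADC mapping described above;
# class names, field names, and gain values are illustrative assumptions.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Adc:
    analog_gain_db: float  # analog gain of this converter (assumed unit)

@dataclass
class Mic:
    role: str               # e.g. "primary", "secondary", "camera"
    adcs: List[Adc] = field(default_factory=list)

def three_mic_layout():
    """Three-microphone approach: primary and secondary each use one ADC,
    while the camera microphone uses two ADCs of different analog gains."""
    return [
        Mic("primary", [Adc(0.0)]),
        Mic("secondary", [Adc(0.0)]),
        Mic("camera", [Adc(0.0), Adc(18.0)]),  # two different gains (values assumed)
    ]
```

Feeding one microphone into two converters with different gains is a common way to widen effective dynamic range; the patent text itself only states the mapping, not the motivation.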
As one approach, the network module of the electronic device 2000 is a radio frequency module configured to receive and transmit electromagnetic waves, realizing mutual conversion between electromagnetic waves and electrical signals so as to communicate with a communication network or other devices. The radio frequency module may include various existing circuit elements for performing these functions, such as an antenna, a radio frequency transceiver, a digital signal processor, an encryption/decryption chip, a subscriber identity module (SIM) card, memory, and the like. For example, the radio frequency module can exchange information with external devices through transmitted or received electromagnetic waves, and thereby receive audio signals transmitted by the external devices.
Furthermore, the electronic device 2000 may further include an image capturing device for capturing images. For example, video, still pictures, or moving pictures can be captured by the image capturing device.
Referring to fig. 24, a block diagram of a computer readable storage medium according to an embodiment of the present application is shown. The computer readable storage medium 800 has stored therein program code that can be invoked by a processor to perform the methods described in the foregoing method embodiments.
The computer readable storage medium 800 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read-only memory), an EPROM, a hard disk, or a ROM. Optionally, the computer readable storage medium 800 comprises a non-transitory computer-readable storage medium. The computer readable storage medium 800 has storage space for program code 810 that performs any of the method steps described above. The program code can be read from or written into one or more computer program products. The program code 810 may, for example, be compressed in a suitable form.
In summary, according to the model generation method, image generation method, apparatus, and electronic device provided by the present application, after the forward diffusion sample sequence corresponding to each diffusion stage is obtained, and the reverse diffusion sample sequence corresponding to each diffusion stage is obtained based on the sub-model to be trained corresponding to that stage, the sub-model to be trained corresponding to each diffusion stage can be trained based on the forward and reverse diffusion sample sequences of that stage to obtain the target sub-model corresponding to each diffusion stage, and the target diffusion model is then obtained based on the target sub-models corresponding to the diffusion stages. In this way, when a plurality of diffusion stages is obtained by division, a separate sub-model to be trained can be configured for each diffusion stage, and each sub-model is then trained on the forward and reverse diffusion sample sequences of its own stage to obtain the corresponding target sub-model. Each target sub-model can thus better learn the noise distribution of its own diffusion stage, so that the target diffusion model obtained from these target sub-models has better generation performance.
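As an illustrative, non-authoritative sketch of the stage-wise training scheme summarized above (not the claimed implementation), the control flow might look as follows in Python; the stage splitting, the toy `StageSubModel`, and its training objective are all hypothetical placeholders:

```python
# Illustrative sketch: the global time interval [0, total_steps) is split
# into contiguous diffusion stages, and a separate sub-model is trained on
# the timesteps of its own stage only. All names here are assumptions.

import random

def split_into_stages(total_steps, num_stages):
    """Split timesteps 0..total_steps-1 into contiguous, sequentially
    ordered stages (sub-intervals of the global time interval)."""
    base, rem = divmod(total_steps, num_stages)
    stages, start = [], 0
    for i in range(num_stages):
        size = base + (1 if i < rem else 0)
        stages.append(list(range(start, start + size)))
        start += size
    return stages

class StageSubModel:
    """Placeholder sub-model: a single trainable scalar standing in for a
    denoising network whose capacity could vary from stage to stage."""
    def __init__(self):
        self.scale = 0.0

    def train_step(self, noisy, noise, lr=0.1):
        # Toy objective: predict the noise as scale * noisy; one gradient
        # step on the squared error, merely to make the loop concrete.
        pred = self.scale * noisy
        grad = 2 * (pred - noise) * noisy
        self.scale -= lr * grad

def train_staged_diffusion(total_steps=100, num_stages=4, iters=50, seed=0):
    rng = random.Random(seed)
    stages = split_into_stages(total_steps, num_stages)
    sub_models = [StageSubModel() for _ in stages]
    # Each sub-model sees only forward-diffusion samples from its own stage,
    # so it specializes in that stage's noise distribution.
    for stage_steps, model in zip(stages, sub_models):
        for _ in range(iters):
            t = rng.choice(stage_steps)           # timestep within this stage
            noise = rng.gauss(0.0, 1.0)           # forward-diffusion noise
            noisy = 0.5 + noise * (t + 1) / total_steps  # toy noisy sample
            model.train_step(noisy, noise)
    return stages, sub_models
```

At generation time the trained sub-models would be chained stage by stage during reverse diffusion, each handling its own sub-interval of timesteps.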
The embodiment of the present application provides a model generation method that divides the diffusion model time sequence (the global time interval) into a plurality of stages (diffusion stages) and models each stage separately, so that the model of each stage can better learn its noise distribution.
Furthermore, in the embodiment of the present application, the noise distribution characteristics of each stage are taken into account, and an optimal trade-off between end-to-end generation quality and sampling time is achieved by tuning the model capacity and the number of sampling steps of the sub-models of the different stages.
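The per-stage tuning described above can be sketched as a small configuration helper; this is only a hypothetical illustration, and the capacity knob (`hidden_dim`), the per-stage weights, and the allocation rule are assumptions rather than the application's actual optimization procedure:

```python
# Hypothetical illustration of per-stage configuration: each diffusion
# stage gets its own model capacity and number of sampling steps, so a
# total sampling budget can be traded against generation quality.

from dataclasses import dataclass

@dataclass
class StageConfig:
    name: str
    hidden_dim: int      # capacity of the stage's sub-model (assumed knob)
    sampling_steps: int  # reverse-diffusion steps spent in this stage

def allocate_budget(total_steps, weights, base_dim=64):
    """Split a total sampling-step budget across stages in proportion to
    per-stage weights (e.g. derived from each stage's noise statistics)."""
    total_w = sum(weights)
    configs, used = [], 0
    for i, w in enumerate(weights):
        steps = round(total_steps * w / total_w)
        configs.append(StageConfig(f"stage_{i}", base_dim * (i + 1), steps))
        used += steps
    # Give any rounding remainder to the last stage so the budget is exact.
    configs[-1].sampling_steps += total_steps - used
    return configs
```

For example, `allocate_budget(50, [1, 2, 2])` spends fewer steps on the first stage and more on the later ones while keeping the total fixed at 50; how the weights and capacities would actually be derived from the noise statistics is left open by the text.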
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will appreciate that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents; such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (15)

1. A method of generating a model, the method comprising:
Acquiring a forward diffusion sample sequence corresponding to each of a plurality of diffusion stages, wherein the diffusion stages are sequentially arranged;
obtaining a reverse diffusion sample sequence corresponding to each of the diffusion stages based on the sub-model to be trained corresponding to each of the diffusion stages;
training the sub-model to be trained corresponding to each of the diffusion stages based on the forward diffusion sample sequence and the reverse diffusion sample sequence corresponding to each of the diffusion stages, so as to obtain a target sub-model corresponding to each of the diffusion stages;
and obtaining a target diffusion model based on the target sub-model corresponding to each of the diffusion stages.
2. The method of claim 1, wherein the obtaining a forward diffusion sample sequence corresponding to each of a plurality of diffusion stages comprises:
forward diffusion is carried out for a plurality of time steps based on the initial sample, so as to obtain an initial forward diffusion sample sequence;
and obtaining forward diffusion sample sequences corresponding to each of a plurality of diffusion stages based on the plurality of time steps and the initial forward diffusion sample sequence.
3. The method according to claim 2, wherein obtaining forward diffusion sample sequences corresponding to each of a plurality of diffusion stages based on the plurality of time steps and the initial forward diffusion sample sequence comprises:
splitting the initial forward diffusion sample sequence into a plurality of sequentially ordered subsequences based on the plurality of time steps, and obtaining a corresponding diffusion stage based on each subsequence, so as to obtain a plurality of diffusion stages;
each subsequence is taken as a corresponding forward diffusion sample sequence of a corresponding diffusion stage.
4. The method according to claim 2, wherein obtaining forward diffusion sample sequences corresponding to each of a plurality of diffusion stages based on the plurality of time steps and the initial forward diffusion sample sequence comprises:
splitting the initial forward diffusion sample sequence into a plurality of sequentially ordered subsequences based on the plurality of time steps, and obtaining a corresponding diffusion stage based on each subsequence, so as to obtain a plurality of diffusion stages;
and obtaining the forward diffusion sample sequence of the diffusion stage corresponding to each sub-sequence based on that sub-sequence and a designated number of forward diffusion samples in its adjacently ordered sub-sequence.
5. The method according to claim 4, wherein the method further comprises:
and determining the designated number based on an overlap rate and the average duration corresponding to the plurality of sub-sequences.
6. The method according to claim 2, wherein obtaining forward diffusion sample sequences corresponding to each of a plurality of diffusion stages based on the plurality of time steps and the initial forward diffusion sample sequence comprises:
splitting the initial forward diffusion sample sequence into a plurality of sequentially ordered subsequences based on the plurality of time steps, and obtaining a corresponding diffusion stage based on each subsequence, so as to obtain a plurality of diffusion stages;
Acquiring a corresponding random subsequence based on each subsequence to obtain a plurality of random subsequences, wherein forward diffusion samples in the random subsequences are randomly acquired from the corresponding subsequences;
And obtaining a forward diffusion sample sequence of a diffusion stage corresponding to each subsequence based on each subsequence and a forward diffusion sample randomly selected from the plurality of random subsequences.
7. The method according to claim 1, wherein the method further comprises:
in the process of determining the reverse diffusion sample sequences, the reverse diffusion sample sequence corresponding to the last diffusion stage is obtained by reverse diffusion starting from the last forward diffusion sample, and the reverse diffusion sample sequences corresponding to the other diffusion stages are obtained by reverse diffusion starting from reverse diffusion initial samples, wherein the reverse diffusion initial sample of a diffusion stage is obtained based on its adjacently ordered diffusion stage.
8. The method of claim 7, wherein the method further comprises:
and taking the finally determined reverse diffusion sample of the diffusion stage ordered after the current diffusion stage as the reverse diffusion initial sample of the current diffusion stage, wherein the current diffusion stage is the diffusion stage for which a reverse diffusion sample sequence is currently to be determined.
9. The method of claim 7, wherein the method further comprises:
taking a plurality of finally determined reverse diffusion samples of the diffusion stage ordered after the current diffusion stage as a reference sample set;
and selecting one reverse diffusion sample from the reference sample set as the reverse diffusion initial sample of the current diffusion stage, wherein the current diffusion stage is the diffusion stage for which a reverse diffusion sample sequence is currently to be determined.
10. The method according to any one of claims 1-9, characterized in that the method comprises:
and determining, according to the characteristics of each diffusion stage, the number of samples in the forward diffusion sample sequence and the reverse diffusion sample sequence of each diffusion stage and/or the model capacity of the sub-model to be trained corresponding to each diffusion stage.
11. An image generation method, the method comprising:
Acquiring input data;
inputting the input data into a target diffusion model to obtain a target image output by the target diffusion model, wherein the target diffusion model is obtained based on the method of any one of claims 1-10.
12. A model generation apparatus, characterized in that the apparatus comprises:
a forward sequence obtaining unit, configured to obtain a forward diffusion sample sequence corresponding to each of a plurality of diffusion stages, wherein the plurality of diffusion stages are sequentially arranged;
a reverse sequence obtaining unit, configured to obtain a reverse diffusion sample sequence corresponding to each of the diffusion stages based on the sub-model to be trained corresponding to each of the diffusion stages;
a model training unit, configured to train the sub-model to be trained corresponding to each of the diffusion stages based on the forward diffusion sample sequence and the reverse diffusion sample sequence corresponding to each of the diffusion stages, so as to obtain a target sub-model corresponding to each of the diffusion stages;
And the model generating unit is used for obtaining a target diffusion model based on the target sub-models corresponding to the diffusion stages.
13. An image generation apparatus, the apparatus comprising:
An input data acquisition unit configured to acquire input data;
the image generating unit is used for inputting the acquired input data into a target diffusion model to obtain a target image output by the target diffusion model, wherein the target diffusion model is obtained based on the method of any one of claims 1-10.
14. An electronic device comprising a processor and a memory; one or more programs are stored in the memory and configured to be executed by the processor to implement the method of any of claims 1-10, or the method of claim 11.
15. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a program code, wherein the program code, when being executed by a processor, performs the method of any of claims 1-10 or the method of claim 11.
CN202410054935.1A 2024-01-12 2024-01-12 Model generation method, image generation device and electronic equipment Pending CN117952171A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410054935.1A CN117952171A (en) 2024-01-12 2024-01-12 Model generation method, image generation device and electronic equipment

Publications (1)

Publication Number Publication Date
CN117952171A true CN117952171A (en) 2024-04-30

Family

ID=90804308

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410054935.1A Pending CN117952171A (en) 2024-01-12 2024-01-12 Model generation method, image generation device and electronic equipment

Country Status (1)

Country Link
CN (1) CN117952171A (en)

Similar Documents

Publication Publication Date Title
CN109214343B (en) Method and device for generating face key point detection model
CN109902186B (en) Method and apparatus for generating neural network
CN109800732B (en) Method and device for generating cartoon head portrait generation model
CN112185352B (en) Voice recognition method and device and electronic equipment
CN110287775B (en) Palm image clipping method, palm image clipping device, computer equipment and storage medium
CN110009059B (en) Method and apparatus for generating a model
CN112509600A (en) Model training method and device, voice conversion method and device and storage medium
JP2022502758A (en) Coding methods, equipment, equipment and programs
CN110263218B (en) Video description text generation method, device, equipment and medium
CN115691544A (en) Training of virtual image mouth shape driving model and driving method, device and equipment thereof
CN112084959B (en) Crowd image processing method and device
CN112149699A (en) Method and device for generating model and method and device for recognizing image
CN112307243B (en) Method and apparatus for retrieving images
CN114416260A (en) Image processing method, image processing device, electronic equipment and storage medium
CN110276404B (en) Model training method, device and storage medium
CN109829431B (en) Method and apparatus for generating information
CN111312223A (en) Training method and device of voice segmentation model and electronic equipment
CN113468344A (en) Entity relationship extraction method and device, electronic equipment and computer readable medium
CN109977925B (en) Expression determination method and device and electronic equipment
CN117952171A (en) Model generation method, image generation device and electronic equipment
CN112948763B (en) Piece quantity prediction method and device, electronic equipment and storage medium
CN111949860B (en) Method and apparatus for generating a relevance determination model
CN112434629A (en) Online time sequence action detection method and equipment
WO2021086676A1 (en) Moment localization in media stream
CN112825143A (en) Deep convolutional neural network compression method, device, storage medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination