CN116958712B - Image generation method, system, medium and device based on prior probability distribution - Google Patents


Info

Publication number
CN116958712B
CN116958712B (Application CN202311210822.8A)
Authority
CN
China
Prior art keywords
data
encoder
distribution
input
image generation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311210822.8A
Other languages
Chinese (zh)
Other versions
CN116958712A (en)
Inventor
袭肖明
何志强
郭子康
乔立山
张淑涵
宁一鹏
张玉龙
纪孔林
聂秀山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Jianzhu University
Original Assignee
Shandong Jianzhu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Jianzhu University
Priority to CN202311210822.8A
Publication of CN116958712A
Application granted
Publication of CN116958712B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/30 Noise filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an image generation method, system, medium and device based on prior probability distribution, belonging to the technical field of image generation.

Description

Image generation method, system, medium and device based on prior probability distribution
Technical Field
The invention belongs to the technical field of image generation, and particularly relates to an image generation method, system, medium and device based on prior probability distribution.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Machine learning and artificial intelligence have reached, and in some respects exceeded, human-level performance in tasks such as object recognition, text translation and strategy games. A natural next step is for computers to simulate the creative process and thereby assist humans in creating and generating content. Image generation technology has advanced rapidly over the past decade, and under certain conditions existing methods can generate images whose authenticity is difficult for the human eye to judge. These technologies generate images from simple, readily available content such as pictures, text and sound, which greatly amplifies human creativity and expands the boundaries of content generation.
Deep-learning-based image generation has become an important research topic in artificial intelligence. It aims to use machine learning techniques to generate virtual images from data; these images contain useful information that can improve techniques such as visual inspection, image classification and image generation, and can also support visual analysis, visual design, virtual training, and similar applications.
Deep learning can make full use of large amounts of data: by training on images, text and sound it learns meaningful patterns, structures and other data characteristics, and can thus closely approximate the human visual system. It can capture and abstract visual features from social, cultural, natural and other environments. In addition, it helps build more accurate and reliable models that remain accurate even in the presence of noise and errors.
Deep learning techniques are now widely applied in image generation. For example, VAEs (Variational Autoencoders) are deep-learning-based image generation models that learn from a given image dataset, identify visual features, and use the learned features to generate new images. The idea behind VAEs is that the encoder learns the distribution of the input data and samples features from that distribution as a latent representation of the data; this representation is fed to the decoder, which learns to reconstruct it back into the original data. During training of a conventional VAE, however, random noise is sampled under an artificial distribution assumption, which differs to some extent from the true distribution of the data; at the same time, noise in the original data is encoded along with the signal. Both issues degrade the accuracy of image generation.
Disclosure of Invention
To avoid the influence of artificial distribution assumptions on the model, a learner for sampling the data distribution is constructed: a mixture density network that combines a classification neural network with a Gaussian mixture model. Input data are classified by the learner, the output information is fed into the Gaussian mixture model, and the parameters of the Gaussian mixture distribution are determined by computing the hidden variables of its sub-models. These parameters are then used as the parameters of the sampling distribution of the random vector in the re-parameterization step of the VAE (variational autoencoder) model, guiding its training. This effectively ensures the accuracy of model training and, in turn, the accuracy of image generation.
According to a first aspect of an embodiment of the present invention, there is provided an image generation method based on an a priori probability distribution, including:
taking the labeled image data as input to a classification neural network model for classification;
fitting the classification result of the classification neural network model with a Gaussian mixture model to obtain the corresponding Gaussian mixture distribution;
training a variational autoencoder with the unlabeled image data as its input, and obtaining distribution parameters of the input data from the encoder of the variational autoencoder; combining randomly sampled data from the Gaussian mixture distribution with the distribution parameters output by the encoder, feeding the combined data to the decoder of the variational autoencoder, and obtaining reconstructed data of the input data from the decoder; wherein the training objective of the variational autoencoder is to minimize the error between the reconstructed data and the input data;
and using the decoder of the trained variational autoencoder as an image generator to realize image generation.
Further, the output of the encoder comprises the mean and variance distribution parameters of the input data, and the combination of the randomly sampled data from the Gaussian mixture distribution with the distribution parameters output by the encoder is expressed as

$z = \mu + \sigma \odot \varepsilon$

where $\mu$ is the mean vector, $\sigma$ is the variance vector, and $\varepsilon$ is a data vector randomly sampled from the Gaussian mixture distribution.
Furthermore, the encoder adopts an architecture of a fully connected layer, a convolution layer and a fully connected layer connected in sequence; a clusterer is connected in parallel to the last layer of the encoder, and the feature information of the last layer of the encoder is clustered by the clusterer.
Further, the clusterer is a DBSCAN clusterer.
Further, before the labeled image data are used as input for classification by the classification neural network model and the unlabeled image data are used as input for training the variational autoencoder, data enhancement is applied to both in advance; the data enhancement operations include random cropping, horizontal flipping, vertical flipping, random rotation, brightness change and noise addition.
Further, the classification neural network adopts a ReLU activation function and a cross-entropy loss function, where the cross-entropy loss is

$L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{m} y_{ic}\,\log\left(p_{ic}\right)$

where $m$ is the number of categories; $y_{ic}$ is an indicator function that equals 1 if the true class of sample $i$ is $c$ and 0 otherwise; $p_{ic}$ is the predicted probability that sample $i$ belongs to class $c$; and $N$ is the scale of the classification neural network.
Further, the classification neural network model adopts a convolutional neural network.
According to a second aspect of an embodiment of the present invention, there is provided an image generation system based on an a priori probability distribution, including:
a data distribution sampling unit for taking the labeled image data as input to a classification neural network model for classification, and fitting the classification result of the classification neural network model with a Gaussian mixture model to obtain the corresponding Gaussian mixture distribution;
a variational autoencoder training unit for training a variational autoencoder with the unlabeled image data as its input, obtaining distribution parameters of the input data from the encoder of the variational autoencoder, combining randomly sampled data from the Gaussian mixture distribution with the distribution parameters output by the encoder, feeding the combined data to the decoder of the variational autoencoder, and obtaining reconstructed data of the input data from the decoder, wherein the training objective of the variational autoencoder is to minimize the error between the reconstructed data and the input data;
and an image generation unit for using the decoder of the trained variational autoencoder as an image generator to realize image generation.
According to a third aspect of embodiments of the present invention, there is provided a computer-readable storage medium having stored thereon a program which, when executed by a processor, implements an image generation method based on a priori probability distribution as described above.
According to a fourth aspect of embodiments of the present invention, there is provided an electronic device comprising a memory, a processor and a program stored on the memory and executable on the processor, the processor implementing an image generation method based on a priori probability distribution as described above when executing the program.
Compared with the prior art, the invention has the following beneficial effects:
(1) To avoid the influence of artificial distribution assumptions on the model, a learner for sampling the data distribution is constructed: a mixture density network that combines a classification neural network with a Gaussian mixture model. Input data are classified by the learner, the output information is fed into the Gaussian mixture model, and the parameters of the Gaussian mixture distribution are determined by computing the hidden variables of its sub-models. These parameters are then used as the parameters of the sampling distribution of the random vector in the re-parameterization step of the VAE (variational autoencoder) model, guiding its training. This effectively ensures the accuracy of model training and, in turn, the accuracy of image generation.
(2) In this scheme, the re-parameterization step in VAE model training uses a random vector sampled from the Gaussian mixture distribution; at the same time, category supervision information is introduced during encoder learning, and a clustering method is combined to improve the robustness of feature learning.
(3) In this scheme, the original input data are fed into the model after data enhancement, and the decoder is made to reconstruct the original data, so that the model learns to remove noise and extract important features, improving its robustness.
Additional aspects of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
FIG. 1 is a basic flowchart of an image generation method based on prior probability distribution according to an embodiment of the present invention;
fig. 2 is a schematic diagram of an overall network architecture adopted by the image generation method based on prior probability distribution according to an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present invention. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
Embodiments of the invention and features of the embodiments may be combined with each other without conflict.
Embodiment one:
an object of the present embodiment is to provide an image generating method based on an a priori probability distribution, including:
taking the labeled image data as input to a classification neural network model for classification;
fitting the classification result of the classification neural network model with a Gaussian mixture model to obtain the corresponding Gaussian mixture distribution;
training a variational autoencoder with the unlabeled image data as its input, and obtaining distribution parameters of the input data from the encoder of the variational autoencoder; combining randomly sampled data from the Gaussian mixture distribution with the distribution parameters output by the encoder, feeding the combined data to the decoder of the variational autoencoder, and obtaining reconstructed data of the input data from the decoder; wherein the training objective of the variational autoencoder is to minimize the error between the reconstructed data and the input data;
and using the decoder of the trained variational autoencoder as an image generator to realize image generation.
In a specific implementation, the output of the encoder comprises the mean and variance distribution parameters of the input data, and the combination of the randomly sampled data from the Gaussian mixture distribution with the distribution parameters output by the encoder is expressed as

$z = \mu + \sigma \odot \varepsilon$

where $\mu$ is the mean vector, $\sigma$ is the variance vector, and $\varepsilon$ is a data vector randomly sampled from the Gaussian mixture distribution.
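As a minimal illustration of this combination step (a sketch, not the patented implementation; the mixture parameters, latent dimension and use of NumPy are all assumptions for demonstration), the re-parameterization with a Gaussian-mixture noise source can be written as:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters of a 3-component Gaussian mixture (mixture
# coefficients, component means and standard deviations), standing in
# for the output of the prior-distribution sampling branch.
alphas = np.array([0.5, 0.3, 0.2])
means = np.array([-1.0, 0.0, 2.0])
stds = np.array([0.5, 1.0, 0.3])

def sample_gmm(size):
    """Draw random values from the Gaussian mixture distribution."""
    comp = rng.choice(len(alphas), size=size, p=alphas)
    return rng.normal(means[comp], stds[comp])

# Hypothetical encoder outputs for a latent dimension of 8.
mu = np.zeros(8)      # mean vector output by the encoder
sigma = np.ones(8)    # variance vector output by the encoder

eps = sample_gmm(mu.shape)   # epsilon sampled from the Gaussian mixture
z = mu + sigma * eps         # z = mu + sigma (element-wise) epsilon
```

`z` is then fed to the decoder; since only the sampling of `eps` lies outside the deterministic computation, gradients can still flow through `mu` and `sigma` during training.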
The encoder adopts a structure of a fully connected layer, a convolution layer and a fully connected layer connected in sequence, and a clusterer is connected in parallel to the last layer of the encoder so that the feature information of the last layer is clustered by the clusterer; in this embodiment, a DBSCAN clusterer is used.
In a specific implementation, before the labeled image data are used as input for classification by the classification neural network model and the unlabeled image data are used as input for training the variational autoencoder, data enhancement is applied to both in advance; the data enhancement operations include random cropping, horizontal flipping, vertical flipping, random rotation, brightness change and noise addition.
In an implementation, the classification neural network model employs a convolutional neural network.
The classification neural network adopts a ReLU activation function and a cross-entropy loss function, where the cross-entropy loss is

$L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{m} y_{ic}\,\log\left(p_{ic}\right)$

where $m$ is the number of categories; $y_{ic}$ is an indicator function that equals 1 if the true class of sample $i$ is $c$ and 0 otherwise; $p_{ic}$ is the predicted probability that sample $i$ belongs to class $c$; and $N$ is the scale of the classification neural network.
For ease of understanding, the embodiment is described in detail below in terms of a specific implementation with reference to the accompanying drawings:
To avoid the influence of artificial distribution assumptions on the model, a learner for sampling the data distribution is constructed, i.e., the prior probability distribution sampling branch shown in Fig. 2. This branch uses a mixture density network combining a classification neural network and a K-component Gaussian mixture model, where the number of hidden layers and the number of neurons per layer can be freely chosen. The classification neural network may be a convolutional neural network; it classifies the input data and feeds the information of its output layer into the K-component Gaussian mixture model. The K-component Gaussian mixture model can be regarded as composed of K univariate Gaussian models, the K sub-models being the hidden variables of the mixture. Its main function is to determine the parameters of the K-component Gaussian mixture distribution by computing these hidden variables; the model output is a mixture of K Gaussian distributions, described by three groups of parameters: the mixture coefficients, the distribution means and the distribution variances. The output of this branch serves as the parameters of the sampling distribution of the random vector in the re-parameterization step of the next branch.
The other branch is the image generation branch shown in Fig. 2, a modified VAE model; the modification specifically targets the re-parameterization step of model training, which builds a Gaussian distribution from the output parameters of the previous branch and samples the required random vector from that distribution. Category supervision information is introduced during encoder learning, and a clustering method is combined to improve the robustness of feature learning. The original input data are fed into the model after data enhancement, and the decoder is made to reconstruct the original data, so that the model learns to remove noise and extract important features, improving its robustness.
The main technical concept of the scheme in this embodiment is as follows:
First, in the prior probability distribution sampling branch, the labeled images undergo data enhancement and are then used as input to a convolutional neural network. The network performs classification learning using the known image labels, and the learned information is simultaneously fed into the K-component Gaussian mixture model, which learns the data distribution and outputs its parameters.
Next, the unlabeled image data are input to the image generation branch. In this branch, the original data enter the encoder after data enhancement, and the encoder outputs the learned distribution parameters of the data, i.e., the mean and variance. A Gaussian distribution is established from the parameters output by the previous branch and sampled randomly; the samples are combined with the mean and variance output by the encoder to generate new data, which are sent to the decoder.
Finally, the model is trained until the data output by the decoder are close to the original data and the difference between the distribution of the new data and that of the original data is minimized. After training, the decoder of the VAE model can be extracted and used as an image generator.
The image generation method based on the prior probability distribution as shown in fig. 1 is a basic flow chart, and comprises the following steps:
inputting the labeled data into the prior probability distribution sampling branch;
the prior probability distribution sampling branch outputs the distribution parameters μ;
inputting the unlabeled data into the image generation branch;
establishing the Gaussian mixture distribution from the distribution parameters μ and randomly sampling a set of values e from it;
combining the output of the encoder with the sampled values e from the previous step and sending the result to the decoder;
after model training is completed, the decoder is the image generator.
As shown in fig. 2, the image generating method based on the prior probability distribution in this embodiment specifically includes the following steps:
step S01: in the branches of the prior probability distribution sampling, the marked image is subjected to data enhancement processing and then is used as input of a convolutional neural network, and the convolutional neural network consists of a convolutional layer, a convergence layer and a full connection layer. The data enhancement operation can obtain more sample data, and the main data enhancement modes include random clipping, horizontal overturning, vertical overturning, random rotation, brightness change, noise addition and the like.
In the convolution layer, the input $z^{(l)}$ of layer $l$ is the convolution of the activity value $a^{(l-1)}$ of layer $l-1$ with the convolution kernel $w^{(l)} \in \mathbb{R}^K$:

$z^{(l)} = w^{(l)} \otimes a^{(l-1)} + b^{(l)}$

where the convolution kernel $w^{(l)}$ is a learnable weight vector and $b^{(l)}$ is a learnable bias.
the convolutional layer is followed by a convergence layer using the maximum convergence approach, i.e., for a regionThe maximum activity value of all neurons in this region is chosen as a representation of this region:
wherein,is area->Activity value of each neuron in the matrix.
The last layer is a classification layer, using a fully connected neural network as the output of the classification neural network.
To give the neural network nonlinear capability, a ReLU activation function is introduced:

$\mathrm{ReLU}(x) = \max(0, x)$
the network uses a cross entropy loss function, namely:
wherein,m is the number of categories;as a sign function, if the true class of the sample i is equal to c, 1 is obtained, otherwise, 0 is obtained; />The prediction probability of the observation sample i belonging to the category c is set, and the scale N of the convolutional neural network can be set according to actual conditions.
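As a numeric check of this loss (a sketch with made-up probabilities and N = 2 samples, not the patented network):

```python
import numpy as np

# Predicted class probabilities p_ic for N = 2 samples and m = 3 categories.
p = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1]])
# True classes of the two samples.
true = np.array([0, 1])

N, m = p.shape
y = np.zeros((N, m))                 # indicator y_ic
y[np.arange(N), true] = 1.0

# L = -(1/N) * sum_i sum_c y_ic * log(p_ic)
loss = -(y * np.log(p)).sum() / N
```

Only the probability assigned to each true class contributes, so here the loss reduces to the mean of $-\log 0.7$ and $-\log 0.8$.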
Step S02: and (3) sending the network output into the K-Gaussian mixture model after the neural network in the step S01 converges or meets the termination condition. Since the Gaussian distribution has good mathematical properties and good calculation performance, a Gaussian mixture model is used to select a kernel function in a Gaussian form, namely
Where t is the input of the neural network, i.e. the output of the classification layer in step S01, n is the dimension of t, i.e. the number of classes,and->Is a parameter to be learned by the neural network and has the same dimension as t, i representing the ith gaussian kernel.
The probability density of the target data can be expressed as a linear combination of the kernel functions:

$p(t) = \sum_{i=1}^{m} \alpha_i\,\phi_i(t)$

where $\alpha_i$ are the mixture coefficients satisfying the constraint $\sum_{i=1}^{m}\alpha_i = 1$, and $m$ is the number of Gaussian kernel functions set in the neural network.
The error function of the model is

$E = \sum_q E_q, \qquad E_q = -\ln\!\left(\sum_{i=1}^{m}\alpha_i\,\phi_i(t_q)\right)$

where $E_q$ is the loss of each sample and $q$ is the sample index. Finally, the model outputs the distribution parameters $\alpha_i$, $\mu_i$ and $\sigma_i$, where $\mu_i$ is an $n$-dimensional mean vector and $\sigma_i$ determines the covariance matrix. This concludes the task of the prior probability distribution sampling branch.
Step S03: in the image generation branch, first, data enhancement processing is performed on original image data, denoted as x. Inputting x into an encoder for encoding, introducing category supervision information, and enhancing the data encoding capacity by combining a clustering method, thereby improving the robustness of the model. The DBSCAN clustering is used for describing, a DBSCAN clustering device is connected in parallel to the first full-connection layer of the encoder, the clustering device is used for clustering the characteristic information of the first layer of the encoder, and the aim is that the number of categories after clustering accords with or is close to the real number of categories of the original image data. And after the expected or certain conditions are met, the next step is carried out.
The encoder adopts an architecture of fully connected layer, convolution layer and fully connected layer; the number of network layers and the number of neurons per layer can be set according to the actual situation, and the number of neurons in the first layer equals the dimension of the original data x.
In the fully connected layer, the net activity value $z^{(l)}$ of the layer-$l$ neurons is computed from the activity value $a^{(l-1)}$ of the layer-$(l-1)$ neurons and then passed through an activation function to obtain the activity value of the layer-$l$ neurons:

$z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)}, \qquad a^{(l)} = f\!\left(z^{(l)}\right)$

where $W^{(l)}$ is the weight matrix from layer $l-1$ to layer $l$, $b^{(l)}$ is the bias from layer $l-1$ to layer $l$, $a^{(l-1)}$ is the output of the layer-$(l-1)$ neurons, and $a^{(0)} = x$.
In the convolution layer, the input $z^{(l)}$ of layer $l$ is the convolution of the activity value $a^{(l-1)}$ of layer $l-1$ with the convolution kernel $w^{(l)}$:

$z^{(l)} = w^{(l)} \otimes a^{(l-1)} + b^{(l)}$

where the convolution kernel $w^{(l)}$ is a learnable weight vector, $b^{(l)}$ is a learnable bias, and $\otimes$ denotes convolution.
The convolution layer is followed by a pooling layer using max pooling, i.e., for a region $R_k^d$, the maximum activity value of all neurons in the region is chosen as its representation:

$y_k^d = \max_{i \in R_k^d} x_i$

where $x_i$ is the activity value of each neuron in region $R_k^d$.
To give the neural network nonlinear capability, a ReLU activation function is introduced:

$\mathrm{ReLU}(x) = \max(0, x)$
After the first fully connected network of the encoder is computed, its output data reach both the DBSCAN clusterer and the next network of the encoder. The DBSCAN clusterer is implemented by calling a pre-written program. The input data of the DBSCAN clustering procedure are D = {X₁, X₂, …, Xₙ}, the output of the first fully connected network of the encoder, where n is the number of neurons; the distance radius is r = 0.5 and the center density threshold is MinPts = 50. The DBSCAN clustering procedure is as follows:
first, randomly selecting one data without classification mark to calculate and input data { X }, and 1 ,X 2 ,…,X n the Euclidean distance between the two, if the data without the classification mark cannot be found, the program is ended;
in the second step, if the distance is 0.5 or less, the data { X } 1 ,X 2 ,…,X n Put into a neighborhood set P i If the distance is greater than 0.5, these data { X }, are used 1 ,X 2 ,…,X m Put into another neighborhood set P i+1 In (a) and (b);
Third, count the number of data points in the neighborhood set P from the previous step. If the count reaches or exceeds 50, mark the neighborhood set P and the data in it as one category; if the count is less than 50, mark the data in it as noise points. Then return to the first step.
Finally, the DBSCAN clustering program outputs the total number of categories C_pred.
The DBSCAN clustering pseudocode is as follows (the symbols in the algorithm are not associated with the context):
Input: data set D = {X_1, X_2, …, X_n}, distance radius r = 0.5, center density threshold MinPts = 50;
Output: cluster division result {category 1, category 2, …};
Algorithm:
Step 1: initialize the cluster index k = 0 and set the access flag visited(X_i) = 0 for every element of the data set D, where 0 means not visited;
Step 2: select an element X_p in the data set D; if visited(X_p) = 0, go to step 3, otherwise repeat this step;
Step 3: compute the distance d between X_p and the other elements of D and find the neighborhood set P satisfying d <= r. If |P| >= MinPts, go to step 4. If P is empty, mark X_p as a noise point and go to step 2. If |P| < MinPts, go to step 2;
Step 4: update the cluster label k = k + 1, set the cluster label k(X_p) of X_p, set the access flag visited(X_p) = 1, delete X_p from P, and go to step 5;
Step 5: if P is not empty, select one of its elements X_s and go to step 6. If P is empty, go to step 2;
Step 6: set the cluster label k(X_s) of X_s. If visited(X_s) = 1, delete X_s from P and go to step 5. If visited(X_s) = 0, compute the distance d between X_s and the other elements of D, find the neighborhood set S satisfying d <= r, add the elements of S to the set P, delete X_s from the set P, set the access flag visited(X_s) = 1, and go to step 5;
Step 7: when visited(X_i) = 1 for all elements of the data set D, the algorithm ends and the result is output.
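The steps above can be sketched in Python. This is a compact sketch rather than the patent's program: the tiny two-blob dataset, the seed, and the reduced r and MinPts values are invented so the example runs quickly:

```python
import numpy as np

def dbscan(D, r=0.5, min_pts=50):
    """DBSCAN following the pseudocode above.
    Returns (labels, n_clusters); label 0 means noise/unassigned."""
    n = len(D)
    visited = np.zeros(n, dtype=bool)
    labels = np.zeros(n, dtype=int)
    k = 0
    for p in range(n):                       # step 2: pick an unvisited element
        if visited[p]:
            continue
        visited[p] = True
        dist = np.linalg.norm(D - D[p], axis=1)
        P = [i for i in range(n) if dist[i] <= r and i != p]
        if len(P) + 1 < min_pts:             # step 3: not dense enough (counting the point itself)
            continue                         # provisional noise; may be claimed as a border point later
        k += 1                               # step 4: start a new cluster
        labels[p] = k
        while P:                             # steps 5-6: expand the cluster
            s = P.pop()
            labels[s] = k
            if visited[s]:
                continue
            visited[s] = True
            dist_s = np.linalg.norm(D - D[s], axis=1)
            S = [i for i in range(n) if dist_s[i] <= r]
            if len(S) >= min_pts:            # X_s is itself a core point: grow through it
                P.extend(i for i in S if not visited[i] and labels[i] == 0)
    return labels, k

# two well-separated blobs; parameters chosen so each blob becomes one cluster
rng = np.random.default_rng(1)
blob = lambda c: c + 0.1 * rng.standard_normal((60, 2))
D = np.vstack([blob(np.array([0.0, 0.0])), blob(np.array([5.0, 5.0]))])
labels, c_pred = dbscan(D, r=0.5, min_pts=10)
print(c_pred)  # 2
```

The returned `c_pred` plays the role of C_pred in the clustering loss described later.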
Step S04: the following is the reparameterization process. The encoder outputs two sets of data: the mean vector μ = (μ_1, …, μ_d) and the variance vector σ = (σ_1, …, σ_d), where d is the dimension of the vectors and also the number of neurons in this layer. Here, the distribution parameters output by the model in step S02 are used to establish a Gaussian mixture distribution, i.e.
p(x) = Σ_k π_k N(x; μ_k, Σ_k), with N(x; μ_k, Σ_k) = exp(-(x - μ_k)^T Σ_k^{-1} (x - μ_k) / 2) / ((2π)^{n/2} |Σ_k|^{1/2})
wherein μ_k is an n-dimensional mean vector, n represents the number of classes in the prior probability distribution sampling branch, Σ_k is the covariance matrix, and |Σ_k| is its determinant.
A vector ε is randomly drawn from the Gaussian mixture distribution and combined with the encoder outputs using the following formula:
z_j = μ_j + σ_j ⊙ ε_j, j = 1, …, d
where d is the dimension of the vectors and ⊙ denotes element-wise multiplication; σ is taken in exponential form so that the variance vector stays non-negative. The resulting z is then fed into the decoder.
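The reparameterization step can be sketched as follows. It is a minimal sketch: the latent dimension, the two-component mixture standing in for the fitted prior, its means, and the log-variance parameterization of the encoder output are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                        # latent dimension (illustrative)

mu = rng.standard_normal(d)                  # encoder output: mean vector
log_var = rng.standard_normal(d)             # encoder output: log-variance
sigma = np.exp(0.5 * log_var)                # exponential keeps the std-dev non-negative

# In the patent, eps is drawn from the Gaussian mixture fitted in step S02
# rather than from N(0, I); a two-component unit-covariance mixture stands
# in for that prior here.
weights = np.array([0.5, 0.5])
means = np.array([[-1.0] * d, [1.0] * d])
comp = rng.choice(2, p=weights)              # pick a mixture component
eps = means[comp] + rng.standard_normal(d)   # sample from that component

z = mu + sigma * eps                         # z_j = mu_j + sigma_j ⊙ eps_j
print(z.shape)
```

Writing z this way keeps the sampling differentiable with respect to μ and log σ², which is what allows the encoder to be trained by backpropagation.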
Step S05: the decoder serves to reconstruct the data, i.e. to reconstruct x̂ from z such that the difference between x̂ and the original data x is minimal. The decoder adopts a fully-connected neural network; the number of network layers and the number of neurons per layer can be set according to the actual situation, and the number of neurons in the last layer equals the dimension of the original data x.
The learning objective is to minimize the reconstruction error, i.e. to minimize L:
Encoder: z = f(x)
Decoder: x̂ = g(z)
For an input x, the activity value of the intermediate hidden layer of the encoder is the encoding of x, namely z = f(x).
For the decoder, the output is the reconstructed data, i.e. x̂ = g(z) = g(f(x)).
While reducing the high-dimensional data to low-dimensional samples of the hidden variable space, the encoder computes the mean μ and variance σ of each input data point; finally, new data x̂ are generated by the decoder. We want the sampled data z from the hidden variable space to follow the probability distribution of the original data x, so that the new data x̂ generated from the sampled data z also follow the probability distribution of the original data.
To minimize the reconstruction error, the MSE loss function is selected:
L_MSE = (1/m) Σ_{i=1}^{m} (x_i - x̂_i)²
wherein x_i is the original data, x̂_i is the reconstructed data of the decoder, and m represents the amount of data.
In order to minimize the difference between the distribution of the reconstructed data x̂ and that of the original data x, the KL divergence is introduced.
KL loss function:
L_KL = -(1/2) Σ_{i=1}^{n} (1 + log σ_i² - μ_i² - σ_i²)
wherein p(x) is the probability distribution of the original data x, q(x̂) is the probability distribution of the reconstructed data x̂, μ is the mean, σ is the variance, and n represents the amount of data.
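The KL term above has a closed form when the reference distribution is the standard normal N(0, I); the patent's full objective compares against its Gaussian-mixture prior, for which no such closed form exists, so the familiar standard-normal case is shown here as a sketch:

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    # Closed-form KL(N(mu, diag(sigma^2)) || N(0, I)), summed over dimensions:
    # L_KL = -1/2 * sum_i (1 + log sigma_i^2 - mu_i^2 - sigma_i^2)
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))

mu = np.zeros(3)
log_var = np.zeros(3)                        # sigma^2 = 1 in every dimension
print(kl_to_standard_normal(mu, log_var))    # 0.0: the two distributions coincide
```

The term vanishes exactly when μ = 0 and σ² = 1, and grows as the encoder's posterior drifts away from the prior, which is what regularizes the latent space.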
Clustering loss function:
wherein C_true is the total number of true categories and C_pred is the total number of categories output by the DBSCAN clustering program.
Finally, the loss function of the model:
L = α · L_MSE + β · L_KL + γ · L_cluster
wherein α, β and γ are combination coefficients set to default values.
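The combined objective can be sketched as follows. The absolute-difference form of L_cluster is an assumption (the patent only states that it compares C_true with C_pred), the standard-normal KL term stands in for the mixture-prior term, and the default coefficients of 1.0 are illustrative:

```python
import numpy as np

def total_loss(x, x_hat, mu, log_var, c_true, c_pred,
               alpha=1.0, beta=1.0, gamma=1.0):
    """L = alpha*L_MSE + beta*L_KL + gamma*L_cluster (sketch)."""
    l_mse = np.mean((x - x_hat) ** 2)                              # reconstruction error
    l_kl = -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))  # KL to N(0, I)
    l_cluster = abs(c_true - c_pred)                               # assumed form
    return alpha * l_mse + beta * l_kl + gamma * l_cluster

x = np.array([1.0, 2.0, 3.0])
# perfect reconstruction, posterior equal to the prior, correct cluster count
loss = total_loss(x, x, np.zeros(3), np.zeros(3), c_true=10, c_pred=10)
print(loss)  # 0.0
```

Each term pulls in a different direction: L_MSE keeps reconstructions faithful, L_KL keeps the latent space close to the prior, and L_cluster encourages the encoder's features to separate into the right number of categories.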
Step S06: after the network training is completed, the decoder is extracted as the image generator. For example, a vector is sampled from the standard normal distribution N(0, I), with d the vector dimension, which is also the number of neurons in the first layer of the decoder; the vector is then fed to the decoder, whose output is the generated image.
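Generation at inference time can be sketched as below. The single-layer decoder with random weights, the 16-dimensional latent, and the 28×28 output size are all stand-ins: a trained decoder's weights would be loaded in place of the random ones:

```python
import numpy as np

rng = np.random.default_rng(0)
d, out_dim = 16, 28 * 28                     # illustrative latent and image sizes
W = rng.standard_normal((out_dim, d)) * 0.01 # stand-in for trained decoder weights
b = np.zeros(out_dim)

def decoder(z):
    # sigmoid output keeps generated pixel values in (0, 1)
    return 1.0 / (1.0 + np.exp(-(W @ z + b)))

z = rng.standard_normal(d)                   # sample from the standard normal N(0, I)
image = decoder(z).reshape(28, 28)           # the decoder output is the generated image
print(image.shape)
```

Because only the decoder is kept, generation requires no encoder pass: each fresh sample z yields a new image.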
Embodiment two:
it is an object of the present embodiment to provide an image generation system based on an a priori probability distribution.
An image generation system based on an a priori probability distribution, comprising:
a data distribution sampling unit for classifying the labeled image data as the input of a classification neural network model, and fitting the classification result of the neural network model with a Gaussian mixture model to obtain the corresponding Gaussian mixture distribution;
a variational self-encoder training unit for training the variational self-encoder with the unlabeled image data as its input, obtaining the distribution parameters of the input data from the encoder in the variational self-encoder, combining data randomly sampled from the Gaussian mixture distribution with the distribution parameters output by the encoder, taking the combined data as the input of the decoder in the variational self-encoder, and obtaining the reconstructed data of the input data through the decoder, wherein the training of the variational self-encoder aims at minimizing the error between the reconstructed data and the input data;
and an image generation unit for realizing image generation by taking the decoder in the trained variational self-encoder as an image generator.
Further, the system in this embodiment corresponds to the method in Embodiment 1; its technical details have been described in Embodiment 1 and are not repeated here.
Embodiment III:
an object of the present embodiment is to provide a computer-readable storage medium.
A computer-readable storage medium having stored thereon a program which, when executed by a processor, implements the image generation method based on prior probability distribution as described above.
Embodiment four:
an object of the present embodiment is to provide an electronic apparatus.
An electronic device comprising a memory, a processor and a program stored on the memory and executable on the processor, the processor implementing a prior probability distribution based image generation method as described above when executing the program.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. An image generation method based on prior probability distribution, comprising:
classifying the labeled image data as the input of a classification neural network model;
fitting the classification result of the neural network model with a Gaussian mixture model to obtain the corresponding Gaussian mixture distribution;
training a variational self-encoder with the unlabeled image data as its input, and obtaining the distribution parameters of the input data from the encoder in the variational self-encoder; combining data randomly sampled from the Gaussian mixture distribution with the distribution parameters output by the encoder, taking the combined data as the input of the decoder in the variational self-encoder, and obtaining the reconstructed data of the input data through the decoder; wherein the training of the variational self-encoder aims at minimizing the error between the reconstructed data and the input data;
and taking the decoder in the trained variational self-encoder as an image generator to realize image generation.
2. The image generation method based on prior probability distribution according to claim 1, wherein the output of the encoder includes the mean and variance distribution parameters of the input data, and the combination of the data randomly sampled from the Gaussian mixture distribution with the distribution parameters output by the encoder is specifically expressed as follows:
z = μ + σ ⊙ ε
wherein μ is the mean vector, σ is the variance vector, and ε is a data vector randomly sampled from the Gaussian mixture distribution.
3. The image generation method based on prior probability distribution according to claim 1, wherein the encoder adopts a structure of a fully-connected layer, a convolution layer and a fully-connected layer connected in sequence, a clusterer is connected in parallel to the final layer of the encoder, and the output of the final layer of the encoder is clustered by the clusterer.
4. The image generation method based on prior probability distribution according to claim 3, wherein the clusterer is a DBSCAN clusterer.
5. The image generation method based on prior probability distribution according to claim 1, wherein, before the labeled image data are classified as the input of the classification neural network model and before the variational self-encoder is trained with the unlabeled image data as its input, data enhancement is performed on the labeled image data and the unlabeled image data, the data enhancement operations including random cropping, horizontal flipping, vertical flipping, random rotation, brightness changes and noise addition.
6. The image generation method based on prior probability distribution according to claim 1, wherein the classification neural network employs the ReLU activation function and the cross-entropy loss function, the cross-entropy loss function being specifically expressed as follows:
L_CE = -(1/N) Σ_i Σ_{c=1}^{m} y_{ic} log(p_{ic})
wherein m is the number of categories; y_{ic} is a sign function that equals 1 if the true category of sample i is c and 0 otherwise; p_{ic} is the predicted probability that the observed sample i belongs to category c; and N is the scale of the classification neural network.
7. The image generation method based on prior probability distribution according to claim 1, wherein the classification neural network model adopts a convolutional neural network.
8. An image generation system based on a priori probability distribution, comprising:
a data distribution sampling unit for classifying the labeled image data as the input of a classification neural network model, and fitting the classification result of the neural network model with a Gaussian mixture model to obtain the corresponding Gaussian mixture distribution;
a variational self-encoder training unit for training the variational self-encoder with the unlabeled image data as its input, obtaining the distribution parameters of the input data from the encoder in the variational self-encoder, combining data randomly sampled from the Gaussian mixture distribution with the distribution parameters output by the encoder, taking the combined data as the input of the decoder in the variational self-encoder, and obtaining the reconstructed data of the input data through the decoder, wherein the training of the variational self-encoder aims at minimizing the error between the reconstructed data and the input data;
and an image generation unit for realizing image generation by taking the decoder in the trained variational self-encoder as an image generator.
9. A computer-readable storage medium, on which a program is stored, which program, when being executed by a processor, implements a prior probability distribution based image generation method as claimed in any one of claims 1 to 7.
10. An electronic device comprising a memory, a processor and a program stored on the memory and executable on the processor, wherein the processor implements a prior probability distribution based image generation method according to any one of claims 1-7 when executing the program.
CN202311210822.8A 2023-09-20 2023-09-20 Image generation method, system, medium and device based on prior probability distribution Active CN116958712B (en)


Publications (2)

Publication Number Publication Date
CN116958712A CN116958712A (en) 2023-10-27
CN116958712B true CN116958712B (en) 2023-12-15
