CN111738351A

CN111738351A - Model training method and device, storage medium and electronic equipment

Info

Publication number: CN111738351A
Application number: CN202010623929.5A
Authority: CN
Inventors: 张发恩; 宋亮
Original assignee: Ainnovation Chongqing Technology Co ltd
Current assignee: Ainnovation Chongqing Technology Co ltd
Priority date: 2020-06-30
Filing date: 2020-06-30
Publication date: 2020-10-02
Anticipated expiration: 2040-06-30
Also published as: CN111738351B

Abstract

The application relates to the technical field of image clustering, and provides a model training method, a model training device, a storage medium and electronic equipment. The model training method comprises the following steps: acquiring a training image and inputting the training image into an encoder to obtain the distribution parameters of the characteristic vectors output by the encoder; determining the category of the training image according to the distribution parameters of the feature vectors; sampling from the distribution of the classes of the training images to obtain sampling vectors; inputting the sampling vector to a decoder to obtain a reconstructed image output by the decoder; respectively inputting the training image and the reconstructed image into a discriminator to obtain a discrimination result output by the discriminator; and repeating the steps from obtaining the training image to obtaining the judgment result so as to train the image clustering model comprising the encoder, the decoder and the discriminator. According to the method, the discriminator is arranged, and the confrontation training is carried out on the model, so that the encoder can effectively extract the image characteristics, and therefore a better effect can be achieved by subsequently utilizing the trained encoder to execute the image clustering task.

Description

Model training method and device, storage medium and electronic equipment

Technical Field

The invention relates to the technical field of image clustering, in particular to a model training method, a model training device, a storage medium and electronic equipment.

Background

The process of dividing a collection of objects into multiple categories consisting of similar objects is called clustering. The current unsupervised clustering method mainly utilizes the features of the extracted objects to perform clustering, but for some unstructured data, such as images, better features cannot be easily extracted, resulting in poor clustering effect.

Disclosure of Invention

An object of the embodiments of the present application is to provide a model training method, apparatus, storage medium, and electronic device, so as to solve the above technical problems.

In order to achieve the above purpose, the present application provides the following technical solutions:

in a first aspect, an embodiment of the present application provides a model training method, including: acquiring a training image and inputting the training image into an encoder to obtain distribution parameters of a feature vector output by the encoder; determining the category of the training image according to the distribution parameters of the feature vectors; sampling from the distribution of classes of the training image to obtain a sampling vector; inputting the sampling vector to a decoder to obtain a reconstructed image output by the decoder; respectively inputting the training image and the reconstructed image into a discriminator to obtain a discrimination result output by the discriminator, wherein the discrimination result comprises the authenticity and the classification identity of the input image; repeating the steps from obtaining the training image to obtaining the discrimination result so as to train an image clustering model comprising the encoder, the decoder and the discriminator; the training mode is countertraining, and the training target comprises the authenticity and the category of the training image and the reconstructed image which cannot be distinguished according to the judgment result output by the discriminator.

The method is an unsupervised clustering method based on a variational self-encoder, in the method, through carrying out countertraining on an image clustering model comprising an encoder, a decoder and a discriminator, after the training is finished, the discriminator is difficult to distinguish a reconstructed image output by the decoder from a real training image, and the image reconstruction of the decoder is carried out based on the distribution parameters of the characteristic vectors extracted by the encoder, which shows that the encoder can effectively extract the characteristics of the images, so that a better effect can be obtained by utilizing the trained encoder to execute an image clustering task subsequently.

In addition, in the method, a discriminator (for example, a neural network) is arranged to evaluate the discrimination of the training image and the reconstructed image, instead of performing loss calculation by adopting an image difference mode (for example, obtaining the L2 distance between the training image and the reconstructed image), so that the problems of difficult loss convergence and difficult model training are avoided.

In an implementation manner of the first aspect, the determining the category of the training image according to the distribution parameter of the feature vector includes: calculating the distance between the distribution of the feature vectors and the distribution of each existing category according to the distribution parameters of the feature vectors; determining the existing class with the minimum distance obtained by calculation as the class of the training image; and updating the distribution of the existing category by using the distribution parameters of the feature vectors.

In the above implementation, the number of the clusters is fixed, and the current training image is necessarily classified into a certain existing class by calculating the distance between the distribution of the feature vector and the distribution of each existing class. The existing classes in this implementation may be several classes that are preset before the training begins.

In an implementation manner of the first aspect, the determining the category of the training image according to the distribution parameter of the feature vector includes: calculating the distance between the distribution of the feature vectors and the distribution of each existing category according to the distribution parameters of the feature vectors; judging whether the distance smaller than a preset threshold value exists in the calculated distances or not; if the distance smaller than the preset threshold exists, determining the existing category corresponding to the minimum value in the distance smaller than the preset threshold as the category of the training image, and updating the distribution of the existing category by using the distribution parameters of the feature vectors; and if the distance smaller than the preset threshold does not exist, distributing a new category for the training image, and determining the distribution of the new category according to the distribution parameters of the feature vectors.

In the above implementation, the number of the classes of the cluster is not fixed, and not only the distance between the distribution of the feature vectors and the distribution of each existing class is calculated, but also the size relationship between the calculated distance and the preset threshold is considered, and according to the size relationship, the current training image may be classified into a certain class, but a new class may also be created for the current training image. In this way, before the training starts, several categories may be preset as existing categories, or no categories may be preset.

In an implementation manner of the first aspect, the calculating, according to the distribution parameter of the feature vector, a distance between the distribution of the feature vector and a distribution of each existing category includes: calculating the distance between the distribution parameter of the feature vector and the distribution parameter of each existing category as the distance between the distribution of the feature vector and the distribution of each existing category; or determining the distribution of the feature vectors according to the distribution parameters of the feature vectors, and calculating the KL divergence between the distribution of the feature vectors and the distribution of each existing category as the distance between the distribution of the feature vectors and the distribution of each existing category.

In the above implementation, two methods of calculating the distance between the distribution of the feature vectors and the distribution of each existing class are provided: firstly, the distance between the distribution parameters is calculated (because the distribution parameters can also adopt a vector form, the distance between the vectors is calculated); secondly, calculating KL divergence, which is also called relative entropy, and evaluating the difference degree between the two distributions. Of course, other calculation methods are not excluded.

In one implementation of the first aspect, the distribution parameters include a mean and a variance.

The probability density function of some distributions can be completely determined from the mean and variance, for example, gaussian distributions. In an alternative, the distribution of the feature vectors, the distribution of each class, may be assumed to follow a gaussian distribution.

In one implementation form of the first aspect, the encoder, the decoder, and the discriminator each employ a neural network.

The neural network has good learning and generalization capability, and can extract deep-level features of the image and be used for different purposes. For example, for two images with substantially the same content, one of the two images is slightly shifted, if the degree of discrimination between the two images is evaluated by using the L2 distance, since the L2 distance merely represents the difference of the images at the pixel value level, the image shift may cause the calculated L2 distance to be large, thereby causing the evaluation result to be misaligned; however, for the discriminator implemented by the neural network, discrimination evaluation is performed based on the deep-level features of the image (specific contents of the representation image), and the obtained evaluation result is more accurate because the image translation basically does not change the contents of the image.

In one implementation form of the first aspect, the method further comprises: and determining the category of the image to be processed by utilizing an encoder in the trained image clustering model.

In a second aspect, an embodiment of the present application provides a model training apparatus, including: the encoding module is used for acquiring a training image and inputting the training image into an encoder to obtain the distribution parameters of the characteristic vectors output by the encoder; the clustering module is used for determining the category of the training image according to the distribution parameters of the feature vectors; the sampling module is used for sampling from the distribution of the classes of the training images to obtain sampling vectors; the decoding module is used for inputting the sampling vector to a decoder to obtain a reconstructed image output by the decoder; the judging module is used for respectively inputting the training image and the reconstructed image into a discriminator to obtain a judging result output by the discriminator, and the judging result comprises the authenticity and the classification identity of the input image; the iteration module is used for repeatedly acquiring training images to obtain a judgment result so as to train an image clustering model comprising the encoder, the decoder and the discriminator; the training mode is countertraining, and the training target comprises the authenticity and the category of the training image and the reconstructed image which cannot be distinguished according to the judgment result output by the discriminator.

In a third aspect, an embodiment of the present application provides a computer-readable storage medium, where computer program instructions are stored on the computer-readable storage medium, and when the computer program instructions are read and executed by a processor, the computer program instructions perform the method provided by the first aspect or any one of the possible implementation manners of the first aspect.

In a fourth aspect, an embodiment of the present application provides an electronic device, including: a memory in which computer program instructions are stored, and a processor, where the computer program instructions are read and executed by the processor to perform the method provided by the first aspect or any one of the possible implementation manners of the first aspect.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.

FIG. 1 is a schematic diagram of a construction of a variational auto-encoder;

FIG. 2 is a schematic diagram illustrating an operation of a model training method according to an embodiment of the present disclosure;

FIG. 3 is a flow chart illustrating a model training method provided by an embodiment of the present application;

FIG. 4 is a functional block diagram of a model training apparatus provided in an embodiment of the present application;

fig. 5 shows a block diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.

The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The model training method provided by the embodiment of the application is an unsupervised clustering method (a trained model can be used for unsupervised clustering images) based on a variational auto-Encoder (VAE for short). Therefore, before the scheme of the present application is introduced, the concept of the variational auto-encoder is briefly introduced, and for the implementation details of the variational auto-encoder, if not mentioned, reference may be made to the prior art.

Fig. 1 shows a schematic structural diagram of a variational self-encoder. Referring to fig. 1, the work flow of the variational self-encoder mainly includes three stages:

and (3) an encoding stage: the input data is encoded by the encoder to obtain the mean of the hidden variables and the variance of the hidden variables (meaning the mean and variance of the distribution to which the hidden variables obey).

A sampling stage: and sampling from the distribution obeyed by the implicit variable based on the mean value and the variance obtained in the encoding stage to obtain a sampling variable.

And a decoding stage: and decoding output data consistent with the input data dimension based on the sampling variable.

The variational self-encoder can be trained to enable output data to be as close to input data as possible, namely reconstruction of the input data is achieved. Thus, in some application scenarios, trained variators may be used from the encoder for data generation. In particular, in the field of image processing, variabilities may be used from the encoder for image generation.

Fig. 2 is a schematic diagram illustrating a model training method provided in an embodiment of the present application, which is used for training an image clustering model that can perform an image clustering task (in practice, only an encoder may be used for performing the image clustering task, and the rest of the model is only used for training, which is described in detail below). Referring to fig. 2, it can be seen that the image clustering model also includes a variational auto-encoder composed of an encoder and a decoder, and the main difference between the model and the variational auto-encoder is that a discriminator is added. Other matters in fig. 2 will be described together with fig. 3 in the description.

Fig. 3 shows a flowchart of a model training method provided in an embodiment of the present application. The method may be performed by an electronic device, and fig. 5 shows a possible structure of the electronic device, which may be referred to as the following description with reference to fig. 5. Referring to fig. 3, the method includes:

step S100: and acquiring a training image, inputting the training image into an encoder, and acquiring the distribution parameters of the feature vectors output by the encoder.

The method for acquiring the training image is not limited, and may be, for example, an unlabeled image collected from a network, an image in a data set, an image acquired in real time, or the like. The encoder is used for encoding the training image and outputting the distribution parameters of the characteristic vectors. The feature vector herein corresponds to the hidden variable mentioned above when describing the variational Auto-Encoder, in which the Encoder directly extracts the feature vector of the input image, but in the case of the variational Auto-Encoder, the feature vector of the training image is not directly extracted here, but parameters of the distribution to which the feature vector is subjected are output, so that the feature vector is implicit in this sense.

The encoder may be implemented using a neural network, which may contain several convolutional layers (although other layers, such as pooling layers, are not excluded) for encoding an image into distribution parameters of feature vectors by feature extraction. For example, the encoder may employ a network structure such as ResNet, VGG, LeNet, GoogleNet, and the like.

In step S100, which distribution parameters need to be output depend on the distribution form of the feature vector, which needs to be determined in advance, and conversely, if the specific values of the distribution parameters of the feature vector are known, the specific distribution can also be determined.

For example, if the feature vector is assumed to follow a gaussian distribution, the distribution parameters may include a mean and a variance, from which a probability density function of a particular gaussian distribution can be uniquely determined; for another example, if the feature vector is assumed to obey an exponential distribution, the distribution parameters may include a rate parameter from which a probability density function of a particular exponential distribution can be uniquely determined.

Step S110: and determining the class of the training image according to the distribution parameters of the feature vectors.

The cluster classes determined when step S110 is performed are referred to as existing classes, each existing class corresponds to a probability distribution of its own, each existing class includes zero or more training images, and the training images may be regarded as samples obtained by sampling the distribution of the class. Note that even if any training image is not included under a certain category, the category may have a probability distribution, such distribution being pre-specified. The distribution form of the clustering classes needs to be determined in advance, so that each distribution needs to be represented by which distribution parameters and can also be determined, and in the process of clustering the training images, the distribution of each class can be maintained only by updating the values of the distribution parameters according to the class division condition of the training images.

Thus, the step S110 of determining the class of the training image according to the distribution parameter of the feature vector may refer to calculating a distance between the distribution of the feature vector and the distribution of each existing class according to the distribution parameter of the feature vector, and then determining which existing class the current training image should belong to or not belong to any existing class according to the calculated distance. The distance here refers to a similarity measurement result, and is used to measure the similarity between two distributions, and the value of the distance and the similarity of the distributions may be a positive correlation. Two distance calculation methods are listed below, and it can be understood that there are other distance calculation methods:

the first method is as follows: and calculating the distance between the distribution parameter of the feature vector and the distribution parameter of the existing category as the distance between the distribution of the feature vector and the distribution of the existing category. Since the distribution parameters may determine the probability density function of the distribution, if the parameters of two distributions are close, the similarity is higher and the distance between them is smaller. The distribution parameters may be represented in the form of vectors (e.g., mean vectors, variance vectors, which have the same dimensions as the feature vectors), such that the distance between the calculated distribution parameters is the distance between the calculated vectors.

The second method comprises the following steps: and determining the distribution of the feature vectors according to the distribution parameters of the feature vectors, and calculating KL divergence between the distribution of the feature vectors and the distribution of the existing categories as the distance between the distribution of the feature vectors and the distribution of the existing categories. The KL divergence, also known as relative entropy, is designed to evaluate the degree of difference between two distributions, the greater the KL divergence value, the greater the difference between two fractions, and the lesser otherwise.

There are two main cases in image clustering: one is to pre-designate the categories contained in the clustering result, and only divide the images to be clustered into the preset categories in the clustering process, for example, some prior knowledge about the images to be clustered is obtained in advance, so that the categories contained in total can be determined; the other is not to specify a class in advance, and the process of clustering allows a new class to be generated, for example, it is not known in advance that a generation of clustered images may be divided into several classes. For the latter case, some preset categories may be designated as initial categories, or none of the categories may be designated as initial categories, and all the categories are generated in the clustering process.

For the first case, step S110 may be implemented as follows: firstly, calculating the distance between the distribution of the feature vectors and the distribution of each existing category according to the distribution parameters of the feature vectors; then, determining the existing class with the minimum distance obtained by calculation as the class of the training image; and finally, updating the distribution of the existing categories by using the distribution parameters of the feature vectors.

The existing category is preset, the distribution parameters of the initial distribution can be randomly given, after the training image is divided into the category, the distribution parameters of the initial distribution are updated by using the distribution parameters of the corresponding feature vectors, and the updated distribution parameters have practical significance. In this implementation, the current training image must be classified into some existing category.

For the second case, step S110 may be implemented as follows: firstly, calculating the distance between the distribution of the feature vectors and the distribution of each existing category according to the distribution parameters of the feature vectors; then, judging whether the distance smaller than a preset threshold value exists in the calculated distances or not; if the distance smaller than the preset threshold exists, determining the existing category corresponding to the minimum value in the distance smaller than the preset threshold as the category of the training image, and updating the distribution of the existing category by using the distribution parameters of the feature vectors; and if the distance smaller than the preset threshold does not exist, a new category is allocated to the training image, and the distribution of the new category is determined according to the distribution parameters of the feature vectors.

In the above implementation, not only the distance between the distribution of the feature vectors and the distribution of each existing category is calculated, but also the magnitude relationship between the calculated distance and the preset threshold is considered, and according to the magnitude relationship, the current training image may be classified into a certain category, but a new category may also be created for the current training image. In this way, before the training begins, several categories may be preset as existing categories, or no categories may be preset, and the categories are optionally generated during the clustering process.

Certainly, other image clustering manners are not excluded except for the above two cases, for example, no preset category is designated for image clustering, but a maximum category number threshold is designated, in the clustering process, if the number of the existing categories does not reach the threshold, a new category is allowed to be generated, if the number of the existing categories reaches the threshold, the new category is not allowed to be generated, and only the images are allowed to be divided into the existing categories.

Step S120: a sample vector is obtained from the sampling from the distribution of classes of the training image.

Since the class of the training image has been determined in step S110, only the distribution of the class needs to be sampled to obtain a sampling vector, where the sampling vector corresponds to the sampling variable mentioned above in the introduction of the variational auto-encoder. Sampling based on a known distribution is well known in the art and is not described in detail herein, and the vector generated by sampling may be a random vector.

Step S130: and inputting the sampling vector into a decoder to obtain a reconstructed image output by the decoder.

The decoder may be implemented using a neural network which may contain several deconvolution layers (although other layers are not excluded, such as inverse pooling layers) for decoding (or reconstructing) a vector into an image.

Step S140: and respectively inputting the training image and the reconstructed image into a discriminator to obtain a discrimination result output by the discriminator, wherein the discrimination result comprises the authenticity and the class identity of the input image.

The authenticity of the input image (which may correspond to a score or probability) is whether the input image is the original training image or the reconstructed training image, and the similarity of the input image (which may correspond to a score or probability) is whether the input training image and the reconstructed image belong to the same category.

The discriminator may be implemented using a neural network, although it is not excluded that the discrimination is performed using some inherent rule, such as calculating the similarity of the training images and the reconstructed images. However, because the neural network has good learning and generalization capabilities, the output of the discrimination result after training is more reliable than the discrimination result calculated by only relying on the preset rule.

Step S150: and carrying out countertraining on the image clustering model until a training end condition is met.

The image clustering model includes at least an encoder, a decoder, and a discriminator, but does not exclude other structures. The term "confrontational training" refers to the confrontation of training targets between two parts, where the codec is considered as one part and the discriminator is considered as the other part. The goal of the codec's training is to make the decoder reconstruct as much as possible the exact same image as the training image, enough to "fool" the discriminator; the training target of the discriminator is to distinguish the reconstructed image and the training image as much as possible without being deceived by the coder and the decoder. In overview, the training targets of the image clustering model include: after training is finished, the authenticity and the category of a training image and a reconstructed image cannot be distinguished according to a discrimination result output by a trained discriminator, namely, the training image or the reconstructed image is input into the discriminator, whether the training image or the reconstructed image is reconstructed or an original training image is difficult to determine according to the discrimination result, the training image and the reconstructed image are input into the discriminator, and the training image and the reconstructed image belong to the same category according to the discrimination result, namely, the image reconstructed by a decoder is close to the real training image enough.

For the principle of the countermeasure training, reference may be made to the relevant content in the generated countermeasure network (GAN), where the codec may be regarded as the generated network (Generator) in the GAN and the Discriminator may be regarded as the Discriminator (Discriminator) in the GAN.

Steps S100 to S140 may be regarded as an iteration of the training process (the steps of calculating the loss and updating the network parameters are omitted), and step S150 is an iteration step, and after each round of training, it is determined whether a training end condition is satisfied, if so, the training is ended, otherwise, the procedure returns to step S100 to start the next round of training. The training end condition may have a plurality of setting manners, for example, the training end may be after a certain round of training, the training end may be after a certain time, the training end may be after the discriminator converges, or a combination of the above conditions, and the like.

After the image clustering model is trained, the encoder therein may be used to perform an image clustering task, which is similar to the process of step S100 and step S110: acquiring images to be processed (images to be clustered) and inputting the images to be processed into a trained encoder, acquiring distribution parameters of feature vectors output by the encoder, and determining the category of the images to be processed according to the distribution parameters of the feature vectors. The specific implementation manner thereof can also refer to step S100 and step S110, which are not repeated here. As for the other components in the image clustering model, they may not be used when performing the actual image clustering task.

In summary, in the model training method provided in the embodiment of the present application, the countertraining is performed on the image clustering model, so that after the training is completed, the discriminator is difficult to distinguish the reconstructed image output by the decoder from the real training image, and since the image reconstruction performed by the decoder is performed based on the distribution parameters of the feature vector extracted by the encoder, this indicates that the encoder can effectively extract the features of the image (only if the extracted features are good enough, the discriminator is difficult to distinguish the reconstructed image from the training image), and therefore, a better effect can be obtained when the trained encoder is subsequently used to perform the image clustering task.

In some comparative embodiments, no discriminator is used, only the variational autocoder is used for clustering, and reconstruction losses (e.g., calculating the difference between the training image and the reconstructed image) are added to the loss function of the variational autocoder, so that the reconstructed image is close enough to the training image. Taking the example of calculating the reconstruction loss by using the L2 distance as an example, long-term research by the inventors finds that, since the L2 distance only represents the difference of the image at the pixel value level, some slight changes (such as translation operation) of the image at the pixel level may cause the calculated L2 distance to be large, and particularly when the image size is large, the problem is more obvious, so that the reconstruction loss is difficult to converge, and is not beneficial to the training of the model. In the scheme of the application, the discriminator is arranged to evaluate the discrimination of the training image and the reconstructed image instead of performing loss calculation in an image difference mode, so that the problems in a comparison embodiment are avoided, the image clustering model is easy to train, and the method can be used for executing a clustering task of large-size images.

Further, in some implementations of the method, the encoder, the decoder, and the discriminator in the image clustering network are implemented by using a neural network. Because the neural network has good learning and generalization capabilities, deep-level features of the image can be extracted and used for different purposes. For example, for two images with substantially the same content, one of which is only slightly shifted with respect to the other, if the degree of discrimination between the two images is evaluated by using the L2 distance, the calculated L2 distance may be large due to the effect of image shift, and the evaluation result may be misaligned; however, for the discriminator implemented by the neural network, discrimination evaluation is performed based on the deep-level features of the image (specific contents of the representation image), and the obtained evaluation result is more accurate because the image translation basically does not change the contents of the image.

Fig. 4 shows a functional block diagram of a model training apparatus 200 according to an embodiment of the present application. Referring to fig. 4, the model training apparatus 200 includes:

the encoding module 210 is configured to obtain a training image, input the training image to an encoder, and obtain a distribution parameter of a feature vector output by the encoder;

a clustering module 220, configured to determine a category of the training image according to the distribution parameter of the feature vector;

a sampling module 230, configured to sample from the distribution of the classes of the training images to obtain a sampling vector;

a decoding module 240, configured to input the sampling vector to a decoder, and obtain a reconstructed image output by the decoder;

a determining module 250, configured to input the training image and the reconstructed image to a determiner respectively, and obtain a determining result output by the determiner, where the determining result includes authenticity and classification identity of the input image;

an iteration module 260, configured to repeat the steps from obtaining a training image to obtaining a determination result, so as to train an image clustering model including the encoder, the decoder, and the determiner; the training mode is countertraining, and the training target comprises the authenticity and the category of the training image and the reconstructed image which cannot be distinguished according to the judgment result output by the discriminator.

In one implementation of the model training apparatus 200, the clustering module 220 determines the class of the training image according to the distribution parameters of the feature vectors, including: calculating the distance between the distribution of the feature vectors and the distribution of each existing category according to the distribution parameters of the feature vectors; determining the existing class with the minimum distance obtained by calculation as the class of the training image; and updating the distribution of the existing category by using the distribution parameters of the feature vectors.

In one implementation of the model training apparatus 200, the clustering module 220 determines the class of the training image according to the distribution parameters of the feature vectors, including: calculating the distance between the distribution of the feature vectors and the distribution of each existing category according to the distribution parameters of the feature vectors; judging whether the distance smaller than a preset threshold value exists in the calculated distances or not; if the distance smaller than the preset threshold exists, determining the existing category corresponding to the minimum value in the distance smaller than the preset threshold as the category of the training image, and updating the distribution of the existing category by using the distribution parameters of the feature vectors; and if the distance smaller than the preset threshold does not exist, distributing a new category for the training image, and determining the distribution of the new category according to the distribution parameters of the feature vectors.

In one implementation of the model training apparatus 200, the clustering module 220 calculates the distance between the distribution of the feature vectors and the distribution of each existing category according to the distribution parameters of the feature vectors, including: calculating the distance between the distribution parameter of the feature vector and the distribution parameter of each existing category as the distance between the distribution of the feature vector and the distribution of each existing category; or determining the distribution of the feature vectors according to the distribution parameters of the feature vectors, and calculating the KL divergence between the distribution of the feature vectors and the distribution of each existing category as the distance between the distribution of the feature vectors and the distribution of each existing category.

In one implementation of the model training apparatus 200, the distribution parameters include a mean and a variance.

In one implementation of the model training apparatus 200, the encoder, the decoder, and the discriminator all employ a neural network.

In one implementation of the model training apparatus 200, the apparatus further comprises: and the application module is used for determining the category of the image to be processed by utilizing an encoder in the trained image clustering model.

The model training apparatus 200 provided in the embodiment of the present application, the implementation principle and the technical effects thereof have been introduced in the foregoing method embodiments, and for the sake of brief description, reference may be made to the corresponding contents in the method embodiments where no part of the apparatus embodiments is mentioned.

Fig. 5 shows a possible structure of an electronic device 300 provided in an embodiment of the present application. Referring to fig. 5, the electronic device 300 includes: a processor 310, a memory 320, and a communication interface 330, which are interconnected and in communication with each other via a communication bus 340 and/or other form of connection mechanism (not shown).

The Memory 320 includes one or more (Only one is shown in the figure), which may be, but not limited to, a Random Access Memory (RAM), a Read Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The processor 310, as well as possibly other components, may access, read, and/or write data to the memory 320.

The processor 310 includes one or more (only one shown) which may be an integrated circuit chip having signal processing capabilities. The Processor 610 may be a general-purpose Processor, and includes a Central Processing Unit (CPU), a Micro Control Unit (MCU), a Network Processor (NP), or other conventional processors; or a special-purpose Processor, including a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, a discrete Gate or transistor logic device, and a discrete hardware component.

Communication interface 330 includes one or more (only one shown) that may be used to communicate directly or indirectly with other devices for the purpose of data interaction. Communication interface 330 may include an interface to communicate wired and/or wireless.

One or more computer program instructions may be stored in memory 320 and read and executed by processor 310 to implement the model training methods provided by the embodiments of the present application and other desired functionality.

It will be appreciated that the configuration shown in fig. 5 is merely illustrative and that electronic device 300 may include more or fewer components than shown in fig. 5 or have a different configuration than shown in fig. 5. The components shown in fig. 5 may be implemented in hardware, software, or a combination thereof. The electronic device 300 may be a physical device, such as a server, a PC, a laptop, a tablet, a mobile phone, a wearable device, an image capture device, an in-vehicle device, a drone, a robot, etc., or may be a virtual device, such as a virtual machine, a virtualized container, etc. The electronic device 300 is not limited to a single device, and may be a combination of a plurality of devices or one or more clusters of a large number of devices.

The embodiment of the present application further provides a computer-readable storage medium, where computer program instructions are stored on the computer-readable storage medium, and when the computer program instructions are read and executed by a processor of a computer, the computer-readable storage medium executes the model training method provided in the embodiment of the present application. The computer-readable storage medium may be implemented as, for example, memory 320 in electronic device 300 in fig. 5.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.

In addition, units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

Furthermore, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.

The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. A method of model training, comprising:

acquiring a training image and inputting the training image into an encoder to obtain distribution parameters of a feature vector output by the encoder;

determining the category of the training image according to the distribution parameters of the feature vectors;

sampling from the distribution of classes of the training image to obtain a sampling vector;

inputting the sampling vector to a decoder to obtain a reconstructed image output by the decoder;

respectively inputting the training image and the reconstructed image into a discriminator to obtain a discrimination result output by the discriminator, wherein the discrimination result comprises the authenticity and the classification identity of the input image;

repeating the steps from obtaining the training image to obtaining the discrimination result so as to train an image clustering model comprising the encoder, the decoder and the discriminator; the training mode is countertraining, and the training target comprises the authenticity and the category of the training image and the reconstructed image which cannot be distinguished according to the judgment result output by the discriminator.

2. The model training method according to claim 1, wherein the determining the class of the training image according to the distribution parameters of the feature vector comprises:

calculating the distance between the distribution of the feature vectors and the distribution of each existing category according to the distribution parameters of the feature vectors;

determining the existing class with the minimum distance obtained by calculation as the class of the training image;

and updating the distribution of the existing category by using the distribution parameters of the feature vectors.

3. The model training method according to claim 1, wherein the determining the class of the training image according to the distribution parameters of the feature vector comprises:

judging whether the distance smaller than a preset threshold value exists in the calculated distances or not;

if the distance smaller than the preset threshold exists, determining the existing category corresponding to the minimum value in the distance smaller than the preset threshold as the category of the training image, and updating the distribution of the existing category by using the distribution parameters of the feature vectors;

and if the distance smaller than the preset threshold does not exist, distributing a new category for the training image, and determining the distribution of the new category according to the distribution parameters of the feature vectors.

4. The model training method according to claim 2 or 3, wherein the calculating the distance between the distribution of the feature vectors and the distribution of each existing class according to the distribution parameters of the feature vectors comprises:

calculating the distance between the distribution parameter of the feature vector and the distribution parameter of each existing category as the distance between the distribution of the feature vector and the distribution of each existing category; alternatively, the first and second electrodes may be,

and determining the distribution of the feature vectors according to the distribution parameters of the feature vectors, and calculating KL divergence between the distribution of the feature vectors and the distribution of each existing category as the distance between the distribution of the feature vectors and the distribution of each existing category.

5. The model training method of claim 1, wherein the distribution parameters include a mean and a variance.

6. The model training method of claim 1, wherein the encoder, the decoder, and the discriminator each employ a neural network.

7. The model training method of claim 1, further comprising:

and determining the category of the image to be processed by utilizing an encoder in the trained image clustering model.

8. A model training apparatus, comprising:

the encoding module is used for acquiring a training image and inputting the training image into an encoder to obtain the distribution parameters of the characteristic vectors output by the encoder;

the clustering module is used for determining the category of the training image according to the distribution parameters of the feature vectors;

the sampling module is used for sampling from the distribution of the classes of the training images to obtain sampling vectors;

the decoding module is used for inputting the sampling vector to a decoder to obtain a reconstructed image output by the decoder;

the judging module is used for respectively inputting the training image and the reconstructed image into a discriminator to obtain a judging result output by the discriminator, and the judging result comprises the authenticity and the classification identity of the input image;

the iteration module is used for repeatedly acquiring training images to obtain a judgment result so as to train an image clustering model comprising the encoder, the decoder and the discriminator; the training mode is countertraining, and the training target comprises the authenticity and the category of the training image and the reconstructed image which cannot be distinguished according to the judgment result output by the discriminator.

9. A computer-readable storage medium having computer program instructions stored thereon, which when read and executed by a processor, perform the method of any one of claims 1-7.

10. An electronic device, comprising: a memory having stored therein computer program instructions which, when read and executed by the processor, perform the method of any of claims 1-7.