CN110851634B

CN110851634B - Picture retrieval method and device and electronic equipment

Info

Publication number: CN110851634B
Application number: CN201911131247.6A
Authority: CN
Inventors: Yang Jiahua (杨嘉华)
Original assignee: Guangdong 3vjia Information Technology Co Ltd
Current assignee: Guangdong 3vjia Information Technology Co Ltd
Priority date: 2019-11-18
Filing date: 2019-11-18
Publication date: 2022-07-19
Anticipated expiration: 2039-11-18
Also published as: CN110851634A

Abstract

The invention provides a picture retrieval method, a picture retrieval device and electronic equipment. The method comprises: training a convolutional neural network encoder model with the total optimization objective of minimizing the difference between a target KL divergence and target mutual information, where the target mutual information is obtained by maximizing the mutual information between a training picture and the feature vector obtained for it through the convolutional neural network encoder model, and the target KL divergence is obtained by minimizing the KL divergence between the feature vector and the standard normal distribution; and retrieving pictures based on the trained convolutional neural network encoder model. The invention can reduce the cost of picture retrieval and improve retrieval accuracy.

Description

Picture retrieval method and device and electronic equipment

Technical Field

The present invention relates to the field of picture retrieval technologies, and in particular, to a picture retrieval method, an apparatus and an electronic device.

Background

With the development of internet information technology, the number of pictures stored on the internet has grown explosively, and picture retrieval technology has gradually become a research hotspot as a way to make more efficient use of picture information. Existing picture retrieval methods use a trained deep convolutional neural network to extract feature information unique to a picture, such as its color, texture, shape and style, encode these low-level features into high-level features, and then match the high-level features against candidate high-level features to retrieve pictures similar to the picture to be retrieved. However, existing retrieval methods have low accuracy, and they require a large number of manually labeled pictures to train the neural network, which makes them costly.

Disclosure of Invention

In view of the above, an object of the present invention is to provide a picture retrieval method, a picture retrieval apparatus and an electronic device that reduce the cost of picture retrieval and improve retrieval accuracy.

In order to achieve the above purpose, the embodiment of the present invention adopts the following technical solutions:

In a first aspect, an embodiment of the present invention provides a picture retrieval method, including: training a convolutional neural network encoder model with the total optimization objective of minimizing the difference between a target KL divergence and target mutual information, where the target mutual information is obtained by maximizing the mutual information between a training picture and the feature vector obtained for it through the convolutional neural network encoder model, and the target KL divergence is obtained by minimizing the KL divergence between the feature vector and the standard normal distribution; and retrieving pictures based on the trained convolutional neural network encoder model.

In one embodiment, the target mutual information is calculated as:

$$I(x;z)=\iint p(z|x)\,p(x)\,\log\frac{p(z|x)}{p(z)}\,dx\,dz$$

wherein I(x; z) represents the mutual information between the distribution of the input training picture x and the distribution of the feature vector z; p(x) represents the distribution of the input training picture x; p(z) represents the distribution of the feature vector z; and p(z|x) represents the distribution of the feature vector z output by the encoder model for the input picture x.

In one embodiment, the target KL divergence is calculated as:

$$\mathrm{KL}\big(p(z)\,\|\,q(z)\big)=\int p(z)\,\log\frac{p(z)}{q(z)}\,dz$$

wherein q(z) represents the standard normal distribution.

In one embodiment, the total optimization objective is calculated as:

$$\min_{p(z|x)}\Big\{\mathrm{KL}\big(p(z)\,\|\,q(z)\big)-I(x;z)\Big\}$$

In one embodiment, the convolutional neural network encoder model includes convolutional layers, batch normalization layers and max pooling layers.

In one embodiment, the step of retrieving pictures based on the trained convolutional neural network encoder model includes: extracting features of the picture to be retrieved and of the pictures in a candidate picture library with the trained convolutional neural network encoder model, to obtain the feature vectors of the picture to be retrieved and of the pictures in the candidate picture library; determining the similarity between the feature vector of the picture to be retrieved and the feature vector of each picture in the candidate picture library; and selecting, as the retrieval result, a preset number of pictures in the candidate picture library whose similarity meets a threshold.

In one embodiment, the step of determining the similarity between the feature vector of the picture to be retrieved and the feature vector of each picture in the candidate picture library includes: calculating the Euclidean distance between the feature vector of the picture to be retrieved and the feature vector of each picture in the candidate picture library; and, based on the Euclidean distances, determining the similarity between these feature vectors using a nearest neighbor search algorithm.

In a second aspect, an embodiment of the present invention provides a picture retrieval apparatus, including: a model training module, configured to train a convolutional neural network encoder model with the total optimization objective of minimizing the difference between a target KL divergence and target mutual information, where the target mutual information is obtained by maximizing the mutual information between a training picture and the feature vector obtained through the convolutional neural network encoder model, and the target KL divergence is obtained by minimizing the KL divergence between the feature vector and the standard normal distribution; and a picture retrieval module, configured to retrieve pictures based on the trained convolutional neural network encoder model.

In a third aspect, an embodiment of the present invention provides an electronic device, which includes a processor and a memory, where the memory stores computer-executable instructions that can be executed by the processor, and the processor executes the computer-executable instructions to implement the steps of the method provided in any one of the implementation manners of the first aspect.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program performs the steps of the method provided in any of the foregoing embodiments of the first aspect.

The embodiments of the present invention provide a picture retrieval method, a picture retrieval apparatus and an electronic device, which train a convolutional neural network encoder model with the total optimization objective of minimizing the difference between the target KL divergence and the target mutual information (the target mutual information is obtained by maximizing the mutual information between a training picture and the feature vector obtained through the encoder model, and the target KL divergence is obtained by minimizing the KL divergence between the feature vector and the standard normal distribution), and retrieve pictures based on the trained encoder model. On the one hand, training pictures do not need to be labeled manually before model training, which effectively reduces the cost of retrieval. On the other hand, maximizing the mutual information between the training picture and its feature vector ensures that the encoder model extracts a feature vector specific to each picture and retains the picture's important information, while minimizing the KL divergence between the feature vector and the standard normal distribution constrains the extracted feature vectors to follow the standard normal distribution, which makes the encoding space more regular, improves generalization and facilitates feature extraction. The embodiments of the present invention therefore improve on existing picture retrieval methods, increasing retrieval accuracy while reducing the cost of picture retrieval.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.

FIG. 1 is a flowchart of a picture retrieval method according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of training a convolutional neural network encoder model according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of extracting feature vectors of a picture using a trained convolutional neural network encoder model according to an embodiment of the present invention;

FIG. 4 is a schematic flowchart of picture retrieval based on a trained convolutional neural network encoder model according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a picture retrieval apparatus according to an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.

At present, existing retrieval methods have low accuracy and require a large number of manually labeled pictures to train a neural network, which is costly. On this basis, the picture retrieval method, apparatus and electronic device provided by the embodiments of the present invention reduce the cost of picture retrieval and improve retrieval accuracy.

To facilitate understanding of this embodiment, the picture retrieval method disclosed in the embodiments of the present invention is first described in detail. Referring to the flowchart of the picture retrieval method shown in FIG. 1, the method may be executed by an electronic device and mainly includes the following steps S101 to S102:

Step S101: Train a convolutional neural network encoder model, taking the minimization of the difference between the target KL divergence and the target mutual information as the total optimization objective.

The target mutual information is obtained by maximizing the mutual information between the training picture and the feature vector obtained through the convolutional neural network encoder model, and the target KL divergence is obtained by minimizing the KL divergence between the feature vector and the standard normal distribution. To ensure that the encoder model finds a feature vector corresponding to each input picture, the mutual information between the training picture and the feature vector obtained by the encoder model (i.e. the target mutual information) is maximized, and the encoder model is trained with this as a goal. At the same time, to make the encoding space more regular, facilitate feature decoupling, enhance generalization and facilitate subsequent learning, the output is constrained by minimizing the KL divergence between the feature vector and the standard normal distribution (i.e. the target KL divergence), so that the output feature vector codes follow the standard normal distribution. Combining the target mutual information and the target KL divergence gives the final training goal of the encoder model: minimizing the difference between the target KL divergence and the target mutual information.

Step S102: Retrieve pictures based on the trained convolutional neural network encoder model.

In a specific implementation, the picture to be retrieved is input into the trained convolutional neural network encoder model to obtain its feature vector, and the feature vectors of the pictures in the candidate picture library are obtained through the same trained encoder model. The similarity between the feature vector of the picture to be retrieved and the feature vector of each picture in the candidate picture library is then calculated, and finally a number of pictures whose similarity meets the threshold are selected as the retrieval result.

The picture retrieval method provided by the embodiment of the present invention trains a convolutional neural network encoder model with the total optimization objective of minimizing the difference between the target KL divergence and the target mutual information (the target mutual information is obtained by maximizing the mutual information between a training picture and the feature vector obtained by the encoder model, and the target KL divergence is obtained by minimizing the KL divergence between the feature vector and the standard normal distribution), and retrieves pictures based on the trained encoder model. On the one hand, training pictures do not need to be manually labeled before model training, which effectively reduces the retrieval cost. On the other hand, maximizing the mutual information between the training picture and its feature vector ensures that the encoder model extracts a feature vector specific to each picture and retains the picture's important information, while minimizing the KL divergence between the feature vector and the standard normal distribution constrains the extracted feature vectors to follow the standard normal distribution, which makes the encoding space more regular, improves generalization and facilitates feature extraction. The embodiment of the present invention therefore improves on existing picture retrieval methods, increasing retrieval accuracy while reducing the cost of picture retrieval.

To better extract the feature vector of a picture, the embodiment of the present invention trains the convolutional neural network encoder model with the total optimization objective of minimizing the difference between the target KL divergence and the target mutual information. The target mutual information is calculated as:

$$I(x;z)=\iint p(z|x)\,p(x)\,\log\frac{p(z|x)}{p(z)}\,dx\,dz$$

wherein I(x; z) represents the mutual information between the distribution of the input training picture x and the distribution of the feature vector z; p(x) represents the distribution of the input training picture x; p(z) represents the distribution of the feature vector z; and p(z|x) represents the distribution of the feature vector z output by the encoder model for the input picture x. For the mutual information to be large, the term log[p(z|x)/p(z)] should be as large as possible, i.e. p(z|x) should be much larger than p(z). In other words, for each input picture x, the convolutional neural network encoder model can extract a feature vector z specific to picture x.
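As a toy illustration of the mutual information I(x; z) used in this embodiment, the Python sketch below evaluates it when both the picture x and the code z take only finitely many values, so the double integral becomes a double sum. This is an assumption-laden sketch, not the embodiment's training code: the function `mutual_information` and the discrete setting are hypothetical, and for real pictures and continuous feature vectors the mutual information has to be estimated, with the estimator not specified in the source.

```python
import math

def mutual_information(p_x, p_z_given_x):
    """Hypothetical helper: I(x; z) = sum_x sum_z p(x) p(z|x) log(p(z|x) / p(z)), in nats.

    p_x:          list of probabilities p(x) over a finite set of inputs.
    p_z_given_x:  p_z_given_x[i][j] = p(z = j | x = i); each row sums to 1.
    """
    n_z = len(p_z_given_x[0])
    # Marginal p(z) = sum_x p(x) p(z|x)
    p_z = [sum(p_x[i] * p_z_given_x[i][j] for i in range(len(p_x)))
           for j in range(n_z)]
    mi = 0.0
    for i, px in enumerate(p_x):
        for j in range(n_z):
            pzx = p_z_given_x[i][j]
            if px > 0 and pzx > 0:  # zero-probability terms contribute nothing
                mi += px * pzx * math.log(pzx / p_z[j])
    return mi

# An "encoder" that ignores its input: p(z|x) = p(z), so I(x; z) = 0.
print(mutual_information([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]]))
# An "encoder" that gives each input its own code: I(x; z) = log 2, about 0.693.
print(mutual_information([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]]))
```

The two calls show the two extremes described in the text: when p(z|x) equals p(z) the code carries no information about the picture, and when p(z|x) is much larger than p(z) for the picture's own code the mutual information is maximal.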

Further, the target KL divergence is calculated as:

$$\mathrm{KL}\big(p(z)\,\|\,q(z)\big)=\int p(z)\,\log\frac{p(z)}{q(z)}\,dz$$

wherein q(z) represents the standard normal distribution.
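If p(z) is modeled as a Gaussian with diagonal covariance, which is an assumption made here purely for illustration (the source does not state how p(z) is parameterized), the KL divergence to the standard normal q(z) = N(0, I) has a well-known closed form. The hypothetical helper `kl_to_standard_normal` below sketches it in Python.

```python
import math

def kl_to_standard_normal(mean, var):
    """Hypothetical helper: KL(N(mean, diag(var)) || N(0, I)) in nats.

    Assumes a diagonal Gaussian; per dimension the closed form is
    0.5 * (mu^2 + sigma^2 - 1 - log sigma^2), summed over dimensions.
    """
    if len(mean) != len(var):
        raise ValueError("mean and var must have the same length")
    return 0.5 * sum(m * m + v - 1.0 - math.log(v) for m, v in zip(mean, var))

print(kl_to_standard_normal([0.0, 0.0], [1.0, 1.0]))  # already standard normal
print(kl_to_standard_normal([1.0], [1.0]))            # mean shifted by 1
```

The first call returns 0.0 and the second 0.5: the divergence is zero only at zero mean and unit variance, which is exactly the constraint the target KL divergence imposes on the feature vectors.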

In summary, combining the target mutual information and the target KL divergence gives the total optimization objective, namely minimizing the difference between the target KL divergence and the target mutual information, which is calculated as:

$$\min_{p(z|x)}\Big\{\mathrm{KL}\big(p(z)\,\|\,q(z)\big)-I(x;z)\Big\}$$
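The total objective KL(p(z)||q(z)) - I(x; z) can be related to a per-picture KL term using only the definitions of I(x; z), p(z|x), p(z) and q(z) in this embodiment. The identity below is standard; whether the embodiment evaluates its loss in this form is not stated in the source.

$$\begin{aligned}
\mathbb{E}_{x\sim p(x)}\Big[\mathrm{KL}\big(p(z|x)\,\|\,q(z)\big)\Big]
&= \iint p(z|x)\,p(x)\,\log\frac{p(z|x)}{q(z)}\,dx\,dz \\
&= \iint p(z|x)\,p(x)\,\log\frac{p(z|x)}{p(z)}\,dx\,dz + \int p(z)\,\log\frac{p(z)}{q(z)}\,dz \\
&= I(x;z) + \mathrm{KL}\big(p(z)\,\|\,q(z)\big)
\end{aligned}$$

where the second line uses \(\int p(z|x)\,p(x)\,dx = p(z)\). Hence

$$\mathrm{KL}\big(p(z)\,\|\,q(z)\big) - I(x;z) = \mathbb{E}_{x\sim p(x)}\Big[\mathrm{KL}\big(p(z|x)\,\|\,q(z)\big)\Big] - 2\,I(x;z)$$

The first term on the right involves only the encoder's per-picture output distribution p(z|x), which can be easier to evaluate than the marginal p(z), for example in closed form when p(z|x) is Gaussian.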

To better illustrate the training process of the convolutional neural network encoder model, an embodiment of the present invention provides a schematic diagram of training the encoder model, shown in FIG. 2. First, a picture x is input into the convolutional neural network encoder model, which includes convolutional layers, batch normalization layers and max pooling layers, to obtain the encoder output distribution p(z). Next, a loss function is designed to train the model (the loss function reflects how well the model fits the data: the better the fit, the smaller its value). The loss function aims to maximize the mutual information between the input picture x and the encoder output p(z); in addition, the embodiment of the present invention requires the encoder output p(z) to follow the standard normal distribution, i.e. zero mean and unit variance, and this constraint is enforced by minimizing the KL divergence. The overall loss function (i.e. the total optimization objective) is therefore the difference between the minimized KL divergence (the target KL divergence) and the maximized mutual information (the target mutual information), and the convolutional neural network encoder model is finally trained based on this total optimization objective.
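The three layer types named for the encoder can be illustrated with a minimal, dependency-free Python sketch of one convolution, batch normalization and max pooling stage on single-channel inputs. This is not the embodiment's network: the number of layers, kernel sizes, channel counts and how the output distribution p(z) is parameterized are not given in the source, and every function name below is hypothetical. A real implementation would use a deep learning framework with learned kernels and learned batch-normalization scale and shift.

```python
# Hypothetical sketch of one encoder stage: conv -> batch norm -> max pool.
# Assumptions (not from the source): single-channel inputs, one fixed kernel,
# no learned batch-norm scale/shift, 2x2 pooling with stride 2.

def conv2d(img, kernel):
    """'Valid' 2-D cross-correlation (what deep learning libraries call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def batch_norm(maps, eps=1e-5):
    """Training-mode batch normalization for one channel: normalize every value
    with the mean and (biased) variance taken over the whole batch."""
    values = [v for m in maps for row in m for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    scale = (var + eps) ** -0.5
    return [[[(v - mean) * scale for v in row] for row in m] for m in maps]

def max_pool2x2(fmap):
    """2x2 max pooling, stride 2; an odd trailing row or column is dropped."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

def encoder_stage(batch, kernel):
    """Apply conv -> batch norm -> max pool to a batch of single-channel images."""
    return [max_pool2x2(m) for m in batch_norm([conv2d(img, kernel) for img in batch])]

# Usage: two 5x5 "pictures" -> 4x4 conv maps -> 2x2 pooled maps each.
batch = [[[float(r * 5 + c) for c in range(5)] for r in range(5)],
         [[float((r + c) % 3) for c in range(5)] for r in range(5)]]
out = encoder_stage(batch, kernel=[[1.0, 0.0], [0.0, -1.0]])
print(len(out), len(out[0]), len(out[0][0]))  # 2 2 2
```

Batch normalization is the only stage that mixes information across pictures in the batch, which is why it takes the whole batch rather than one feature map at a time.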

Further, an embodiment of the present invention provides a schematic diagram of extracting picture feature vectors with the trained convolutional neural network encoder model, shown in FIG. 3: the picture x is input into the trained encoder, the encoder outputs the feature vector corresponding to the input picture, and the output feature vector is then used for picture retrieval.

In a specific implementation, an embodiment of the present invention further provides a schematic flowchart of picture retrieval based on the trained convolutional neural network encoder model, shown in FIG. 4, which mainly includes the following steps S401 to S403:

Step S401: Extract features of the picture to be retrieved and of the pictures in the candidate picture library based on the trained convolutional neural network encoder model, to obtain the feature vectors of the picture to be retrieved and of the pictures in the candidate picture library.

Specifically, the feature vector extracted by the trained convolutional neural network encoder model carries information unique to the picture, so it distinguishes the picture from the rest of the data set and essentially represents the feature information of the original picture; this completes the mapping from the original picture to its feature vector.

Step S402: Determine the similarity between the feature vector of the picture to be retrieved and the feature vector of each picture in the candidate picture library.

Step S403: Select, as the retrieval result, a preset number of pictures in the candidate picture library whose similarity meets the threshold.

To ensure the accuracy of the retrieval result, the similarity between the picture to be retrieved and each retrieved picture from the candidate picture library should meet the threshold; a preset number of pictures are then selected as the retrieval result from the pictures whose similarity meets the threshold, so that the retrieved pictures are the ones most similar to the picture to be retrieved.
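Step S403 can be sketched as a filter followed by a sort. The function `select_results` and its parameters are hypothetical, and the sketch assumes that larger scores mean more similar pictures and that meeting the threshold means being at least the threshold; with a distance instead of a similarity, the comparison and the sort order would be reversed.

```python
def select_results(similarities, threshold, k):
    """Hypothetical helper: return up to k (index, similarity) pairs whose
    similarity is >= threshold, most similar first.

    similarities: one similarity score per candidate picture, in library order.
    """
    # Keep only candidates whose similarity meets the threshold.
    passing = [(i, s) for i, s in enumerate(similarities) if s >= threshold]
    # Most similar first, then keep the preset number k.
    passing.sort(key=lambda pair: pair[1], reverse=True)
    return passing[:k]

print(select_results([0.2, 0.9, 0.75, 0.5, 0.95], threshold=0.6, k=2))
# [(4, 0.95), (1, 0.9)]
```

If fewer than k candidates pass the threshold, fewer than k results are returned, so in this sketch the preset number acts as an upper bound on the size of the retrieval result.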

Step S402 above may be performed through the following steps a1 to a2:

Step a1: Calculate the Euclidean distance between the feature vector of the picture to be retrieved and the feature vector of each picture in the candidate picture library.

Step a2: Based on the Euclidean distances, determine the similarity between the feature vector of the picture to be retrieved and the feature vector of each picture in the candidate picture library using a nearest neighbor search algorithm.

In one embodiment, nearest neighbor search based on Euclidean distance is the process of finding, for a query point P, the points in the data set that are closest to P in Euclidean distance; the smaller the Euclidean distance, the higher the similarity.
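Steps a1 and a2 can be sketched in Python as an exact, brute-force nearest neighbor search. The source does not name a specific nearest neighbor algorithm; brute force is used here because it is the simplest exact method, and the names `euclidean` and `nearest_neighbors` are hypothetical. For a large candidate library, a spatial index or an approximate nearest neighbor method would usually replace the linear scan.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbors(query, library, k):
    """Hypothetical helper: exact (brute-force) k-nearest-neighbor search.

    Returns (index, distance) pairs for the k library vectors closest to the
    query, smallest distance (i.e. highest similarity) first.
    """
    dists = [(i, euclidean(query, v)) for i, v in enumerate(library)]
    dists.sort(key=lambda pair: pair[1])
    return dists[:k]

library = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0], [-1.0, 0.0]]
print(nearest_neighbors([0.0, 0.0], library, k=2))
# [(0, 0.0), (3, 1.0)]
```

Because a smaller distance means a higher similarity, the threshold test of step S403 could be applied in this sketch as a maximum allowed distance.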

In summary, the picture retrieval method provided by the embodiments of the present invention trains the encoder by minimizing the difference between the target KL divergence and the target mutual information. This is unsupervised machine learning: the model learns by itself without manually labeled tags and can therefore make effective use of unlabeled data sets, and the extracted picture features retain the important information of the original picture, which keeps picture retrieval low-cost, transferable and accurate.

For the picture retrieval method provided in the foregoing embodiment, an embodiment of the present invention further provides a picture retrieval apparatus. Referring to the schematic structural diagram of the picture retrieval apparatus shown in FIG. 5, the apparatus may include the following components:

A model training module 501, configured to train a convolutional neural network encoder model with the total optimization objective of minimizing the difference between the target KL divergence and the target mutual information, where the target mutual information is obtained by maximizing the mutual information between a training picture and the feature vector obtained through the convolutional neural network encoder model, and the target KL divergence is obtained by minimizing the KL divergence between the feature vector and the standard normal distribution.

A picture retrieval module 502, configured to retrieve pictures based on the trained convolutional neural network encoder model.

The picture retrieval apparatus provided by the embodiment of the present invention trains a convolutional neural network encoder model with the total optimization objective of minimizing the difference between the target KL divergence and the target mutual information (the target mutual information is obtained by maximizing the mutual information between a training picture and the feature vector obtained by the encoder model, and the target KL divergence is obtained by minimizing the KL divergence between the feature vector and the standard normal distribution), and retrieves pictures based on the trained encoder model. On the one hand, training pictures do not need to be labeled manually before model training, which effectively reduces the retrieval cost. On the other hand, maximizing the mutual information between the training picture and its feature vector ensures that the encoder model extracts a feature vector specific to each picture and retains the picture's important information, while minimizing the KL divergence between the feature vector and the standard normal distribution constrains the extracted feature vectors to follow the standard normal distribution, which makes the encoding space more regular, improves generalization and facilitates feature extraction. The embodiment of the present invention therefore improves on existing picture retrieval methods, increasing retrieval accuracy while reducing the cost of picture retrieval.

In an embodiment, the picture retrieval module 502 is further configured to: extract features of the picture to be retrieved and of the pictures in the candidate picture library based on the trained convolutional neural network encoder model, to obtain the feature vectors of the picture to be retrieved and of the pictures in the candidate picture library; determine the similarity between the feature vector of the picture to be retrieved and the feature vector of each picture in the candidate picture library; and select, as the retrieval result, a preset number of pictures in the candidate picture library whose similarity meets the threshold.

In an embodiment, the picture retrieval apparatus further includes a calculation module, configured to calculate the target mutual information as:

$$I(x;z)=\iint p(z|x)\,p(x)\,\log\frac{p(z|x)}{p(z)}\,dx\,dz$$

wherein I(x; z) represents the mutual information between the distribution of the input training picture x and the distribution of the feature vector z; p(x) represents the distribution of the input training picture x; p(z) represents the distribution of the feature vector z; and p(z|x) represents the distribution of the feature vector z output by the encoder model for the input picture x.

In one embodiment, the calculation module is further configured to calculate the target KL divergence as:

$$\mathrm{KL}\big(p(z)\,\|\,q(z)\big)=\int p(z)\,\log\frac{p(z)}{q(z)}\,dz$$

wherein q(z) represents the standard normal distribution.

In an embodiment, the calculation module is further configured to calculate the total optimization objective as:

$$\min_{p(z|x)}\Big\{\mathrm{KL}\big(p(z)\,\|\,q(z)\big)-I(x;z)\Big\}$$

The implementation principle and technical effects of the apparatus provided by the embodiment of the present invention are the same as those of the method embodiments; for brevity, for anything not mentioned in the apparatus embodiments, refer to the corresponding content in the method embodiments.

The embodiment of the invention also provides electronic equipment, which specifically comprises a processor and a storage device; the storage means has stored thereon a computer program which, when executed by the processor, performs the method of any of the above described embodiments.

Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, where the electronic device 100 includes: a processor 60, a memory 61, a bus 62 and a communication interface 63, wherein the processor 60, the communication interface 63 and the memory 61 are connected through the bus 62; the processor 60 is arranged to execute executable modules, such as computer programs, stored in the memory 61.

The memory 61 may include high-speed random access memory (RAM) and may also include non-volatile memory, such as at least one disk memory. The communication connection between this system network element and at least one other network element is implemented through at least one communication interface 63 (which may be wired or wireless), using the internet, a wide area network, a local area network, a metropolitan area network, or the like.

The bus 62 may be an ISA bus, a PCI bus, an EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 6, but that does not indicate only one bus or one type of bus.

The memory 61 is used to store a program, and the processor 60 executes the program after receiving an execution instruction. The method performed by the apparatus defined by the process disclosed in any of the foregoing embodiments of the present invention may be applied to, or implemented by, the processor 60.

The processor 60 may be an integrated circuit chip with signal processing capability. During implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 60 or by instructions in the form of software. The processor 60 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP) and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or perform the methods, steps and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor, or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present invention may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor. The software module may reside in a storage medium well known in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory (PROM), erasable programmable read-only memory (EPROM) or registers. The storage medium is located in the memory 61, and the processor 60 reads the information in the memory 61 and completes the steps of the above method in combination with its hardware.

The computer program product of the readable storage medium provided in the embodiment of the present invention includes a computer readable storage medium storing a program code, where instructions included in the program code may be used to execute the method described in the foregoing method embodiment, and specific implementation may refer to the foregoing method embodiment, which is not described herein again.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

Finally, it should be noted that the foregoing embodiments are only specific implementations of the present invention. They illustrate the technical solutions of the present invention rather than limit them, and the protection scope of the present invention is not limited to them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that, within the technical scope disclosed by the present invention, anyone familiar with the art can still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and they shall all fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A method for retrieving pictures, comprising:

training a convolutional neural network encoder model with the total optimization objective of minimizing the difference between the target KL divergence and the target mutual information. The target mutual information is obtained by maximizing the mutual information between a training picture and the feature vector that the convolutional neural network encoder model produces for it. The target KL divergence is obtained by minimizing the KL divergence between the feature vector and the standard normal distribution;

retrieving pictures based on the trained convolutional neural network encoder model;

wherein the step of retrieving pictures based on the trained convolutional neural network encoder model comprises: extracting features of the picture to be retrieved and of the pictures in a candidate picture library with the trained convolutional neural network encoder model, to obtain the feature vectors of the picture to be retrieved and of the pictures in the candidate picture library; determining the similarity between the feature vector of the picture to be retrieved and the feature vector of each picture in the candidate picture library; and selecting, as the retrieval result, a preset number of pictures in the candidate picture library whose similarity meets a threshold.

2. The method of claim 1, wherein the target mutual information is calculated by the following formula:

3. The method for retrieving pictures according to claim 1, wherein the calculation formula of the target KL divergence is:

wherein q(z) represents the standard normal distribution.

4. The method for retrieving pictures according to claim 1, wherein the calculation formula of the total optimization objective is:

5. The method for retrieving pictures according to claim 1, wherein the convolutional neural network encoder model comprises convolutional layers, batch normalization layers and max pooling layers.

6. The method according to claim 1, wherein the step of determining the similarity between the feature vector of the picture to be retrieved and the feature vector of each picture in the candidate picture library comprises:

calculating the Euclidean distance between the feature vector of the picture to be retrieved and the feature vector of each picture in the candidate picture library;

and determining, according to the Euclidean distances and by using a nearest neighbor search algorithm, the similarity between the feature vector of the picture to be retrieved and the feature vector of each picture in the candidate picture library.

7. An apparatus for picture retrieval, comprising:

a model training module, configured to train a convolutional neural network encoder model with the total optimization objective of minimizing the difference between the target KL divergence and the target mutual information. The target mutual information is obtained by maximizing the mutual information between a training picture and the feature vector that the convolutional neural network encoder model produces for it. The target KL divergence is obtained by minimizing the KL divergence between the feature vector and the standard normal distribution;

a picture retrieval module, configured to retrieve pictures based on the trained convolutional neural network encoder model;

wherein the picture retrieval module is further configured to: extract features of the picture to be retrieved and of the pictures in a candidate picture library with the trained convolutional neural network encoder model, to obtain the feature vectors of the picture to be retrieved and of the pictures in the candidate picture library; determine the similarity between the feature vector of the picture to be retrieved and the feature vector of each picture in the candidate picture library; and select, as the retrieval result, a preset number of pictures in the candidate picture library whose similarity meets a threshold.

8. An electronic device comprising a processor and a memory, the memory storing computer-executable instructions executable by the processor, the processor executing the computer-executable instructions to perform the steps of the method of any of claims 1 to 6.

9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of the preceding claims 1 to 6.
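Claim 5 names only the layer types of the encoder. The following minimal pure-Python sketch illustrates one convolution, batch normalization and max pooling stage on a single-channel feature map. Several details are assumptions and are not taken from this patent: the function names, the "valid" (no-padding) convolution, the 2×2 pooling window and the absence of an activation function. With one channel and a batch of one picture, training-mode batch normalization reduces to normalizing over the spatial positions of the map.

```python
import math
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix, bias: float = 0.0) -> Matrix:
    """Single-channel 'valid' convolution (cross-correlation, as in common CNN libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            bias + sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def batch_norm(fmap: Matrix, gamma: float = 1.0, beta: float = 0.0, eps: float = 1e-5) -> Matrix:
    """Training-mode batch norm for a batch of one single-channel map: statistics over all positions."""
    values = [v for row in fmap for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    scale = gamma / math.sqrt(var + eps)
    return [[(v - mean) * scale + beta for v in row] for row in fmap]


def max_pool(fmap: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping max pooling; rows/columns that do not fill a whole window are dropped."""
    return [
        [
            max(fmap[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]


def encoder_stage(image: Matrix, kernel: Matrix) -> Matrix:
    """One hypothetical encoder stage: convolution, then batch normalization, then max pooling."""
    return max_pool(batch_norm(conv2d_valid(image, kernel)))
```

For example, `conv2d_valid([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[1, 1], [1, 1]])` gives `[[12.0, 16.0], [24.0, 28.0]]`. A real encoder would stack several such stages over many channels, usually in a deep learning framework.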
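The formulas referred to in claims 2 to 4 are not reproduced in this text. The sketch below shows one common way to compute an objective of the form "target KL divergence minus target mutual information" (claims 1 and 4). It rests on two assumptions that the patent does not state. First, the encoder is assumed to output the mean and log-variance of a diagonal Gaussian over the feature vector, so the KL divergence to the standard normal distribution q(z) = N(0, I) of claim 3 has the usual closed form 0.5 * sum(mu^2 + sigma^2 - 1 - log sigma^2). Second, the mutual information is estimated with the InfoNCE lower bound from a matrix of critic scores. This is a swapped-in estimator and may differ from the one in claim 2. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I)) for one feature vector."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def infonce_mi_estimate(scores: List[List[float]]) -> float:
    """InfoNCE lower bound on mutual information (never exceeds log n).

    scores[i][j] is a critic score for (training picture i, feature vector j);
    the diagonal holds matching pairs and every other entry is a negative pair.
    """
    n = len(scores)
    total = 0.0
    for i, row in enumerate(scores):
        peak = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(s - peak) for s in row))
        total += row[i] - log_sum_exp  # log-softmax of the matching pair
    return math.log(n) + total / n


def total_objective(
    mus: List[Sequence[float]],
    log_vars: List[Sequence[float]],
    scores: List[List[float]],
) -> float:
    """Quantity to minimize: mean target KL divergence minus estimated mutual information."""
    mean_kl = sum(kl_to_standard_normal(m, lv) for m, lv in zip(mus, log_vars)) / len(mus)
    return mean_kl - infonce_mi_estimate(scores)
```

When all critic scores are equal, the encoder carries no information about which picture produced which feature vector, and the estimate is 0. As matching pairs dominate, the estimate approaches log n. Minimizing `total_objective` therefore pushes the features toward N(0, I) while keeping them informative about their pictures.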
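The retrieval step of claims 1 and 6 can be illustrated with an exact (brute-force) nearest neighbor search over Euclidean distances. In this minimal sketch, the candidate library is a hypothetical dict that maps picture IDs to feature vectors already produced by the trained encoder. The condition that "the similarity meets a threshold" is assumed to mean that the Euclidean distance does not exceed `max_distance`. The function names are illustrative, and a large library would more likely use an approximate nearest neighbor index.

```python
import heapq
import math
from typing import Dict, List, Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(
    query: Sequence[float],
    library: Dict[str, Sequence[float]],
    top_k: int,
    max_distance: float,
) -> List[str]:
    """Return up to top_k picture IDs, nearest first, whose distance to the query is within max_distance."""
    # Exact nearest neighbor search: compute the distance to every candidate picture.
    scored = [(euclidean_distance(query, vec), pid) for pid, vec in library.items()]
    # Keep only candidates whose similarity meets the (distance) threshold.
    within = [item for item in scored if item[0] <= max_distance]
    # A smaller distance means a more similar picture, so take the smallest distances.
    return [pid for _, pid in heapq.nsmallest(top_k, within)]
```

For example, with `library = {"a": [0.0, 0.0], "b": [3.0, 4.0], "c": [1.0, 0.0]}`, the call `retrieve([0.0, 0.0], library, top_k=2, max_distance=10.0)` returns `["a", "c"]`, because the distances are 0, 5 and 1.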