CN113468365B - Training method of image type recognition model, image retrieval method and device - Google Patents

Training method of image type recognition model, image retrieval method and device

Info

Publication number
CN113468365B
Authority
CN
China
Prior art keywords
image
training
category
difficult
categories
Prior art date
Legal status
Active
Application number
CN202111017633.XA
Other languages
Chinese (zh)
Other versions
CN113468365A (en)
Inventor
田峰
严灿祥
陈凯
Current Assignee
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202111017633.XA
Publication of CN113468365A
Application granted
Publication of CN113468365B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/63Querying
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/65Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure relates to a training method for an image category recognition model, an image retrieval method, and corresponding devices. The training method comprises: acquiring a training sample set, wherein the training sample set comprises a plurality of training images; determining the image categories contained in the plurality of training images; determining the distance between every two image categories based on the image categories contained in the training images, and determining every two image categories whose distance and a preset threshold satisfy a preset relationship as difficult-case categories of each other; acquiring, based on the difficult-case categories, effective training images corresponding to the difficult-case categories from the plurality of training images; and training the image category recognition model to be trained based on the effective training images and the actual categories corresponding to the effective training images, to obtain a trained image category recognition model.

Description

Training method of image type recognition model, image retrieval method and device
Technical Field
The present disclosure relates to the field of image processing, and in particular to a training method for an image category recognition model, an image retrieval method, and corresponding devices.
Background
With the progress of science and technology and the development of computer vision, computer vision technology is continuously changing the way people live. In the e-commerce field, computer vision is often used to extract features from pictures: for example, a user photographs and uploads a picture of a commodity, features are extracted from the picture, and the same or similar commodities corresponding to the picture can then be retrieved, which greatly improves the experience of searching for commodities when shopping. However, commodity retrieval often suffers from recognition errors caused by the large number of commodities with similar appearance and packaging. Perfumes in the beauty and cosmetics category, for example, not only have similar packaging but are also mostly sold in transparent bottles of various shapes, which increases the difficulty of recognition; likewise, the many similar garments, shoes, and boots in the clothing category are difficult to distinguish.
For the problem that similar samples are difficult to distinguish, a commonly used hard-example mining method in metric learning is the batch-hard triplet loss. This method mainly computes, during training of the image category recognition model, the distance between the feature vectors (embeddings) of every two samples in the same batch, determines from these distances which categories are difficult cases of each other, and then optimizes those difficult cases. However, during training of the image category recognition model, a single batch of samples is too small: each iteration of the triplet loss selects difficult cases only from batch_size samples, so it cannot optimize the categories that are genuinely difficult for each other out of millions of categories, and cannot perform targeted optimization of similar categories in the training sample set.
Disclosure of Invention
The present disclosure provides a training method for an image category recognition model, an image retrieval method, and corresponding devices, so as to at least solve the problem that the related art cannot perform targeted optimization of similar categories in a training sample set.
According to a first aspect of the embodiments of the present disclosure, there is provided a training method for an image class recognition model, including: acquiring a training sample set, wherein the training sample set comprises a plurality of training images; determining image categories contained in a plurality of training images; determining the distance between every two image categories based on the image categories contained in the training images, and determining every two image categories of which the distance and a preset threshold value meet a preset relation as respective difficult-case categories; acquiring an effective training image corresponding to the difficult example type from a plurality of training images based on the difficult example type; and training the image category identification model to be trained based on the effective training images and the actual categories corresponding to the effective training images to obtain the trained image category identification model.
Optionally, determining a distance between each two image categories based on the image categories included in the plurality of training images includes: acquiring m training images of each image category in image categories contained in a plurality of training images, wherein m is a positive integer; inputting m training images into an image category identification model aiming at the m training images of each image category, and acquiring the feature vectors of the m training images output by the middle layer of the image category identification model; determining the average feature vector of the feature vectors of the m training images as the feature vector of the corresponding image category; the distance between each two image classes is determined based on the feature vector of each image class.
Optionally, based on the difficult case category, obtaining a valid training image corresponding to the difficult case category from a plurality of training images includes: aiming at each image category in the training sample set, sorting the difficult case categories of the current image category according to the distance to obtain difficult case category vectors of the current image category; taking the difficult case category vector of each image category as a row vector, and constructing a difficult case matrix of a training sample set; based on the difficult case category in the difficult case matrix, a valid training image corresponding to the difficult case category is acquired from the plurality of training images.
Optionally, based on the difficult case category of the difficult case matrix, obtaining an effective training image corresponding to the difficult case category from the training sample set includes: for each iteration training, a second preset number of difficult example types corresponding to one image type are obtained from the difficult example matrix based on one image type in the training sample set, and effective training images corresponding to the second preset number of difficult example types are obtained from the training sample set based on the second preset number of difficult example types and serve as training images required by the current iteration training.
Optionally, obtaining a second predetermined number of difficult cases corresponding to one image category from the difficult case matrix based on one image category in the training sample set includes: acquiring an image class from a training sample set as a difficult case class seed required by the iterative training; acquiring c-1 image categories corresponding to the difficult example category seeds from the difficult example matrix based on the difficult example category seeds, wherein c is a second preset number, the c-1 image categories are different from the difficult example category seeds, and c is a positive integer; and taking the difficult example type seed and the c-1 image types as a second preset number of difficult example types.
Optionally, obtaining an image category from the training sample set as the difficult example category seed required by the current iterative training includes: for each row vector of the difficult example matrix, acquiring the total distance over all categories of the row vector; normalizing the total distance of each row vector to obtain a loss value corresponding to each image category; inputting the loss value corresponding to each image category into a preset nonlinear function to obtain the sampling probability of each image category; and randomly drawing, without replacement, one image category from a predetermined set as the difficult example category seed required by the current iterative training, wherein the predetermined set comprises each image category in the training sample set, and the probability of each image category being drawn from the predetermined set is equal to the sampling probability of the corresponding image category.
Optionally, acquiring the c-1 image categories corresponding to the difficult example category seed from the difficult example matrix based on the difficult example category seed includes: obtaining the c-1 image categories from the difficult example matrix by depth-first search or breadth-first search, starting from the difficult example category seed.
Optionally, obtaining effective training images corresponding to a second predetermined number of difficult example categories from the training sample set based on the second predetermined number of difficult example categories as training images required by the current iterative training includes: and aiming at each image category in a second preset number of difficult example categories, respectively acquiring m training images from the training sample set, and taking all the acquired training images as training images required by the iterative training.
Optionally, after each iterative training of the image class identification model is completed, the feature vector of the image class corresponding to the training image of the iterative training is updated.
Optionally, after completing one training of the image class identification model, reconstructing the hard case matrix based on the updated feature vector of each image class, where the one training includes multiple iterative training.
Optionally, after the training of the image class identification model is completed for the predetermined number of times, the hard case matrix is reconstructed based on the training sample set, wherein each training in the training of the image class identification model for the predetermined number of times includes multiple iterative training.
Optionally, after training the image class identification model to be trained based on the effective training image and the actual class corresponding to the effective training image to obtain a trained image class identification model, the method further includes: acquiring an image to be retrieved; inputting an image to be retrieved into the trained image category identification model, and acquiring a characteristic vector of the image to be retrieved output by the middle layer of the trained image category identification model; determining the distance between the image to be retrieved and each image based on the feature vector of the image to be retrieved and the feature vector of each image in the image library; and determining the image of which the distance and the preset threshold value meet the preset relation as the image of the same category as the image to be retrieved.
Optionally, after training the image class identification model to be trained based on the effective training image and the actual class corresponding to the effective training image to obtain a trained image class identification model, the method further includes: acquiring an image to be retrieved; inputting the image to be retrieved into the trained image category identification model to obtain the image category of the image to be retrieved; images of the same category as the image are obtained from the image library.
Optionally, based on image categories included in the plurality of training images, determining a distance between each two image categories, and determining each two image categories, for which the distance and a preset threshold satisfy a preset relationship, as respective difficult-to-case categories, including at least one of: determining a cosine distance between every two image categories based on the image categories contained in the training images, and determining every two image categories of which the cosine distance is greater than a preset threshold value as respective difficult-case categories; determining Euclidean distance between every two image categories based on the image categories contained in the training images, and determining every two image categories with the Euclidean distance smaller than a preset threshold value as respective difficult-case categories.
According to a second aspect of the embodiments of the present disclosure, there is provided an image retrieval method, including: acquiring an image to be retrieved; inputting an image to be retrieved into a trained image category identification model, and acquiring a feature vector of the image to be retrieved output by a middle layer of the trained image category identification model, wherein the image category identification model is obtained by training effective training images corresponding to difficult cases in a training sample set, and the difficult cases are determined by the relationship between the distance between every two image categories in the image categories contained in a plurality of training images in the training sample set and a preset threshold; determining the distance between the image to be retrieved and each image based on the feature vector of the image to be retrieved and the feature vector of each image in the image library; and determining the image of which the distance and the preset threshold value meet the preset relation as the image of the same category as the image to be retrieved.
Optionally, the distance between each two image classes in the image classes included in the training images in the training sample set is determined by: acquiring m training images of each image category in image categories contained in a plurality of training images, wherein m is a positive integer; inputting m training images into an image category identification model aiming at the m training images of each image category, and acquiring the feature vectors of the m training images output by the middle layer of the image category identification model; determining the average feature vector of the feature vectors of the m training images as the feature vector of the corresponding image category; the distance between each two image classes is determined based on the feature vector of each image class.
Optionally, valid training images corresponding to the difficult example category in the training sample set are obtained by: aiming at each image category in the training sample set, sorting the difficult case categories of the current image category according to the distance to obtain difficult case category vectors of the current image category; taking the difficult case category vector of each image category as a row vector, and constructing a difficult case matrix of a training sample set; based on the difficult case category in the difficult case matrix, a valid training image corresponding to the difficult case category is acquired from the plurality of training images.
Optionally, based on the difficult case category of the difficult case matrix, obtaining an effective training image corresponding to the difficult case category from the training sample set includes: for each iteration training, a second preset number of difficult example types corresponding to one image type are obtained from the difficult example matrix based on one image type in the training sample set, and effective training images corresponding to the second preset number of difficult example types are obtained from the training sample set based on the second preset number of difficult example types and serve as training images required by the current iteration training.
Optionally, obtaining a second predetermined number of difficult cases corresponding to one image category from the difficult case matrix based on one image category in the training sample set includes: acquiring an image class from a training sample set as a difficult case class seed required by the iterative training; c-1 image categories corresponding to the difficult example category seeds are obtained from the difficult example matrix, wherein c is a second preset number, c-1 image categories are different from the difficult example category seeds, and c is a positive integer; and taking the difficult example type seed and the c-1 image types as a second preset number of difficult example types.
Optionally, obtaining an image category from the training sample set as the difficult example category seed required by the current iterative training includes: for each row vector of the difficult example matrix, acquiring the total distance over all image categories of the row vector; normalizing the total distance of each row vector to obtain a loss value corresponding to each image category; inputting the loss value corresponding to each image category into a preset nonlinear function to obtain the sampling probability of each image category; and randomly drawing, without replacement, one image category from a predetermined set as the difficult example category seed required by the current iterative training, wherein the predetermined set comprises each image category in the training sample set, and the probability of each image category being drawn from the predetermined set is equal to the sampling probability of the corresponding image category.
Optionally, obtaining the c-1 image categories corresponding to the difficult example category seed from the difficult example matrix includes: obtaining the c-1 image categories from the difficult example matrix by depth-first search or breadth-first search, starting from the difficult example category seed.
Optionally, obtaining effective training images corresponding to a second predetermined number of difficult example categories from the training sample set based on the second predetermined number of difficult example categories as training images required by the current iterative training includes: and aiming at each image category in a second preset number of difficult example categories, respectively acquiring m training images from the training sample set, and taking all the acquired training images as training images required by the iterative training.
Optionally, after each iterative training of the image class identification model is completed, the feature vector of the image class corresponding to the training image of the iterative training is updated.
Optionally, after completing one training of the image class identification model, reconstructing the hard case matrix based on the updated feature vector of each image class, where the one training includes multiple iterative training.
Optionally, after the training of the image class identification model is completed for the predetermined number of times, the hard case matrix is reconstructed based on the training sample set, wherein each training in the training of the image class identification model for the predetermined number of times includes multiple iterative training.
According to a third aspect of the embodiments of the present disclosure, there is provided a training apparatus for an image category recognition model, including: a first acquisition unit configured to acquire a training sample set, wherein the training sample set includes a plurality of training images; an image category determination unit configured to determine an image category included in the plurality of training images; a difficult case category determination unit configured to determine a distance between each two image categories based on image categories included in the plurality of training images, and determine each two image categories, for which the distance and a preset threshold satisfy a preset relationship, as respective difficult case categories; a second acquisition unit configured to acquire, from the plurality of training images, a valid training image corresponding to a difficult case category based on the difficult case category; and the training unit is configured to train the image category identification model to be trained on the basis of the effective training images and the actual categories corresponding to the effective training images to obtain the trained image category identification model.
Optionally, the difficult case category determining unit is further configured to acquire m training images of each of image categories included in the plurality of training images, where m is a positive integer; inputting m training images into an image category identification model aiming at the m training images of each image category, and acquiring the feature vectors of the m training images output by the middle layer of the image category identification model; determining the average feature vector of the feature vectors of the m training images as the feature vector of the corresponding image category; the distance between each two image classes is determined based on the feature vector of each image class.
Optionally, the second obtaining unit is further configured to, for each image category in the training sample set, sort the difficult-to-case categories of the current image category by distance to obtain a difficult-to-case category vector of the current image category; taking the difficult case category vector of each image category as a row vector, and constructing a difficult case matrix of a training sample set; based on the difficult case category in the difficult case matrix, a valid training image corresponding to the difficult case category is acquired from the plurality of training images.
Optionally, the second obtaining unit is further configured to, for each iterative training, obtain, from the difficult case matrix, a second predetermined number of difficult case categories corresponding to one image category based on one image category in the training sample set, and obtain, from the training sample set, effective training images corresponding to the second predetermined number of difficult case categories based on the second predetermined number of difficult case categories as training images required by the current iterative training.
Optionally, the second obtaining unit is further configured to obtain one image class from the training sample set as a difficult-example class seed required by the current iterative training; acquiring c-1 image categories corresponding to the difficult example category seeds from the difficult example matrix based on the difficult example category seeds, wherein c is a second preset number, the c-1 image categories are different from the difficult example category seeds, and c is a positive integer; and taking the difficult example type seed and the c-1 image types as a second preset number of difficult example types.
Optionally, the second obtaining unit is further configured to: for each row vector of the difficult example matrix, acquire the total distance over all categories of the row vector; normalize the total distance of each row vector to obtain a loss value corresponding to each image category; input the loss value corresponding to each image category into a preset nonlinear function to obtain the sampling probability of each image category; and randomly draw, without replacement, one image category from a predetermined set as the difficult example category seed required by the current iterative training, wherein the predetermined set comprises each image category in the training sample set, and the probability of each image category being drawn from the predetermined set is equal to the sampling probability of the corresponding image category.
Optionally, the second obtaining unit is further configured to obtain the c-1 image categories from the difficult example matrix by depth-first search or breadth-first search, starting from the difficult example category seed.
Optionally, the second obtaining unit is further configured to, for each image class in a second predetermined number of difficult example classes, respectively acquire m training images from the training sample set, and use all the acquired training images as training images required by the current iterative training.
Optionally, the apparatus further includes an updating unit configured to update the feature vector of the image category corresponding to the training image of the current iterative training after each iterative training of the image category identification model is completed.
Optionally, the updating unit is further configured to reconstruct the hard case matrix based on the updated feature vector of each image category after completing one training of the image category identification model, where the one training includes multiple iterative trainings.
Optionally, the updating unit is further configured to reconstruct the hard case matrix based on the training sample set after the training of the image class identification model for the predetermined number of times is completed, where each training of the image class identification model for the predetermined number of times includes multiple iterative training.
Optionally, the training device further comprises: a third acquisition unit configured to acquire an image to be retrieved; the fourth acquisition unit is configured to input the image to be retrieved into the trained image category identification model and acquire the feature vector of the image to be retrieved output by the middle layer of the trained image category identification model; the distance determining unit is configured to determine the distance between the image to be retrieved and each image based on the feature vector of the image to be retrieved and the feature vector of each image in the image library; and the image determining unit is configured to determine the image of which the distance and the preset threshold value meet the preset relation as the image of the same category as the image to be retrieved.
Optionally, the training device further comprises: a fifth acquiring unit configured to acquire an image to be retrieved; the sixth acquisition unit is configured to input the image to be retrieved into the trained image category identification model to obtain the image category of the image to be retrieved; a seventh acquiring unit configured to acquire an image of the same category as the image from the image library.
Optionally, the difficult category determination unit is further configured to at least one of: determining a cosine distance between every two image categories based on the image categories contained in the training images, and determining every two image categories of which the cosine distances are larger than a preset threshold value as respective difficult-case categories; determining Euclidean distance between every two image categories based on the image categories contained in the training images, and determining every two image categories with the Euclidean distance smaller than a preset threshold value as respective difficult-case categories.
According to a fourth aspect of the embodiments of the present disclosure, there is provided an image retrieval apparatus including: a first acquisition unit configured to acquire an image to be retrieved; the second acquisition unit is configured to input the image to be retrieved into a trained image category identification model, and acquire a feature vector of the image to be retrieved output by an intermediate layer of the trained image category identification model, wherein the image category identification model is obtained by training effective training images corresponding to difficult cases in a training sample set, and the difficult cases are determined by a relation between a distance between every two image categories in the image categories contained in a plurality of training images in the training sample set and a preset threshold; the distance determining unit is configured to determine the distance between the image to be retrieved and each image based on the feature vector of the image to be retrieved and the feature vector of each image in the image library; and the image determining unit is configured to determine the image of which the distance and the preset threshold value meet the preset relation as the image of the same category as the image to be retrieved.
Optionally, the image retrieval apparatus further includes a training unit configured to acquire m training images of each of image classes included in the plurality of training images, where m is a positive integer; inputting m training images into an image category identification model aiming at the m training images of each image category, and acquiring the feature vectors of the m training images output by the middle layer of the image category identification model; determining the average feature vector of the feature vectors of the m training images as the feature vector of the corresponding image category; the distance between each two image classes is determined based on the feature vector of each image class.
Optionally, the training unit is configured to rank, for each image category in the training sample set, the difficult-to-case categories of the current image category according to the distance, so as to obtain a difficult-to-case category vector of the current image category; taking the difficult case category vector of each image category as a row vector, and constructing a difficult case matrix of a training sample set; based on the difficult case category in the difficult case matrix, a valid training image corresponding to the difficult case category is acquired from the plurality of training images.
Optionally, the training unit is configured to, for each iterative training, obtain a second predetermined number of difficult example categories corresponding to one image category from the difficult example matrix based on one image category in the training sample set, and obtain, as the training images required by the current iterative training, valid training images corresponding to the second predetermined number of difficult example categories from the training sample set based on the second predetermined number of difficult example categories.
Optionally, the training unit is configured to obtain an image class from the training sample set as a difficult example class seed required by the current iterative training; c-1 image categories corresponding to the difficult example category seeds are obtained from the difficult example matrix, wherein c is a second preset number, c-1 image categories are different from the difficult example category seeds, and c is a positive integer; and taking the difficult example type seed and the c-1 image types as a second preset number of difficult example types.
Optionally, the training unit is configured to: for each row vector of the difficult example matrix, acquire the total distance over all image categories of the row vector; normalize the total distance of each row vector to obtain a loss value corresponding to each image category; input the loss value corresponding to each image category into a preset nonlinear function to obtain the sampling probability of each image category; and randomly draw, without replacement, one image category from a predetermined set as the difficult example category seed required by the current iterative training, wherein the predetermined set comprises each image category in the training sample set, and the probability of each image category being drawn from the predetermined set is equal to the sampling probability of the corresponding image category.
Optionally, the training unit is configured to obtain the c-1 image categories from the difficult example matrix by depth-first search or breadth-first search, starting from the difficult example category seed.
Optionally, the training unit is configured to, for each image category in the second predetermined number of difficult example categories, acquire m training images from the training sample set, and use all the acquired training images as the training images required by the current iterative training.
Optionally, the apparatus further includes an updating unit configured to update the feature vector of the image category corresponding to the training image of the current iterative training after each iterative training of the image category identification model is completed.
Optionally, the updating unit is configured to reconstruct the hard case matrix based on the updated feature vector of each image category after completing one training of the image category identification model, where the one training includes multiple iterative trainings.
Optionally, the updating unit is configured to reconstruct the hard case matrix based on the training sample set after completing the training of the image class identification model for the predetermined number of times, where each training of the image class identification model for the predetermined number of times includes multiple iterative training.
According to a fifth aspect of embodiments of the present disclosure, there is provided an electronic apparatus including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to execute the instructions to implement a training method and/or an image retrieval method of the image class recognition model according to the present disclosure.
According to a sixth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium, wherein instructions, when executed by at least one processor, cause the at least one processor to perform the training method and/or the image retrieval method of the image class recognition model according to the present disclosure as described above.
According to a seventh aspect of embodiments of the present disclosure, there is provided a computer program product comprising computer instructions which, when executed by a processor, implement a training method and/or an image retrieval method of an image class recognition model according to the present disclosure.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
according to the training method, the image retrieval method, and the devices provided by the embodiments of the present disclosure, difficult example categories are determined from the image categories of the training images in the training sample set, and the effective training images corresponding to those categories are then obtained from the training images and used to train the image category recognition model. Because the difficult example categories are mined from all training images in the training sample set, that is, difficult examples are searched from a global perspective rather than within a single batch, the mined negative examples are richer and more comprehensive, which ensures the accuracy of the trained model. In addition, the difficult example matrix is constructed in advance, before training of the image category recognition model, for example offline; the matrix maintains the difficult-example relationships between categories by recording, for each image category, the other categories closest to it, so that constructing and maintaining these relationships is decoupled from training, no difficult examples need to be computed during training, and the training process is accelerated. Furthermore, when constructing the difficult example matrix, the average feature vector (embedding) of the m training images of each image category is used as the embedding of that category, that is, embeddings are computed at category granularity, which suppresses the influence of noisy data. Furthermore, when obtaining training images, the sampling probability is determined from the loss of each image category and the difficult example category seed required by one iteration is drawn according to that probability, which further suppresses the influence of noise; after the seed is obtained, sampling is performed in the difficult example matrix by depth-first or breadth-first search starting from the seed, so that the categories used in one iteration of training all have difficult-example relationships with one another, ensuring the training effect of the image category recognition model. The present disclosure therefore solves the problem in the related art that similar categories in a training sample set cannot be optimized in a targeted manner.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
Fig. 1 is a schematic diagram illustrating an implementation scenario of a training method of an image class recognition model according to an exemplary embodiment of the present disclosure.
FIG. 2 is a flow diagram illustrating a method of training an image class recognition model according to an exemplary embodiment.
FIG. 3 is a graphical illustration of a non-linear function curve shown in accordance with an exemplary embodiment.
FIG. 4 is a schematic diagram illustrating an alternative image class recognition model training flow according to an exemplary embodiment.
FIG. 5 is a flow chart illustrating an image retrieval method according to an exemplary embodiment.
FIG. 6 is a block diagram illustrating an apparatus for training an image class recognition model according to an example embodiment.
Fig. 7 is a block diagram illustrating an image retrieval apparatus according to an exemplary embodiment.
Fig. 8 is a block diagram of an electronic device 800 in accordance with an embodiment of the disclosure.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The embodiments described in the following examples do not represent all embodiments consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Here, the expression "at least one of the items" in the present disclosure covers three parallel cases: "any one of the items", "a combination of any plurality of the items", and "all of the items". For example, "including at least one of A and B" covers three parallel cases: (1) including A; (2) including B; (3) including A and B. Likewise, "performing at least one of step one and step two" covers three parallel cases: (1) performing step one; (2) performing step two; (3) performing step one and step two.
At present, the commonly used metric-learning hard-example mining method, the triplet loss, mainly performs balanced sampling of the training samples of each batch during training of the image category recognition model: each batch samples k image categories and m samples per category, so that batch_size = m × k. The distance between the feature vectors (embeddings) of every two samples in the same batch is then computed, and the distance between samples of the same category is required to be smaller than the distance between samples of different categories. As shown in formula (1) below, a denotes an anchor sample, p denotes samples of the same category, and n denotes samples of different categories; the optimization goal of the loss function is that the maximum distance between samples of the same category is smaller than the minimum distance between samples of different categories, with the difference between the two distances larger than the margin. Commodities with similar packaging that appear in the same batch, i.e., whose embedding distance is small, are difficult examples of each other; optimizing them with the triplet loss reduces the intra-category embedding distance and enlarges the inter-category embedding distance, improving the expressive power of the embeddings.
\[ L = \sum_{a}\max\Bigl(0,\ \max_{p} d(a,p) - \min_{n} d(a,n) + \mathrm{margin}\Bigr) \tag{1} \]
However, hard-negative mining based on the triplet loss is not comprehensive: a single batch covers too few samples during training of the image category recognition model, and each iteration of the triplet loss selects negative examples only from batch_size samples. In commodity retrieval and recognition the number of samples is generally very large, so the triplet loss cannot optimize the categories that are genuinely difficult for each other out of millions of categories, and similar categories in the training sample set are not optimized in a targeted manner.
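For reference, the batch-hard variant described above can be written as a short PyTorch sketch. The function and parameter names below are illustrative assumptions for a batch of embeddings with integer class labels, not code from the disclosure:

```python
import torch

def batch_hard_triplet_loss(embeddings: torch.Tensor,
                            labels: torch.Tensor,
                            margin: float = 0.3) -> torch.Tensor:
    """Batch-hard triplet loss: for each anchor, take its hardest positive
    (farthest same-class sample) and hardest negative (closest other-class
    sample) within the current batch only."""
    # Pairwise Euclidean distances, shape (B, B)
    dist = torch.cdist(embeddings, embeddings, p=2)
    same = labels.unsqueeze(0) == labels.unsqueeze(1)        # same-class mask
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)

    hardest_pos = (dist * (same & ~eye)).max(dim=1).values   # max intra-class distance
    dist_neg = dist.masked_fill(same, float('inf'))          # ignore same-class pairs
    hardest_neg = dist_neg.min(dim=1).values                 # min inter-class distance

    return torch.relu(hardest_pos - hardest_neg + margin).mean()
```

Because the hardest positive and hardest negative are searched only inside the current batch, globally confusable categories that never share a batch are never pushed apart, which is exactly the shortcoming the disclosure addresses.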
In view of the above problems, the present disclosure provides a training method for an image category recognition model that can perform targeted training on similar categories in a training sample set. The following description takes retrieving the same or similar commodities corresponding to a commodity picture as an example.
Fig. 1 is a schematic diagram of an implementation scenario of a training method for an image category recognition model according to an exemplary embodiment of the present disclosure. As shown in Fig. 1, the scenario includes a server 100 and user terminals 110 and 120. The number of user terminals is not limited to two, and the terminals include but are not limited to mobile phones and personal computers; a user terminal may install an application (APP) for retrieving commodities. The server may be a single server, a server cluster composed of several servers, or a cloud computing platform or virtualization center.
After the server 100 receives a request for training the image category recognition model from the user terminals 110 and 120, the server 100 may collect the commodity pictures historically received from the user terminals 110 and 120 and combine them into a training sample set. After obtaining the training sample set, the server 100 determines the image categories contained in the commodity pictures, determines the distance between every two image categories based on those categories, determines every two image categories whose distance and a preset threshold satisfy a preset relationship as difficult example categories of each other, acquires the effective commodity pictures corresponding to the difficult example categories from the plurality of commodity pictures, and trains the image category recognition model to be trained on the effective commodity pictures and their actual categories to obtain the trained model. After training is completed, the user terminals 110 and 120 send a commodity picture to be retrieved to the server 100; the server 100 inputs the picture into the trained model and obtains the feature vector (embedding) output by its intermediate layer, so that the same or similar commodities accurately corresponding to the picture can be retrieved based on the distance between this embedding and the embeddings of the commodity pictures stored on the server 100. In this embodiment, difficult examples are obtained from all training samples, that is, they are searched from a global perspective, so the mined difficult examples are richer and more comprehensive and the accuracy of the trained image category recognition model is ensured; meanwhile, the difficult example matrix is constructed in advance, before training, so that no difficult examples need to be computed during training and the training process is accelerated.
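A minimal sketch of this retrieval flow, assuming the embeddings of the stored commodity pictures have already been extracted with the trained model and L2-normalized; the function, parameter names, and threshold value are illustrative assumptions:

```python
import numpy as np

def retrieve_similar(query_emb: np.ndarray, gallery_embs: np.ndarray,
                     gallery_ids: list, threshold: float = 0.8, top_k: int = 10):
    """Return the gallery commodities closest to the query picture's embedding."""
    query_emb = query_emb / np.linalg.norm(query_emb)
    # For L2-normalized vectors, cosine similarity is a plain dot product
    sims = gallery_embs @ query_emb
    order = np.argsort(-sims)[:top_k]
    # Keep only items whose similarity to the query exceeds the preset threshold
    return [(gallery_ids[i], float(sims[i])) for i in order if sims[i] > threshold]
```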
Hereinafter, a training method, an image retrieval method, and an apparatus of an image class recognition model according to an exemplary embodiment of the present disclosure will be described in detail with reference to fig. 2 to 7.
Fig. 2 is a flowchart illustrating a training method of an image class recognition model according to an exemplary embodiment, where as shown in fig. 2, the training method of the image class recognition model includes the following steps:
in step S201, a training sample set is obtained, wherein the training sample set includes a plurality of training images. For example, historical commodity pictures received by the server may be counted and merged together as a training sample set.
In step S202, the image categories contained in the plurality of training images are determined. Taking clothing pictures as an example, the image categories may include tops, pants, and skirts, where tops may be further subdivided into short sleeves, long sleeves, and the like; the present disclosure is not limited in this respect.
In step S203, based on the image categories included in the plurality of training images, a distance between each two image categories is determined, and each two image categories in which the distance and a preset threshold satisfy a preset relationship are determined as respective difficult-case categories. The above-mentioned difficult category of one image category may be regarded as other categories which are relatively close to the category.
According to an exemplary embodiment of the present disclosure, determining a distance between each two image categories based on image categories included in a plurality of training images may include: acquiring m training images of each image category in image categories contained in a plurality of training images, wherein m is a positive integer; inputting m training images into an image category identification model aiming at the m training images of each image category, and acquiring the feature vectors of the m training images output by the middle layer of the image category identification model; determining the average feature vector of the feature vectors of the m training images as the feature vector of the corresponding image category; the distance between each two image classes is determined based on the feature vector of each image class. The value of m is determined according to actual needs. With the present embodiment, the average feature vector (embedding) of m samples of each image category is adopted as the embedding of the corresponding image category in the process, and the influence of noise is suppressed.
In the related art, the triplet loss is optimized at picture granularity and difficult examples are selected from the embedding of each individual picture, so any noise affects the selection: if a category contains noisy data, its positive-example embedding distances become very large, and if pictures of other categories are mixed into a category, its negative-example embedding distances become very small, both of which distort the choice of difficult example categories and make the training result of the image category recognition model unstable. Training samples for commodity retrieval are usually huge in number, so difficult-example mining with a large amount of noise is unavoidable. Using the average feature vector (embedding) of m samples of each image category as the embedding of that category means that, even if pictures of other categories are mixed into a category, the average feature vector is not greatly affected, and the influence of noise is effectively suppressed.
According to an exemplary embodiment of the disclosure, determining the distance between every two image categories based on the image categories contained in the plurality of training images, and determining every two image categories whose distance and a preset threshold satisfy a preset relationship as difficult-case categories of each other, may include at least one of the following: determining the cosine distance between every two image categories based on the image categories contained in the training images, and determining every two image categories whose cosine distance is greater than the preset threshold as difficult-case categories of each other; and determining the Euclidean distance between every two image categories based on the image categories contained in the training images, and determining every two image categories whose Euclidean distance is smaller than the preset threshold as difficult-case categories of each other. With this embodiment, the difficult-case categories can be determined conveniently using the cosine distance or the Euclidean distance. It should be noted that the distances in the above embodiments include, but are not limited to, the cosine distance and the Euclidean distance.
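The two alternative criteria can be sketched as follows. The thresholds are illustrative placeholders, and the cosine criterion is computed here as a similarity between the two class embeddings, since the disclosure sorts this measure from large to small and treats the largest values as the hardest; the function names are assumptions:

```python
import numpy as np

def is_hard_pair_cosine(e_i: np.ndarray, e_j: np.ndarray, threshold: float = 0.7) -> bool:
    """Two categories are difficult cases of each other if their class embeddings
    are very similar (cosine measure above the preset threshold)."""
    sim = e_i @ e_j / (np.linalg.norm(e_i) * np.linalg.norm(e_j))
    return sim > threshold

def is_hard_pair_euclidean(e_i: np.ndarray, e_j: np.ndarray, threshold: float = 0.8) -> bool:
    """Alternative criterion: Euclidean distance below the preset threshold."""
    return np.linalg.norm(e_i - e_j) < threshold
```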
Returning to Fig. 2, in step S204, effective training images corresponding to the difficult example categories are acquired from the plurality of training images based on the difficult example categories. This step may be performed before each iterative training, that is, the training images required by an iteration are obtained just before that iteration, or it may be performed before training of the image category recognition model begins, that is, the training images required by every iteration are determined in advance. In general, the image category recognition model may be trained multiple times on all training images of the same training sample set, and each such pass may be called a training epoch. A training epoch usually consists of multiple iterative trainings, and each iteration is trained on one batch of training images; that is, each iteration obtains one batch of training images from the training sample set used to train the image category recognition model.
According to an exemplary embodiment of the present disclosure, acquiring effective training images corresponding to the difficult example categories from the plurality of training images may include: for each image category in the training sample set, sorting the difficult example categories of the current image category by distance to obtain the difficult example category vector of the current image category; constructing the difficult example matrix of the training sample set with the difficult example category vector of each image category as a row vector; and acquiring, based on the difficult example categories in the difficult example matrix, the effective training images corresponding to them from the plurality of training images. In this embodiment, the difficult example matrix is constructed before training, so it can be computed and built offline; the matrix maintains the difficult-example relationships between categories by recording, for each image category, the other categories closest to it, which decouples constructing and maintaining these relationships from training, removes the need to compute difficult examples during training, and accelerates the training process.
For example, the construction of the difficult-case matrix is described in detail taking the distance as the cosine distance and the samples as pictures. Suppose the training sample set contains C image categories. For each image category, m pictures are sampled from the training sample set and input into the image category identification model to obtain a feature vector (embedding) of each picture; the average of the m feature vectors is taken as the embedding of the corresponding image category, that is, its average feature vector. Based on the embedding of each image category, a feature matrix F (C × D) is obtained, where D is the dimension of the embedding. The cosine distance between every two image categories is then calculated from the feature matrix F; for each image category, the cosine distances are sorted from largest to smallest, the first predetermined number of them are selected, and the image categories corresponding to the selected cosine distances are taken as the difficult-case categories of that category, yielding a difficult-case matrix T (C × K), where K is the maximum number of difficult-case categories.
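For concreteness, a minimal Python sketch of this offline construction is given below. It is only an illustration under stated assumptions: model.embed and dataset.sample are hypothetical helpers standing in for the image category identification model and the training sample set, and the cosine score is used in the sense of this description, where a larger value means two categories are closer.

import numpy as np

def build_feature_matrix(model, dataset, num_classes, m, dim):
    # Average the embeddings of m sampled pictures per category -> feature matrix F (C x D).
    # model.embed and dataset.sample are hypothetical helpers, not part of the source.
    F = np.zeros((num_classes, dim), dtype=np.float32)
    for c in range(num_classes):
        pictures = dataset.sample(c, m)        # m training images of category c
        emb = model.embed(pictures)            # (m, D) feature vectors from the middle layer
        F[c] = emb.mean(axis=0)                # average feature vector of category c
    return F

def build_hard_case_matrix(F, k):
    # For each category keep the K closest other categories -> difficult-case matrix T (C x K).
    Fn = F / np.linalg.norm(F, axis=1, keepdims=True)
    sim = Fn @ Fn.T                            # pairwise cosine scores (larger = closer)
    np.fill_diagonal(sim, -np.inf)             # a category is not its own difficult case
    T = np.argsort(-sim, axis=1)[:, :k]        # indices of the K closest categories
    return T, sim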
According to an exemplary embodiment of the present disclosure, acquiring valid training images corresponding to the difficult-case categories from the training sample set based on the difficult-case categories in the difficult-case matrix may include: for each iterative training, acquiring, from the difficult-case matrix, a second predetermined number of difficult-case categories corresponding to one image category in the training sample set, and acquiring, from the training sample set, valid training images corresponding to those difficult-case categories as the training images required by the current iteration. In this embodiment, the difficult-case categories required by one iteration are taken from the difficult-case matrix before the corresponding training images are collected, so the difficult-case categories reflected in the collected training images are more accurate.
According to an exemplary embodiment of the present disclosure, obtaining, from the difficult-case matrix, a second predetermined number of difficult-case categories corresponding to one image category in the training sample set may include: acquiring one image category from the training sample set as the difficult-case category seed required by the current iterative training; acquiring, from the difficult-case matrix based on the seed, c-1 image categories corresponding to the seed, where c is the second predetermined number, the c-1 image categories differ from the seed, and c is a positive integer; and taking the difficult-case category seed and the c-1 image categories as the second predetermined number of difficult-case categories. With this embodiment, the difficult-case categories required by one iteration can be acquired conveniently and quickly.
According to an exemplary embodiment of the present disclosure, acquiring one image category from the training sample set as the difficult-case category seed required by the current iterative training may include: for each row vector of the difficult-case matrix, acquiring the total distance over all categories of the row vector; normalizing the total distance of each row vector to obtain a loss value corresponding to each image category; inputting the loss value of each image category into a predetermined nonlinear function to obtain the sampling probability of each image category; and randomly drawing, without replacement, one image category from a predetermined set as the difficult-case category seed required by the current iteration, where the predetermined set contains every image category in the training sample set and the probability of each image category being drawn equals its sampling probability. In this embodiment the sampling probability is determined from the loss of each image category, and the difficult-case category seed is drawn according to that probability, which further suppresses the influence of noise.
It should be noted that the total distance over all categories of a row vector can be obtained as follows: the distance between every two image categories in the row vector is acquired, and the acquired distances are summed to obtain the total distance. The nonlinear function may be any function suitable for the present disclosure; for example, a function with the curve shown in fig. 3 may be selected, a random sampler may be built on top of it, and the loss value of each image category may be fed into the sampler to obtain the sampling probability of the corresponding image category. As shown in fig. 3, categories with a loss value of about 0.7, that is, semi-hard categories, are sampled as seeds with the highest probability; because an excessively large loss value is more likely to indicate noise, the sampling probability of categories whose loss exceeds about 0.7 is reduced, which further suppresses interference from noisy data.
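A small sketch of how such a sampler could be realized is given below. The bell-shaped weighting peaking near 0.7 is an assumption standing in for the unspecified nonlinear function of fig. 3, and the helper reuses the quantities T and sim from the sketch above; it is not a definitive implementation.

import numpy as np

def seed_probabilities(T, sim, peak=0.7, width=0.15):
    # Loss per category: sum of the cosine scores to its difficult-case categories,
    # normalized to (0, 1). The Gaussian weighting peaking near `peak` is an assumed
    # stand-in for the nonlinear function of fig. 3 (semi-hard categories favored,
    # very large losses damped as likely noise).
    totals = np.take_along_axis(sim, T, axis=1).sum(axis=1)
    loss = (totals - totals.min()) / (totals.max() - totals.min() + 1e-12)
    weights = np.exp(-((loss - peak) ** 2) / (2 * width ** 2))
    return weights / weights.sum()

def sample_seed(probs, rng=None):
    # Draw one difficult-case category seed according to the sampling probabilities.
    if rng is None:
        rng = np.random.default_rng()
    return int(rng.choice(len(probs), p=probs))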
According to an exemplary embodiment of the present disclosure, acquiring, from the difficult-case matrix based on the difficult-case category seed, c-1 image categories corresponding to the seed may include: starting from the seed, acquiring c-1 image categories in the difficult-case matrix by depth-first search or breadth-first search. Because sampling follows a depth-first or breadth-first walk, the categories within one batch stand in a difficult-case relationship with one another, which preserves the training effect of the image category recognition model.
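As an illustration, a breadth-first expansion over the difficult-case matrix could look like the following sketch (a depth-first variant would use a stack instead of a queue); the seed index and the matrix T follow the sketches above, and all names are illustrative.

from collections import deque

def expand_seed(T, seed, c):
    # Breadth-first walk over the difficult-case matrix starting from the seed,
    # collecting c categories in total (the seed plus c-1 related categories).
    chosen, seen, queue = [seed], {seed}, deque([seed])
    while queue and len(chosen) < c:
        current = queue.popleft()
        for nxt in T[current]:
            nxt = int(nxt)
            if nxt not in seen:
                seen.add(nxt)
                chosen.append(nxt)
                queue.append(nxt)
                if len(chosen) == c:
                    break
    return chosen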
According to an exemplary embodiment of the present disclosure, acquiring, from the training sample set, effective training images corresponding to a second predetermined number of difficult example categories based on the second predetermined number of difficult example categories, as training images required by the current iterative training may include: and aiming at each image category in a second preset number of difficult example categories, respectively acquiring m training images from the training sample set, and taking all the acquired training images as training images required by the iterative training. By the embodiment, the number of the acquired training samples is ensured to meet the number of samples required by the iterative training.
In step S205, the image category identification model to be trained is trained based on the valid training images and the actual categories corresponding to them, so as to obtain a trained image category identification model. Specifically, the training process may include inputting a training image into the image category identification model to obtain an estimated category of the training image, determining a target loss function based on the estimated category and the actual category of the training image, and adjusting the parameters of the image category identification model by minimizing the target loss, thereby training the model.
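A minimal sketch of one such training iteration is shown below, assuming a PyTorch-style model and a cross-entropy classification loss; the source does not fix the concrete form of the target loss function, so this choice is an assumption.

from torch.nn.functional import cross_entropy

def train_step(model, optimizer, images, labels):
    # One iteration as described in step S205: predict categories, build the target loss
    # from the estimated and actual categories, and adjust parameters by minimizing it.
    model.train()
    logits = model(images)                  # estimated categories
    loss = cross_entropy(logits, labels)    # target loss vs. actual categories
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                        # parameter adjustment
    return loss.item()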
According to an exemplary embodiment of the present disclosure, after the image category identification model to be trained has been trained based on the valid training images and their actual categories to obtain a trained model, the method further includes: acquiring an image to be retrieved; inputting the image to be retrieved into the trained image category identification model and acquiring the feature vector of the image output by an intermediate layer of the model; determining the distance between the image to be retrieved and each image in an image library based on their feature vectors; and determining the images whose distance and a preset threshold satisfy a preset relationship as images of the same category as the image to be retrieved. For example, once the image category identification model has been trained, it can produce an accurate feature vector for the picture to be retrieved, and based on the distances between this feature vector and those of all commodity pictures on the server, the identical or similar commodities corresponding to the picture to be retrieved can be obtained.
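The retrieval step can be sketched as follows, assuming the feature vectors of the image library have already been extracted with the trained model; the threshold value is illustrative only.

import numpy as np

def retrieve_same_category(query_vec, gallery_vecs, threshold=0.8):
    # Compare the query embedding with every gallery embedding by cosine score and
    # return the indices whose score exceeds the threshold as same-category images.
    # The threshold value 0.8 is illustrative, not a value given in this description.
    q = query_vec / np.linalg.norm(query_vec)
    G = gallery_vecs / np.linalg.norm(gallery_vecs, axis=1, keepdims=True)
    scores = G @ q
    return np.where(scores > threshold)[0], scores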
According to an exemplary embodiment of the present disclosure, after the image category identification model to be trained has been trained based on the valid training images and their actual categories to obtain a trained model, the method further includes: acquiring an image to be retrieved; inputting the image to be retrieved into the trained image category identification model to obtain its image category; and acquiring, from an image library, images of the same category as the image to be retrieved.
According to an exemplary embodiment of the disclosure, after each iterative training of the image category identification model is completed, the feature vectors of the image categories corresponding to the training images of that iteration can be updated. In this embodiment the feature vectors of the image categories used in an iteration are updated in real time so that the difficult-case matrix can be updated subsequently. For example, when one iteration ends, given that c image categories with m samples each were used, the average vector of the m samples is computed as the average feature vector of the corresponding image category and written into the feature matrix F in real time, so that the difficult-case matrix T can be updated later.
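A sketch of this in-place update is given below; model.embed is the same hypothetical helper as in the earlier sketch, and conversions between framework tensors and NumPy arrays are omitted.

import numpy as np

def update_feature_matrix(F, model, batch_images, batch_labels, categories_in_batch):
    # After one iteration, recompute the average embedding of every category used in the
    # batch and write it back into the feature matrix F in place.
    emb = model.embed(batch_images)                       # (c*m, D)
    batch_labels = np.asarray(batch_labels)
    for cat in categories_in_batch:
        F[cat] = emb[batch_labels == cat].mean(axis=0)    # new average feature vector
    return F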
According to an exemplary embodiment of the disclosure, after one training of the image category identification model is completed, the difficult-case matrix can be reconstructed based on the updated feature vector of each image category, where the one training comprises multiple iterative trainings. Rebuilding the difficult-case matrix after each training keeps it better maintained and benefits the training of the image category recognition model. For example, after the image category recognition model finishes one training, the difficult-case matrix T is obtained again from the updated feature matrix F.
According to an exemplary embodiment of the disclosure, after a predetermined number of trainings of the image category identification model have been completed, the difficult-case matrix may be reconstructed based on the training sample set, where each of the predetermined number of trainings comprises multiple iterative trainings. The predetermined number can be set according to the actual situation. Because the samples used in each training are randomly sampled, some categories may remain untrained even after multiple trainings; reconstructing from the full training sample set lets the untrained categories update their corresponding feature vectors, so the difficult-case matrix is maintained better. For example, after the predetermined number of trainings of the image category recognition model is completed, the difficult-case matrix T is reconstructed based on the training samples for better maintenance of T.
The method mines difficult-case samples among commodities and addresses the problem in conventional metric learning that commodities with similar appearance are retrieved and identified incorrectly. For ease of understanding, the following takes the distance as the cosine distance and the samples as commodity pictures, and describes the process in detail with reference to fig. 4 (a condensed code sketch of the whole flow is given after the numbered steps below). As shown in fig. 4, the specific flow is as follows:
1) Preparation for constructing the difficult-case tree (i.e., the difficult-case matrix described above): the training sample set is known to contain C image categories; m pictures are sampled for each image category, the C × m pictures are input into the network (i.e., the image category identification model) to obtain the embedding of each picture, and the average embedding of the m pictures of each image category is computed as the embedding of that category, giving a feature matrix F (C × D), where D is the dimension of the embedding.
2) Constructing the difficult-case tree: the cosine distance between the embeddings of every two of the C image categories is calculated, and the difficult-case categories of each image category are obtained by sorting these cosine distances, yielding the difficult-case tree T (C × K).
3) Loss calculation: the total distance of each image category is calculated from the difficult-case tree obtained in 2), and the total distance is normalized to (0, 1) as the embedding loss value of that category, giving a loss matrix (C × 1) over the image categories. The larger the sum of the cosine distances, the closer the category is to the other categories and the harder it is to recognize and distinguish.
4) Sampling of the difficult-case category seed: the loss values calculated in 3) are input into the predetermined nonlinear function (see fig. 3) to obtain, for each image category, the sampling probability of being chosen as the difficult-case category seed.
5) Sampling of the training samples of each batch: for each batch, one difficult-case category seed is sampled according to the probabilities from 4). Then, starting from the seed, c-1 other categories are sampled in the difficult-case matrix by depth-first or breadth-first search to obtain c image categories, and m pictures are sampled for each image category to form the training samples of one batch, where batch_size = c × m; the number N of batches per epoch can be set according to the actual training situation.
6) Real-time feature update: the training samples of each batch cover c image categories with m samples each. The c × m embeddings can be computed from the batch during training, the average embedding of each of the c image categories is calculated in real time, and the feature matrix F is updated in real time.
7) Update after one epoch: after the network has run for the N batches set for the epoch, the epoch ends and the feature matrix F has been updated accordingly. Returning to flow 2), the difficult-case tree T is reconstructed from the feature matrix F updated in real time, and flows 3), 4), 5) and 6) are executed to run a new epoch of the network.
8) Full-data update: because the training samples of each epoch are randomly sampled, some categories will always remain untrained, so the entire feature matrix F needs to be refreshed periodically. A parameter E can be set as the predetermined number of times: after every E epochs of training, the feature matrix F is updated over the full training sample set by returning to flow 1) and sampling all categories of the training sample set to train one epoch, after which flows 2), 3), 4), 5), 6), 7) continue to execute new difficult-case-mining epochs.
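Pulling the flows together, a condensed and purely illustrative training loop could look like the sketch below (this is the sketch referred to before the numbered steps). It reuses the helper functions sketched earlier in this description; dataset.sample_batch is a hypothetical helper returning the images and integer labels of the chosen categories, and conversions between array types are omitted.

def train_with_hard_case_mining(model, optimizer, dataset, C, m, c, k, D, N, E, epochs):
    # Condensed version of flows 1)-8); all helper names are assumptions, not the source's API.
    F = build_feature_matrix(model, dataset, C, m, D)                        # flow 1)
    for epoch in range(1, epochs + 1):
        T, sim = build_hard_case_matrix(F, k)                                # flow 2)
        probs = seed_probabilities(T, sim)                                   # flows 3)-4)
        for _ in range(N):                                                   # N batches per epoch
            seed = sample_seed(probs)                                        # flow 5)
            categories = expand_seed(T, seed, c)
            images, labels = dataset.sample_batch(categories, m)             # batch_size = c * m
            train_step(model, optimizer, images, labels)
            F = update_feature_matrix(F, model, images, labels, categories)  # flow 6)
        # flow 7): the epoch ends here; T is rebuilt from the updated F on the next pass
        if epoch % E == 0:                                                   # flow 8)
            F = build_feature_matrix(model, dataset, C, m, D)                # full refresh of F
    return model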
The above embodiments of the present disclosure mainly address the construction of the difficult-case tree and the sampling of the training samples of each batch. The difficult cases are mined from all training samples, that is, from a global perspective, so the mined difficult cases are richer and more comprehensive, which ensures the accuracy of the trained image category identification model. At the same time, the difficult-case matrix is constructed in advance, before the training of the image category identification model, so the difficult cases do not need to be computed during training and the training process is accelerated. Moreover, the difficult-case tree is built from the average embedding of each image category, the training samples of each batch are constructed by sampling the difficult-case tree directly during training, the embedding of each image category is updated in real time during training, and the difficult-case tree is periodically updated and maintained. In addition, the sampling probability is determined from the loss of each category, so that semi-hard categories are preferentially selected as difficult-case categories, further suppressing the influence of noisy data.
Fig. 5 is a flowchart illustrating an image retrieval method according to an exemplary embodiment, as shown in fig. 5, the image retrieval method including the steps of:
in step S501, an image to be retrieved is acquired.
In step S502, an image to be retrieved is input to the trained image class identification model, and a feature vector of the image to be retrieved output by the middle layer of the trained image class identification model is obtained, where the image class identification model is obtained by training effective training images corresponding to difficult cases in a training sample set, and the difficult cases are determined by a relationship between a distance between each two image classes in the image classes included in a plurality of training images in the training sample set and a preset threshold.
According to an exemplary embodiment of the present disclosure, a distance between each two image classes in the image classes included in the plurality of training images in the training sample set is determined by: acquiring m training images of each image category in image categories contained in a plurality of training images, wherein m is a positive integer; inputting m training images into an image category identification model aiming at the m training images of each image category, and acquiring the feature vectors of the m training images output by the middle layer of the image category identification model; determining the average feature vector of the feature vectors of the m training images as the feature vector of the corresponding image category; the distance between each two image classes is determined based on the feature vector of each image class.
According to an exemplary embodiment of the present disclosure, valid training images corresponding to difficult cases in a training sample set are obtained by: aiming at each image category in the training sample set, sorting the difficult case categories of the current image category according to the distance to obtain difficult case category vectors of the current image category; taking the difficult case category vector of each image category as a row vector, and constructing a difficult case matrix of a training sample set; based on the difficult case category in the difficult case matrix, a valid training image corresponding to the difficult case category is acquired from the plurality of training images.
According to an exemplary embodiment of the present disclosure, acquiring valid training images corresponding to a difficult case category from a training sample set based on the difficult case category of a difficult case matrix includes: for each iteration training, a second preset number of difficult example types corresponding to one image type are obtained from the difficult example matrix based on one image type in the training sample set, and effective training images corresponding to the second preset number of difficult example types are obtained from the training sample set based on the second preset number of difficult example types and serve as training images required by the current iteration training.
According to an exemplary embodiment of the present disclosure, obtaining a second predetermined number of difficult cases corresponding to one image class from a difficult case matrix based on the one image class in a training sample set includes: acquiring an image class from a training sample set as a difficult case class seed required by the iterative training; c-1 image categories corresponding to the difficult example category seeds are obtained from the difficult example matrix, wherein c is a second preset number, c-1 image categories are different from the difficult example category seeds, and c is a positive integer; and taking the difficult example type seed and the c-1 image types as a second preset number of difficult example types.
According to an exemplary embodiment of the present disclosure, acquiring an image category from a training sample set as a difficult-to-case category seed required by the iterative training includes: aiming at each row vector of the difficult-to-case matrix, acquiring the total distance of all image categories of the row vector; respectively carrying out normalization processing on the total distance of each row vector to obtain a loss value corresponding to each image category; respectively inputting the loss value corresponding to each image category into a preset nonlinear function to obtain the sampling probability of each image category; randomly extracting an image class from a predetermined set in an unreplaceable mode to serve as a difficult example class seed required by the iterative training, wherein the predetermined set comprises each image class in a training sample set, and the probability of each image class being acquired in the predetermined set is equal to the sampling probability of the corresponding image class.
According to an exemplary embodiment of the present disclosure, acquiring c-1 image categories corresponding to the difficult-case category seed from the difficult-case matrix includes: based on the difficult-case category seed, c-1 image categories are obtained in the difficult-case matrix by depth-first search or breadth-first search.
According to an exemplary embodiment of the present disclosure, acquiring, from a training sample set, effective training images corresponding to a second predetermined number of difficult example categories based on the second predetermined number of difficult example categories, as training images required by this iterative training includes: and aiming at each image category in a second preset number of difficult example categories, respectively acquiring m training images from the training sample set, and taking all the acquired training images as training images required by the iterative training.
According to the exemplary embodiment of the disclosure, after each iterative training of the image category identification model is completed, the feature vector of the image category corresponding to the training image of the iterative training is updated.
According to the exemplary embodiment of the disclosure, after one training of the image category identification model is completed, the hard case matrix is reconstructed based on the updated feature vector of each image category, wherein the one training comprises a plurality of times of iterative training.
According to the exemplary embodiment of the disclosure, after the training of the image category identification model for the predetermined times is completed, the difficult-case matrix is reconstructed based on the training sample set, wherein each training in the training of the image category identification model for the predetermined times comprises a plurality of iterative training.
In step S503, the distance between the image to be retrieved and each image is determined based on the feature vector of the image to be retrieved and the feature vector of each image in the image library.
In step S504, the image whose distance and the preset threshold satisfy the preset relationship is determined as the image of the same category as the image to be retrieved.
FIG. 6 is a block diagram illustrating an apparatus for training an image class recognition model according to an example embodiment. Referring to fig. 6, the apparatus includes a first acquisition unit 60, an image category determination unit 62, a difficult case category determination unit 64, a second acquisition unit 66, and a training unit 68.
A first obtaining unit 60 configured to obtain a training sample set, wherein the training sample set includes a plurality of training images; an image class determination unit 62 configured to determine an image class included in the plurality of training images; a difficult case category determination unit 64 configured to determine a distance between each two image categories based on image categories included in the plurality of training images, and determine each two image categories, for which the distance and a preset threshold satisfy a preset relationship, as respective difficult case categories; a second acquisition unit 66 configured to acquire, from the plurality of training images, a valid training image corresponding to a difficult case category based on the difficult case category; and the training unit 68 is configured to train the image class identification model to be trained based on the valid training images and the actual classes corresponding to the valid training images, so as to obtain a trained image class identification model.
According to an exemplary embodiment of the present disclosure, the difficult case category determining unit 64 is further configured to acquire m training images of each of the image categories included in the plurality of training images, where m is a positive integer; inputting m training images into an image category identification model aiming at the m training images of each image category, and acquiring the feature vectors of the m training images output by the middle layer of the image category identification model; determining the average feature vector of the feature vectors of the m training images as the feature vector of the corresponding image category; the distance between each two image classes is determined based on the feature vector of each image class.
According to an exemplary embodiment of the present disclosure, the second obtaining unit 66 is further configured to, for each image category in the training sample set, sort the difficult-case categories of the current image category by distance, to obtain a difficult-case category vector of the current image category; taking the difficult case category vector of each image category as a row vector, and constructing a difficult case matrix of a training sample set; based on the difficult case category in the difficult case matrix, a valid training image corresponding to the difficult case category is acquired from the plurality of training images.
According to an exemplary embodiment of the present disclosure, the second obtaining unit 66 is further configured to, for each iterative training, obtain a second predetermined number of difficult case categories corresponding to one image category from the difficult case matrix based on one image category in the training sample set, and obtain, as training images required for this iterative training, valid training images corresponding to the second predetermined number of difficult case categories from the training sample set based on the second predetermined number of difficult case categories.
According to an exemplary embodiment of the present disclosure, the second obtaining unit 66 is further configured to obtain one image category from the training sample set as a difficult-case category seed required by the current iterative training; acquiring c-1 image categories corresponding to the difficult example category seeds from the difficult example matrix based on the difficult example category seeds, wherein c is a second preset number, the c-1 image categories are different from the difficult example category seeds, and c is a positive integer; and taking the difficult example type seed and the c-1 image types as a second preset number of difficult example types.
According to an exemplary embodiment of the present disclosure, the second obtaining unit 66 is further configured to obtain, for each row vector of the refractory matrix, a total distance of all classes of the row vector; respectively carrying out normalization processing on the total distance of each row vector to obtain a loss value corresponding to each image category; respectively inputting the loss value corresponding to each image category into a preset nonlinear function to obtain the sampling probability of each image category; randomly extracting an image class from a predetermined set in an unreplaceable mode to serve as a difficult example class seed required by the iterative training, wherein the predetermined set comprises each image class in a training sample set, and the probability of each image class being acquired in the predetermined set is equal to the sampling probability of the corresponding image class.
According to an exemplary embodiment of the present disclosure, the second obtaining unit 66 is further configured to obtain c-1 image categories in the difficult-case matrix by depth-first search or breadth-first search based on the difficult-case category seed.
According to an exemplary embodiment of the present disclosure, the second obtaining unit 66 is further configured to, for each image class in a second predetermined number of difficult example classes, respectively collect m training images from the training sample set, and use all the collected training images as training images required by the current iterative training.
According to an exemplary embodiment of the present disclosure, the apparatus further includes an updating unit 610 configured to update the feature vector of the image category corresponding to the training image of the current iterative training after each iterative training of the image category identification model is completed.
According to an exemplary embodiment of the disclosure, the updating unit 610 is further configured to reconstruct the hard case matrix based on the updated feature vector of each image category after completing one training of the image category identification model, where the one training includes multiple iterative training.
According to an exemplary embodiment of the disclosure, the updating unit 610 is further configured to reconstruct the hard case matrix based on the training sample set after completing the training of the image class recognition model for the predetermined number of times, wherein each training of the image class recognition model for the predetermined number of times includes a plurality of iterative training.
According to an exemplary embodiment of the disclosure, the training apparatus further comprises: a fifth acquiring unit configured to acquire an image to be retrieved; the sixth acquisition unit is configured to input the image to be retrieved into the trained image category identification model to obtain the image category of the image to be retrieved; a seventh acquiring unit configured to acquire an image of the same category as the image from the image library.
According to an exemplary embodiment of the disclosure, the training apparatus further comprises: a third obtaining unit 612 configured to obtain an image to be retrieved; a fourth obtaining unit 614, configured to input the image to be retrieved to the trained image category identification model, and obtain a feature vector of the image to be retrieved output by the middle layer of the trained image category identification model; a distance determining unit 616 configured to determine a distance between the image to be retrieved and each image based on the feature vector of the image to be retrieved and the feature vector of each image in the image library; the image determining unit 618 is configured to determine the image whose distance and the preset threshold satisfy the preset relationship as the image of the same category as the image to be retrieved.
According to an exemplary embodiment of the present disclosure, the difficult case category determination unit 64 is further configured to perform at least one of the following: determining a cosine distance between every two image categories based on the image categories contained in the training images, and determining every two image categories whose cosine distance is larger than a preset threshold as respective difficult-case categories; determining a Euclidean distance between every two image categories based on the image categories contained in the training images, and determining every two image categories whose Euclidean distance is smaller than a preset threshold as respective difficult-case categories.
Fig. 7 is a block diagram illustrating an image retrieval apparatus according to an exemplary embodiment. Referring to fig. 7, the apparatus includes a first acquisition unit 70, a second acquisition unit 72, a distance determination unit 74, and an image determination unit 76.
A first acquisition unit 70 configured to acquire an image to be retrieved; a second obtaining unit 72 configured to input the image to be retrieved to a trained image category identification model, and obtain a feature vector of the image to be retrieved output by an intermediate layer of the trained image category identification model, where the image category identification model is obtained by training effective training images corresponding to difficult cases in a training sample set, and the difficult cases are determined by a relationship between a distance between each two image categories in the image categories included in a plurality of training images in the training sample set and a preset threshold; a distance determining unit 74 configured to determine a distance between the image to be retrieved and each image based on the feature vector of the image to be retrieved and the feature vector of each image in the image library; and an image determining unit 76 configured to determine the image of which the distance satisfies a preset relationship with a preset threshold as the image of the same category as the image to be retrieved.
According to an embodiment of the present disclosure, the image retrieval apparatus further includes a training unit configured to acquire m training images of each of image classes included in the plurality of training images, where m is a positive integer; inputting m training images into an image category identification model aiming at the m training images of each image category, and acquiring the feature vectors of the m training images output by the middle layer of the image category identification model; determining the average feature vector of the feature vectors of the m training images as the feature vector of the corresponding image category; the distance between each two image classes is determined based on the feature vector of each image class.
According to an embodiment of the disclosure, a training unit is configured to rank, for each image category in a training sample set, hard case categories of a current image category by distance to obtain hard case category vectors of the current image category; taking the difficult case category vector of each image category as a row vector, and constructing a difficult case matrix of a training sample set; based on the difficult case category in the difficult case matrix, a valid training image corresponding to the difficult case category is acquired from the plurality of training images.
According to an embodiment of the present disclosure, for each iteration training, the training unit is configured to acquire, from the difficult case matrix, a second predetermined number of difficult case categories corresponding to one image category based on one image category in the training sample set, and acquire, from the training sample set, effective training images corresponding to the second predetermined number of difficult case categories based on the second predetermined number of difficult case categories, as training images required by the current iteration training.
According to the embodiment of the disclosure, a training unit is configured to acquire one image category from a training sample set as a difficult case category seed required by the iterative training; c-1 image categories corresponding to the difficult example category seeds are obtained from the difficult example matrix, wherein c is a second preset number, c-1 image categories are different from the difficult example category seeds, and c is a positive integer; and taking the difficult example type seed and the c-1 image types as a second preset number of difficult example types.
According to an embodiment of the present disclosure, a training unit configured to obtain, for each row vector of a difficult-to-case matrix, a total distance of all image classes of the row vector; respectively carrying out normalization processing on the total distance of each row vector to obtain a loss value corresponding to each image category; respectively inputting the loss value corresponding to each image category into a preset nonlinear function to obtain the sampling probability of each image category; randomly extracting an image class from a predetermined set in an unreplaceable mode to serve as a difficult example class seed required by the iterative training, wherein the predetermined set comprises each image class in a training sample set, and the probability of each image class being acquired in the predetermined set is equal to the sampling probability of the corresponding image class.
According to an embodiment of the disclosure, the training unit is configured to acquire c-1 image categories in the difficult-case matrix by depth-first search or breadth-first search based on the difficult-case category seed.
According to the embodiment of the disclosure, the training unit is configured to collect m training images from the training sample set respectively for each image category in a second predetermined number of difficult example categories, and use all the collected training images as the training images required by the current iterative training.
According to the embodiment of the disclosure, the device further comprises an updating unit configured to update the feature vector of the image category corresponding to the training image of the current iterative training after each iterative training of the image category identification model is completed.
According to the embodiment of the disclosure, the updating unit is configured to reconstruct the hard case matrix based on the updated feature vector of each image category after completing one training of the image category identification model, wherein the one training comprises a plurality of iterative training.
According to an embodiment of the disclosure, the updating unit is configured to reconstruct the hard case matrix based on the training sample set after training of the image class identification model for a predetermined number of times is completed, wherein each training of the image class identification model for the predetermined number of times comprises a plurality of iterative training.
According to an embodiment of the present disclosure, an electronic device may be provided. Fig. 8 is a block diagram of an electronic device 800 including at least one memory 801 and at least one processor 802, the memory storing a set of computer-executable instructions that, when executed by the at least one processor, perform the training method of the image category recognition model according to an embodiment of the disclosure.
By way of example, the electronic device 800 may be a PC, a tablet device, a personal digital assistant, a smart phone, or any other device capable of executing the above set of instructions. The electronic device 800 need not be a single electronic device, but can be any collection of devices or circuits that can execute the above instructions (or instruction sets) individually or jointly. The electronic device 800 may also be part of an integrated control system or system manager, or may be configured as a portable electronic device interfacing locally or remotely (e.g., via wireless transmission).
In the electronic device 800, the processor 802 may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a programmable logic device, a special purpose processor system, a microcontroller, or a microprocessor. By way of example, and not limitation, the processor 802 may also include analog processors, digital processors, microprocessors, multi-core processors, processor arrays, network processors, and the like.
The processor 802 may execute instructions or code stored in memory, wherein the memory 801 may also store data. The instructions and data may also be transmitted or received over a network via a network interface device, which may employ any known transmission protocol.
The memory 801 may be integrated with the processor 802, for example with RAM or flash memory arranged within an integrated circuit microprocessor or the like. Furthermore, the memory 801 may comprise a stand-alone device, such as an external disk drive, a storage array, or any other storage device usable by a database system. The memory 801 and the processor 802 may be operatively coupled, or may communicate with each other, for example through I/O ports or network connections, so that the processor 802 can read files stored in the memory 801.
Further, the electronic device 800 may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). All components of the electronic device may be connected to each other via a bus and/or a network.
According to an embodiment of the present disclosure, there may also be provided a computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by at least one processor, cause the at least one processor to perform the training method of the image category recognition model of the embodiments of the present disclosure. Examples of the computer-readable storage medium include: read-only memory (ROM), random-access programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, non-volatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or optical disc storage, hard disk drive (HDD), solid-state drive (SSD), card-type memory (such as a multimedia card, a Secure Digital (SD) card or an eXtreme Digital (XD) card), magnetic tape, floppy disk, magneto-optical data storage device, optical data storage device, hard disk, solid-state disk, and any other device configured to store a computer program and any associated data, data files and data structures in a non-transitory manner and to provide them to a processor or computer so that the processor or computer can execute the computer program. The computer program in the computer-readable storage medium can run in an environment deployed on a computer device such as a client, a host, a proxy device or a server; further, in one example, the computer program and any associated data, data files and data structures are distributed across a networked computer system so that they are stored, accessed and executed in a distributed fashion by one or more processors or computers.
According to an embodiment of the present disclosure, there is provided a computer program product including computer instructions, which when executed by a processor, implement a training method of an image class recognition model of an embodiment of the present disclosure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (48)

1. A training method of an image category identification model is characterized by comprising the following steps:
acquiring a training sample set, wherein the training sample set comprises a plurality of training images;
determining image categories contained in the plurality of training images;
determining the distance between every two image categories based on the image categories contained in the training images, and determining every two image categories of which the distance and a preset threshold value meet a preset relation as respective difficult-case categories;
acquiring a valid training image corresponding to the difficult case category from the plurality of training images based on the difficult case category;
training an image category identification model to be trained based on the effective training images and actual categories corresponding to the effective training images to obtain a trained image category identification model;
wherein obtaining, from the plurality of training images, an effective training image corresponding to the difficult case category based on the difficult case category comprises:
for each image category in the training sample set, sorting the difficult case categories of the current image category according to distance to obtain difficult case category vectors of the current image category;
taking the difficult case category vector of each image category as a row vector, and constructing a difficult case matrix of the training sample set;
and acquiring a valid training image corresponding to the difficult case category from the plurality of training images based on the difficult case category in the difficult case matrix.
2. The training method according to claim 1, wherein the determining the distance between each two image classes based on the image classes included in the plurality of training images comprises:
acquiring m training images of each image category in the image categories contained in the training images, wherein m is a positive integer;
inputting the m training images into an image category identification model aiming at the m training images of each image category, and acquiring the feature vectors of the m training images output by the middle layer of the image category identification model; determining the average feature vector of the feature vectors of the m training images as the feature vector of the corresponding image category;
determining a distance between each two image categories based on the feature vector of each image category.
3. The training method of claim 1, wherein the obtaining of the valid training images from the set of training samples corresponding to the difficult case category based on the difficult case category of the difficult case matrix comprises:
and for each iteration training, acquiring a second preset number of difficult example types corresponding to one image type from the difficult example matrix based on the one image type in the training sample set, and acquiring effective training images corresponding to the second preset number of difficult example types from the training sample set based on the second preset number of difficult example types to serve as training images required by the current iteration training.
4. The training method of claim 3, wherein the obtaining a second predetermined number of difficult cases from the difficult case matrix based on one image class in the training sample set, the second predetermined number of difficult cases corresponding to the one image class, comprises:
acquiring an image class from the training sample set as a difficult case class seed required by the iterative training;
c-1 image categories corresponding to the difficult example category seeds are obtained from the difficult example matrix, wherein c is the second preset number, the c-1 image categories are different from the difficult example category seeds, and c is a positive integer;
and taking the difficult example category seed and the c-1 image categories as the second preset number of difficult example categories.
5. The training method of claim 4, wherein the obtaining an image class from the training sample set as a hard case class seed required by the current iterative training comprises:
aiming at each row vector of the difficult-to-case matrix, acquiring the total distance of all image categories of the row vector;
respectively carrying out normalization processing on the total distance of each row vector to obtain a loss value corresponding to each image category;
respectively inputting the loss value corresponding to each image category into a preset nonlinear function to obtain the sampling probability of each image category;
randomly extracting an image class from a predetermined set in an unreplaceable manner to serve as a difficult example class seed required by the iterative training, wherein the predetermined set comprises each image class in the training sample set, and the probability of each image class being acquired in the predetermined set is equal to the sampling probability of the corresponding image class.
6. The training method of claim 4, wherein said obtaining c-1 image classes corresponding to said difficult-case class seeds from said difficult-case matrix comprises:
based on the difficult example category seeds, c-1 image categories are obtained in the difficult example matrix in a depth-first search or breadth-first search mode.
7. The training method according to claim 3, wherein the acquiring, from the training sample set, effective training images corresponding to the second predetermined number of difficult example categories based on the second predetermined number of difficult example categories as the training images required for the current iterative training includes:
and respectively acquiring m training images from the training sample set aiming at each image category in the second predetermined number of difficult example categories, and taking all the acquired training images as the training images required by the iterative training.
8. The training method of claim 2, further comprising:
and after each iterative training of the image category identification model is completed, updating the feature vector of the image category corresponding to the training image of the iterative training.
9. The training method of claim 8, further comprising:
and after one training of the image category identification model is completed, reconstructing a hard case matrix based on the updated feature vector of each image category, wherein the one training comprises a plurality of times of iterative training.
10. The training method of claim 1, further comprising:
after the training of the image category identification model for the preset times is completed, reconstructing a hard case matrix based on the training sample set, wherein each training in the training of the image category identification model for the preset times comprises a plurality of times of iterative training.
11. The training method according to claim 1, wherein after training the image class recognition model to be trained based on the effective training image and the actual class corresponding to the effective training image to obtain a trained image class recognition model, the method further comprises:
acquiring an image to be retrieved;
inputting the image to be retrieved into a trained image category identification model, and acquiring a feature vector of the image to be retrieved output by a middle layer of the trained image category identification model;
determining the distance between the image to be retrieved and each image based on the feature vector of the image to be retrieved and the feature vector of each image in an image library;
and determining the image with the distance meeting the preset relation with the preset threshold value as the image of the same category as the image to be retrieved.
12. The training method according to claim 1, wherein after training the image class recognition model to be trained based on the effective training image and the actual class corresponding to the effective training image to obtain a trained image class recognition model, the method further comprises:
acquiring an image to be retrieved;
inputting the image to be retrieved into a trained image category identification model to obtain the image category of the image to be retrieved;
and acquiring the images with the same image category from an image library.
13. The training method according to claim 1, wherein the determining, based on the image classes included in the training images, the distance between each two image classes, and determining, as the respective difficult-to-case class, each two image classes for which the distance satisfies a preset relationship with a preset threshold value, includes at least one of:
determining cosine distances between every two image categories based on the image categories contained in the training images, and determining every two image categories of which the cosine distances are larger than the preset threshold value as respective difficult-case categories;
determining Euclidean distance between every two image categories based on the image categories contained in the training images, and determining every two image categories with the Euclidean distance smaller than the preset threshold value as respective difficult-case categories.
14. An image retrieval method, comprising:
acquiring an image to be retrieved;
inputting the image to be retrieved into a trained image category identification model, and obtaining a feature vector of the image to be retrieved output by a middle layer of the trained image category identification model, wherein the image category identification model is obtained by training effective training images corresponding to difficult cases in a training sample set, and the difficult cases are determined by the relationship between the distance between every two image categories in the image categories contained in a plurality of training images in the training sample set and a preset threshold;
determining the distance between the image to be retrieved and each image based on the feature vector of the image to be retrieved and the feature vector of each image in an image library;
determining the image with the distance meeting the preset relation with the preset threshold value as the image of the same category as the image to be retrieved;
the effective training images corresponding to the difficult example categories in the training sample set are obtained in the following manner:
for each image category in the training sample set, sorting the difficult example categories of the current image category by distance to obtain a difficult example category vector of the current image category;
taking the difficult example category vector of each image category as a row vector, and constructing a difficult example matrix of the training sample set;
and acquiring effective training images corresponding to the difficult example categories from the plurality of training images based on the difficult example categories in the difficult example matrix.
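The matrix construction described above can be sketched as follows. The per-category feature vectors and the `difficult_sets` mapping (the ids of each category's difficult example categories) are assumed inputs; because different categories may have different numbers of difficult example categories, the "matrix" is kept as a list of rows.

```python
import numpy as np

def build_difficult_example_matrix(class_features, difficult_sets):
    """Sort each category's difficult example categories by distance to that
    category and stack the sorted lists as row vectors."""
    rows = []
    for c, feat in enumerate(class_features):
        neighbours = sorted(difficult_sets[c],
                            key=lambda d: np.linalg.norm(class_features[d] - feat))
        rows.append(neighbours)
    return rows    # row c holds the difficult example categories of category c
```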
15. The image retrieval method of claim 14, wherein the distance between each two image classes in the image classes contained in the plurality of training images in the training sample set is determined by:
acquiring m training images of each image category in the image categories contained in the training images, wherein m is a positive integer;
for the m training images of each image category, inputting the m training images into an image category identification model, and acquiring feature vectors of the m training images output by an intermediate layer of the image category identification model; determining the average of the feature vectors of the m training images as the feature vector of the corresponding image category;
determining a distance between each two image categories based on the feature vector of each image category.
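A minimal sketch of the per-category feature vector of claim 15, assuming `embed` is a callable that returns the intermediate-layer feature of a single image; the patent does not prescribe this interface.

```python
import numpy as np

def class_feature_vector(embed, images_of_class):
    """Average the intermediate-layer features of the m sampled training images
    of one category to obtain that category's feature vector."""
    feats = np.stack([embed(img) for img in images_of_class])   # shape (m, d)
    return feats.mean(axis=0)
```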
16. The image retrieval method of claim 14, wherein the acquiring effective training images corresponding to the difficult example categories from the plurality of training images based on the difficult example categories in the difficult example matrix comprises:
for each iterative training, acquiring, from the difficult example matrix, a second predetermined number of difficult example categories corresponding to one image category in the training sample set, and acquiring, from the training sample set, effective training images corresponding to the second predetermined number of difficult example categories as the training images required by the current iterative training.
17. The image retrieval method of claim 16, wherein the acquiring, from the difficult example matrix, a second predetermined number of difficult example categories corresponding to one image category in the training sample set comprises:
acquiring one image category from the training sample set as a difficult example category seed required by the current iterative training;
acquiring, from the difficult example matrix, c-1 image categories corresponding to the difficult example category seed, wherein c is the second predetermined number, c is a positive integer, and the c-1 image categories are different from the difficult example category seed;
and taking the difficult example category seed and the c-1 image categories as the second predetermined number of difficult example categories.
18. The image retrieval method of claim 17, wherein the acquiring one image category from the training sample set as a difficult example category seed required by the current iterative training comprises:
for each row vector of the difficult example matrix, acquiring the total distance of all image categories in the row vector;
normalizing the total distances of the row vectors to obtain a loss value corresponding to each image category;
inputting the loss value corresponding to each image category into a preset nonlinear function to obtain a sampling probability of each image category;
and randomly drawing, without replacement, one image category from a predetermined set as the difficult example category seed required by the current iterative training, wherein the predetermined set comprises each image category in the training sample set, and the probability of each image category in the predetermined set being drawn is equal to the sampling probability of the corresponding image category.
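Claim 18's seed selection can be sketched as follows. `class_distances` is assumed to be the pairwise category-distance matrix and `difficult_matrix` the row lists built above; softmax is used as one plausible choice of the "preset nonlinear function", and masking previously used seeds is one way to realise sampling without replacement. None of these concrete choices are fixed by the patent.

```python
import numpy as np

def seed_probabilities(difficult_matrix, class_distances):
    """Row totals of the difficult example matrix -> normalised loss values ->
    nonlinear function -> per-category sampling probabilities."""
    totals = np.array([class_distances[c, row].sum() if len(row) else 0.0
                       for c, row in enumerate(difficult_matrix)])
    losses = totals / totals.sum()                 # normalisation step
    exp = np.exp(losses - losses.max())            # softmax as the nonlinear function
    return exp / exp.sum()

def draw_seed(probabilities, used):
    """Draw one category, masking out categories already used as seeds so the
    draw is effectively without replacement across iterations."""
    p = probabilities.copy()
    p[list(used)] = 0.0
    p = p / p.sum()
    return int(np.random.choice(len(p), p=p))
```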
19. The image retrieval method of claim 17, wherein the acquiring, from the difficult example matrix, c-1 image categories corresponding to the difficult example category seed comprises:
acquiring, based on the difficult example category seed, c-1 image categories from the difficult example matrix by way of a depth-first search or a breadth-first search.
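A breadth-first sketch of the expansion in claim 19, reading each row of the difficult example matrix as an adjacency list; a depth-first variant would simply replace the queue with a stack.

```python
from collections import deque

def expand_seed(difficult_matrix, seed, c):
    """Collect c-1 categories reachable from the seed in the difficult example matrix."""
    picked, visited, queue = [], {seed}, deque([seed])
    while queue and len(picked) < c - 1:
        for neighbour in difficult_matrix[queue.popleft()]:
            if neighbour not in visited:
                visited.add(neighbour)
                picked.append(neighbour)
                queue.append(neighbour)
                if len(picked) == c - 1:
                    break
    return picked
```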
20. The image retrieval method according to claim 16, wherein the acquiring, from the training sample set, effective training images corresponding to the second predetermined number of difficult example categories based on the second predetermined number of difficult example categories as training images required for the current iterative training includes:
for each image category in the second predetermined number of difficult example categories, acquiring m training images from the training sample set, and taking all the acquired training images as the training images required by the current iterative training.
21. The image retrieval method according to claim 15, further comprising:
after each iterative training of the image category identification model is completed, updating the feature vectors of the image categories corresponding to the training images of the current iterative training.
22. The image retrieval method according to claim 21, further comprising:
after one training of the image category identification model is completed, reconstructing a difficult example matrix based on the updated feature vector of each image category, wherein the one training comprises a plurality of iterative trainings.
23. The image retrieval method according to claim 14, further comprising:
after a preset number of trainings of the image category identification model is completed, reconstructing a difficult example matrix based on the training sample set, wherein each of the preset number of trainings comprises a plurality of iterative trainings.
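Claims 21 to 23 together describe an update schedule: feature vectors are refreshed every iteration, the difficult example matrix is rebuilt from those vectors after each training round, and rebuilt from the full sample set after a preset number of rounds. The skeleton below only illustrates that schedule; the helper callables and the `train_step` interface are assumptions, not part of the patent.

```python
def train_with_difficult_example_mining(model, sample_set, epochs, iters_per_epoch,
                                         rebuild_every, build_from_samples,
                                         build_from_features, sample_batch,
                                         update_class_features):
    """Training loop with per-iteration feature refresh and periodic rebuilds
    of the difficult example matrix."""
    matrix = build_from_samples(sample_set)
    for epoch in range(1, epochs + 1):
        for _ in range(iters_per_epoch):
            batch = sample_batch(matrix, sample_set)     # seed + c-1 difficult categories, m images each
            model.train_step(batch)
            update_class_features(model, batch)          # per-iteration refresh
        if epoch % rebuild_every == 0:
            matrix = build_from_samples(sample_set)      # periodic rebuild from the sample set
        else:
            matrix = build_from_features()               # rebuild from the updated feature vectors
    return model
```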
24. An apparatus for training an image category identification model, comprising:
a first acquisition unit configured to acquire a training sample set, wherein the training sample set includes a plurality of training images;
an image category determination unit configured to determine an image category included in the plurality of training images;
a difficult example category determination unit configured to determine a distance between every two image categories based on the image categories contained in the plurality of training images, and determine every two image categories whose distance satisfies a preset relationship with a preset threshold as difficult example categories of each other;
a second acquisition unit configured to acquire, from the plurality of training images, effective training images corresponding to the difficult example categories based on the difficult example categories;
and a training unit configured to train an image category identification model to be trained based on the effective training images and the actual categories corresponding to the effective training images, to obtain a trained image category identification model;
wherein the second acquisition unit is further configured to: for each image category in the training sample set, sort the difficult example categories of the current image category by distance to obtain a difficult example category vector of the current image category; take the difficult example category vector of each image category as a row vector to construct a difficult example matrix of the training sample set; and acquire effective training images corresponding to the difficult example categories from the plurality of training images based on the difficult example categories in the difficult example matrix.
25. The training apparatus according to claim 24, wherein the difficult example category determination unit is further configured to: acquire m training images of each of the image categories contained in the plurality of training images, where m is a positive integer; for the m training images of each image category, input the m training images into an image category identification model, and acquire feature vectors of the m training images output by an intermediate layer of the image category identification model; determine the average of the feature vectors of the m training images as the feature vector of the corresponding image category; and determine a distance between every two image categories based on the feature vector of each image category.
26. The training apparatus according to claim 24, wherein the second acquisition unit is further configured to, for each iterative training, acquire, from the difficult example matrix, a second predetermined number of difficult example categories corresponding to one image category in the training sample set, and acquire, from the training sample set, effective training images corresponding to the second predetermined number of difficult example categories as the training images required by the current iterative training.
27. The training apparatus according to claim 26, wherein the second acquisition unit is further configured to: acquire one image category from the training sample set as a difficult example category seed required by the current iterative training; acquire, from the difficult example matrix, c-1 image categories corresponding to the difficult example category seed, wherein c is the second predetermined number, c is a positive integer, and the c-1 image categories are different from the difficult example category seed; and take the difficult example category seed and the c-1 image categories as the second predetermined number of difficult example categories.
28. The training apparatus of claim 27, wherein the second acquisition unit is further configured to: for each row vector of the difficult example matrix, acquire the total distance of all image categories in the row vector; normalize the total distances of the row vectors to obtain a loss value corresponding to each image category; input the loss value corresponding to each image category into a preset nonlinear function to obtain a sampling probability of each image category; and randomly draw, without replacement, one image category from a predetermined set as the difficult example category seed required by the current iterative training, wherein the predetermined set comprises each image category in the training sample set, and the probability of each image category in the predetermined set being drawn is equal to the sampling probability of the corresponding image category.
29. The training apparatus of claim 28, wherein the second acquisition unit is further configured to acquire, based on the difficult example category seed, c-1 image categories from the difficult example matrix by way of a depth-first search or a breadth-first search.
30. The training apparatus according to claim 26, wherein the second acquisition unit is further configured to acquire, for each image category in the second predetermined number of difficult example categories, m training images from the training sample set, and take all the acquired training images as the training images required by the current iterative training.
31. The training apparatus of claim 25, further comprising:
an updating unit configured to update, after each iterative training of the image category identification model is completed, the feature vectors of the image categories corresponding to the training images of the current iterative training.
32. The training apparatus of claim 31, wherein the updating unit is further configured to reconstruct a difficult example matrix based on the updated feature vector of each image category after one training of the image category identification model is completed, wherein the one training comprises a plurality of iterative trainings.
33. The training apparatus of claim 24, wherein the updating unit is further configured to reconstruct a difficult example matrix based on the training sample set after a predetermined number of trainings of the image category identification model is completed, wherein each of the predetermined number of trainings comprises a plurality of iterative trainings.
34. The training device of claim 24, wherein the training device further comprises:
a third acquisition unit configured to acquire an image to be retrieved;
a fourth acquisition unit configured to input the image to be retrieved into a trained image category identification model, and acquire a feature vector of the image to be retrieved output by an intermediate layer of the trained image category identification model;
a distance determination unit configured to determine a distance between the image to be retrieved and each image in an image library based on the feature vector of the image to be retrieved and the feature vector of each image in the image library;
and an image determination unit configured to determine an image whose distance satisfies a preset relationship with a preset threshold as an image of the same category as the image to be retrieved.
35. The training device of claim 24, wherein the training device further comprises:
a fifth acquisition unit configured to acquire an image to be retrieved;
a sixth acquisition unit configured to input the image to be retrieved into a trained image category identification model to obtain an image category of the image to be retrieved;
and a seventh acquisition unit configured to acquire, from an image library, images of the same image category as the image to be retrieved.
36. The training apparatus of claim 24, wherein the difficult example category determination unit is further configured to perform at least one of: determining cosine distances between every two image categories based on the image categories contained in the plurality of training images, and determining every two image categories whose cosine distance is greater than the preset threshold as difficult example categories of each other; and determining Euclidean distances between every two image categories based on the image categories contained in the plurality of training images, and determining every two image categories whose Euclidean distance is smaller than the preset threshold as difficult example categories of each other.
37. An image retrieval apparatus, comprising:
a first acquisition unit configured to acquire an image to be retrieved;
a second acquisition unit configured to input the image to be retrieved into a trained image category identification model, and acquire a feature vector of the image to be retrieved output by an intermediate layer of the trained image category identification model, wherein the image category identification model is trained on effective training images corresponding to difficult example categories in a training sample set, and the difficult example categories are determined by whether the distance between every two image categories, among the image categories contained in a plurality of training images in the training sample set, satisfies a preset relationship with a preset threshold;
a distance determination unit configured to determine a distance between the image to be retrieved and each image in an image library based on the feature vector of the image to be retrieved and the feature vector of each image in the image library;
an image determination unit configured to determine an image whose distance satisfies a preset relationship with a preset threshold as an image of the same category as the image to be retrieved;
and a training unit configured to acquire the effective training images corresponding to the difficult example categories in the training sample set by: for each image category in the training sample set, sorting the difficult example categories of the current image category by distance to obtain a difficult example category vector of the current image category; taking the difficult example category vector of each image category as a row vector, and constructing a difficult example matrix of the training sample set; and acquiring effective training images corresponding to the difficult example categories from the plurality of training images based on the difficult example categories in the difficult example matrix.
38. The image retrieval device of claim 37, wherein the training unit is further configured to determine the distance between every two image categories among the image categories contained in the plurality of training images in the training sample set by: acquiring m training images of each image category, where m is a positive integer; for the m training images of each image category, inputting the m training images into an image category identification model, and acquiring feature vectors of the m training images output by an intermediate layer of the image category identification model; determining the average of the feature vectors of the m training images as the feature vector of the corresponding image category; and determining a distance between every two image categories based on the feature vector of each image category.
39. The image retrieval device according to claim 37, wherein the training unit is further configured to, for each iterative training, acquire, from the difficult example matrix, a second predetermined number of difficult example categories corresponding to one image category in the training sample set, and acquire, from the training sample set, effective training images corresponding to the second predetermined number of difficult example categories as the training images required by the current iterative training.
40. The image retrieval device of claim 39, wherein the training unit is further configured to: acquire one image category from the training sample set as a difficult example category seed required by the current iterative training; acquire, from the difficult example matrix, c-1 image categories corresponding to the difficult example category seed, wherein c is the second predetermined number, c is a positive integer, and the c-1 image categories are different from the difficult example category seed; and take the difficult example category seed and the c-1 image categories as the second predetermined number of difficult example categories.
41. The image retrieval device of claim 40, wherein the training unit is further configured to: for each row vector of the difficult example matrix, acquire the total distance of all image categories in the row vector; normalize the total distances of the row vectors to obtain a loss value corresponding to each image category; input the loss value corresponding to each image category into a preset nonlinear function to obtain a sampling probability of each image category; and randomly draw, without replacement, one image category from a predetermined set as the difficult example category seed required by the current iterative training, wherein the predetermined set comprises each image category in the training sample set, and the probability of each image category in the predetermined set being drawn is equal to the sampling probability of the corresponding image category.
42. The image retrieval device of claim 40, wherein the training unit is further configured to acquire, based on the difficult example category seed, c-1 image categories from the difficult example matrix by way of a depth-first search or a breadth-first search.
43. The image retrieval device of claim 39, wherein the training unit is further configured to acquire, for each image category in the second predetermined number of difficult example categories, m training images from the training sample set, and take all the acquired training images as the training images required by the current iterative training.
44. The image retrieval device according to claim 38, further comprising:
an updating unit configured to update, after each iterative training of the image category identification model is completed, the feature vectors of the image categories corresponding to the training images of the current iterative training.
45. The image retrieval device of claim 44, wherein the updating unit is configured to reconstruct a difficult example matrix based on the updated feature vector of each image category after one training of the image category identification model is completed, wherein the one training comprises a plurality of iterative trainings.
46. The image retrieval device of claim 37, wherein the updating unit is configured to reconstruct a difficult example matrix based on the training sample set after a predetermined number of trainings of the image category identification model is completed, wherein each of the predetermined number of trainings comprises a plurality of iterative trainings.
47. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the training method of the image class recognition model according to any one of claims 1 to 13 and/or the image retrieval method according to any one of claims 14 to 23.
48. A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by at least one processor, cause the at least one processor to perform a training method of an image class recognition model according to any one of claims 1 to 13 and/or an image retrieval method according to any one of claims 14 to 23.
CN202111017633.XA 2021-09-01 2021-09-01 Training method of image type recognition model, image retrieval method and device Active CN113468365B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111017633.XA CN113468365B (en) 2021-09-01 2021-09-01 Training method of image type recognition model, image retrieval method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111017633.XA CN113468365B (en) 2021-09-01 2021-09-01 Training method of image type recognition model, image retrieval method and device

Publications (2)

Publication Number Publication Date
CN113468365A CN113468365A (en) 2021-10-01
CN113468365B true CN113468365B (en) 2022-01-25

Family

ID=77866996

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111017633.XA Active CN113468365B (en) 2021-09-01 2021-09-01 Training method of image type recognition model, image retrieval method and device

Country Status (1)

Country Link
CN (1) CN113468365B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115331062B (en) * 2022-08-29 2023-08-08 北京达佳互联信息技术有限公司 Image recognition method, image recognition device, electronic device and computer-readable storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109299664A (en) * 2018-08-27 2019-02-01 华中科技大学 A re-ranking method for person re-identification
CN112329826A (en) * 2020-10-24 2021-02-05 中国人民解放军空军军医大学 Training method of image recognition model, image recognition method and device

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112529026B (en) * 2019-09-17 2023-12-19 华为云计算技术有限公司 Method for providing AI model, AI platform, computing device and storage medium
CN111177446B (en) * 2019-12-12 2023-04-25 苏州科技大学 Method for searching footprint image
CN111667050B (en) * 2020-04-21 2021-11-30 佳都科技集团股份有限公司 Metric learning method, device, equipment and storage medium
WO2021212482A1 (en) * 2020-04-24 2021-10-28 华为技术有限公司 Method and apparatus for mining difficult case during target detection
CN112784087A (en) * 2021-01-29 2021-05-11 平安科技(深圳)有限公司 Image retrieval method, image retrieval device, computer equipment and storage medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109299664A (en) * 2018-08-27 2019-02-01 华中科技大学 A re-ranking method for person re-identification
CN112329826A (en) * 2020-10-24 2021-02-05 中国人民解放军空军军医大学 Training method of image recognition model, image recognition method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Real-time multi-person pose estimation method based on video; Yan Fenting et al.; Laser &amp; Optoelectronics Progress; 2020-01-25; Vol. 57, No. 02; 97-104 *

Also Published As

Publication number Publication date
CN113468365A (en) 2021-10-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant