CN111858999B - Retrieval method and device based on segmentation difficult sample generation - Google Patents


Info

Publication number
CN111858999B
CN111858999B (application CN202010586972.9A)
Authority
CN
China
Prior art keywords
sample
difficult
original
positive
positive sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010586972.9A
Other languages
Chinese (zh)
Other versions
CN111858999A (en)
Inventor
祝闯
董慧慧
齐勇刚
刘军
刘芳
Current Assignee
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN202010586972.9A
Publication of CN111858999A
Application granted
Publication of CN111858999B
Legal status: Active
Anticipated expiration

Classifications

    • G06F16/583: Information retrieval of still image data; retrieval characterised by metadata automatically derived from the content
    • G06F16/538: Information retrieval of still image data; querying; presentation of query results
    • G06F16/55: Information retrieval of still image data; clustering; classification
    • G06F18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/241: Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N3/045: Neural networks; combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Library & Information Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the invention provides a retrieval method and device based on segmentation-based difficult sample generation. The method uses all samples in the original triplet image sample set and increases the difficulty of each original triplet in that set. In the first stage of the two-stage hard sample generation framework (THSG), the difficulty of the positive sample pair is increased to obtain a difficult positive sample pair while the labels of the difficult positive sample pair are kept consistent with those of the original positive sample pair; in the second stage, the difficulty of the original negative sample is increased to obtain the final difficult negative sample and the final difficult positive sample pair, improving the effective usability of the sample set. Furthermore, the final difficult triplets can supplement effective difficult samples for small training sets, so that the model can be trained better, and training with the difficult sample pairs yields a more robust feature-extraction retrieval model.

Description

Retrieval method and device based on segmentation difficult sample generation
Technical Field
The invention relates to the technical field of image processing, and in particular to a retrieval method and device based on segmentation-based difficult sample generation.
Background
Deep Metric Learning (DML) aims to learn a powerful metric that accurately and robustly measures the similarity between data. DML is now widely applied in fields such as image retrieval, person re-identification, and clustering, among other multimedia tasks.
Image retrieval is taken as an example. There are currently various DML-based image retrieval methods, chiefly model construction based on metric learning, in which multiple triplet image samples serve as the input of the model under construction. Each triplet consists of a positive sample pair sharing the same label and a negative sample whose label differs from that of the positive pair. However, on some small-scale datasets the number of triplet samples that can be constructed is limited. For example, when retrieving images of wild animals, the image data for some rare species is scarce, so too few triplet samples can be constructed for them; the model therefore cannot be trained effectively, which in turn reduces retrieval effectiveness.
In short, on some small-scale datasets the limited number of constructible triplet image samples prevents effective model training and reduces retrieval effectiveness.
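As an illustration of the triplet structure described above, the sketch below builds (anchor, positive, negative) triplets from a labeled image set. The function name and the simple exhaustive pairing strategy are assumptions for illustration, not part of the patent:

```python
from itertools import combinations

def build_triplets(samples):
    """Build (anchor, positive, negative) triplets from (id, label) pairs.

    Each triplet pairs two samples with the same label (the positive pair)
    with one sample carrying a different label (the negative sample).
    """
    triplets = []
    for (a_id, a_lbl), (p_id, p_lbl) in combinations(samples, 2):
        if a_lbl != p_lbl:
            continue  # not a positive pair
        for n_id, n_lbl in samples:
            if n_lbl != a_lbl:
                triplets.append((a_id, p_id, n_id))
    return triplets

# With only two images of a rare class, very few triplets can be formed,
# which is the data-scarcity problem the patent addresses.
data = [("img0", "tiger"), ("img1", "tiger"),
        ("img2", "deer"), ("img3", "deer"), ("img4", "deer")]
triplets = build_triplets(data)
```

With one "tiger" pair and three "deer" pairs, only nine triplets exist in total, illustrating how quickly the pool shrinks for rare classes.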
Disclosure of Invention
The embodiment of the invention aims to provide a retrieval method and a retrieval device based on difficult segmented sample generation, which are used for solving the technical problems that in the prior art, in some small-scale data sets, the number of samples of a ternary image group which can be constructed is limited, so that a model cannot be trained effectively, and the retrieval effectiveness is reduced. The specific technical scheme is as follows:
in a first aspect, an embodiment of the present invention provides a retrieval method based on generation of a segmentation-difficult sample, including:
extracting the features of the image to be retrieved;
taking the features of the image to be retrieved as the input of a retrieval model, and obtaining, through the retrieval model, retrieval results related to the image to be retrieved together with distance scores between the features of the image to be retrieved and the features of the images in the retrieval model database; the retrieval model is trained on an original triplet image sample set and on the final difficult triplets produced by the two-stage hard sample generation framework THSG; in the first stage of the THSG, the difficulty of the original positive sample pair in an original triplet is increased to obtain a difficult positive sample pair; the labels of the difficult positive sample pair are adjusted to be consistent with those of the original positive sample pair, and the adjusted difficult positive sample pair and the original negative sample of the triplet are output to the second stage of the THSG; in the second stage of the THSG, the difficulty of the original negative sample is increased, yielding the final difficult negative sample and the final difficult positive sample pair; the final difficult positive sample pair and the final difficult negative sample are combined into the final difficult triplet;
and sorting the retrieval results related to the image to be retrieved according to the distance scores to obtain the retrieval result most relevant to the image to be retrieved.
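The retrieve-score-sort flow above can be sketched as follows; the Euclidean distance and the function names are illustrative assumptions, not the patent's specified implementation:

```python
import math

def l2_distance(u, v):
    """Euclidean (L2) distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def retrieve(query_feature, database):
    """Rank database entries by distance to the query feature.

    `database` maps image ids to feature vectors; the smallest distance
    score corresponds to the most relevant retrieval result.
    """
    scores = {img_id: l2_distance(query_feature, feat)
              for img_id, feat in database.items()}
    # Sort ascending: best match (smallest distance) first.
    return sorted(scores.items(), key=lambda kv: kv[1])

db = {"cat_a": [1.0, 0.0], "cat_b": [0.9, 0.1], "dog_a": [0.0, 1.0]}
ranked = retrieve([1.0, 0.05], db)  # list of (image_id, score), best first
```

The first element of `ranked` plays the role of the "retrieval result most relevant to the image to be retrieved".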
Further, extracting the features of the image to be retrieved includes:
extracting the features of the animal image to be retrieved;
the step of taking the features of the image to be retrieved as the input of a retrieval model and obtaining, through the retrieval model, retrieval results related to the image to be retrieved and distance scores between the features of the image to be retrieved and the features of the images in the retrieval model database includes:
taking the features of the animal image to be retrieved as the input of a retrieval model, and obtaining, through the retrieval model, retrieval results related to the animal image to be retrieved and distance scores between the features of the animal image to be retrieved and the features of the images in the retrieval model database;
the step of sorting the retrieval results related to the image to be retrieved according to the distance scores to obtain the retrieval result most relevant to the image to be retrieved includes:
sorting the retrieval results related to the animal image to be retrieved according to the distance scores to obtain the animal retrieval result most relevant to the animal image to be retrieved.
Further, the retrieval model is obtained through the following steps:
acquiring an original triplet image set serving as the sample set;
in the first stage of the two-stage hard sample generation framework THSG, stretching the original positive sample pair by piecewise linear stretching (PLM) to increase its difficulty and obtain a difficult positive sample pair, wherein the difficult positive sample pair comprises a difficult candidate sample and a difficult positive sample;
adjusting, based on a trained first generative adversarial network, the labels of the difficult positive sample pair to be consistent with the labels of the original positive sample pair, and outputting the adjusted difficult positive sample pair and the original negative sample to the second stage of the THSG, wherein the trained first generative adversarial network comprises a difficult positive-pair generator HAPG and its corresponding discriminator HAPD;
in the second stage of the THSG, increasing the difficulty of the original negative sample based on a trained second generative adversarial network to obtain the final difficult negative sample, and outputting the final difficult positive sample pair, wherein the trained second generative adversarial network comprises a difficult triplet generator HTG and its corresponding discriminator HTD;
combining the final difficult positive sample pair and the final difficult negative sample into the final difficult triplet;
and training a convolutional neural network with the final difficult triplets as the sample set to obtain the retrieval model.
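The training steps above can be laid out as a pipeline skeleton. Every function here is a hypothetical stand-in for the corresponding stage (PLM stretch, HAPG/HAPD label adjustment, HTG/HTD negative hardening, CNN training), not the patent's implementation:

```python
def thsg_pipeline(original_triplets, plm_stretch, adjust_labels,
                  harden_negative, train_cnn):
    """Two-stage hard sample generation (THSG) training skeleton."""
    final_triplets = []
    for anchor, positive, negative in original_triplets:
        # Stage 1: piecewise-linear stretch of the positive pair,
        # then label adjustment via the first adversarial network (HAPG/HAPD).
        hard_a, hard_p = plm_stretch(anchor, positive)
        hard_a, hard_p = adjust_labels(hard_a, hard_p)
        # Stage 2: harden the negative via the second network (HTG/HTD).
        hard_n = harden_negative(negative, hard_a, hard_p)
        final_triplets.append((hard_a, hard_p, hard_n))
    # Train the retrieval CNN on the synthesized difficult triplets.
    return train_cnn(final_triplets)

# Toy stand-ins just to exercise the control flow.
model = thsg_pipeline(
    [(0.0, 1.0, 5.0)],
    plm_stretch=lambda a, p: (a - 0.1, p + 0.1),
    adjust_labels=lambda a, p: (a, p),
    harden_negative=lambda n, a, p: n - 1.0,
    train_cnn=lambda t: t,
)
```

The point of the structure is that the stage-2 hardening sees the stage-1 output, matching the hand-off the claim describes.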
Further, adjusting, based on the trained first generative adversarial network, the labels of the difficult positive sample pair to be consistent with the labels of the original positive sample pair, and outputting the adjusted difficult positive sample pair and the original negative sample to the second stage of the THSG, includes:
adjusting the labels of the difficult positive sample pair to be consistent with the labels of the original positive sample pair based on the trained first generative adversarial network and a trained third generative adversarial network, and outputting the adjusted difficult positive sample pair and the original negative sample to the second stage of the THSG, wherein the trained third generative adversarial network comprises a reconstruction condition generator RCG and its corresponding discriminator RCD.
Further, in the first stage of the two-stage hard sample generation framework THSG, stretching the original positive sample pair by piecewise linear stretching (PLM) to increase its difficulty and obtain a difficult positive sample pair includes:
stretching the original positive sample pair by the following piecewise linear operation to increase its difficulty and obtain the difficult positive sample pair:
a* = a + λ(a - p)
p* = p + λ(p - a)
[the piecewise definition of λ in terms of d(a, p), α, d0, and γ is given only as a formula image in the source]
wherein a* is the difficult candidate sample, a is the original candidate sample, λ is the stretch distance coefficient, p is the original positive sample, p* is the difficult positive sample, α is the bias hyperparameter, d0 is the segmentation coefficient, d(a, p) is the distance between the original candidate sample a and the original positive sample p, and γ is the linear hyperparameter;
or stretching the original positive sample pair by a preferred piecewise linear operation to increase its difficulty and obtain the difficult positive sample pair:
[the preferred formula is given only as a formula image in the source]
wherein d_(epoch-1) is the average distance of the positive sample pairs computed in the previous training pass, these pairs being the original positive samples in the first training pass and the difficult positive sample pairs in subsequent passes.
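A minimal numerical sketch of the stretching step a* = a + λ(a - p), p* = p + λ(p - a). Since the source gives λ's piecewise definition only as an image, a simple constant-vs-scaled rule keyed on the segmentation coefficient d0 is assumed here purely for illustration:

```python
import math

def plm_stretch(a, p, alpha=0.5, gamma=0.1, d0=1.0):
    """Piecewise-linear stretch of a positive pair (illustrative lambda rule).

    Moves a and p apart along the line joining them, making the pair
    harder while preserving its direction in feature space.
    """
    d = math.sqrt(sum((ai - pi) ** 2 for ai, pi in zip(a, p)))
    # Assumed piecewise rule: stronger stretch for close pairs (d <= d0),
    # gentler distance-scaled stretch otherwise. Not the patent's formula.
    lam = alpha if d <= d0 else gamma * d
    a_star = [ai + lam * (ai - pi) for ai, pi in zip(a, p)]
    p_star = [pi + lam * (pi - ai) for ai, pi in zip(a, p)]
    return a_star, p_star

a_star, p_star = plm_stretch([1.0, 0.0], [0.0, 0.0])
```

After the stretch the pair is farther apart than before, i.e. harder for the model to match, which is the intent of PLM.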
Further, the RCD and the HAPD are determined using a discriminator loss of the following form, the RCD differing from the HAPD only in its input and output (the formula is given only as an image in the source):
wherein R(x̂_i) is the label of a generated sample after passing through the discriminator, R(x′_i) is the output of the input data after passing through the discriminator, x̂_i is the i-th generated sample, x′_i is the i-th input sample, and the loss combines the normalized softmax classification loss with the discriminator loss function.
Further, the HAPG is obtained from a loss function of the following form (given only as a formula image in the source):
wherein the HAPG loss function combines the HAPD classification loss and the HAPG classification loss; cls denotes the class; the normalized exponential (softmax) classification loss is used; HAPD(x′_i) is the output of the generated difficult positive sample after passing through the HAPD; C_HAPG(x′_i) is the class into which the output of the HAPG is classified; and x′_i is the i-th generated difficult sample.
Further, the RCG is determined using an RCG loss formula of the following form (given only as a formula image in the source):
wherein the RCG loss function value combines the L2 distance between the sample before reconstruction and the sample after reconstruction with a normalized exponential (softmax) classification loss, η being the balance factor between the normalized exponential function and the reconstruction loss; the reconstruction condition discriminator loss also enters the objective; cls denotes the class, and C_RCG is the reconstruction condition generator loss; x_r is the vector obtained by passing the HAPG's generated sample through the RCG for reconstruction, and x is the original vector; sm is short for the normalized exponential (softmax) function; x̂_i is the i-th reconstructed vector, R(x̂_i) is the output of the discriminator for the reconstructed vector, l_i is the i-th class label, and i is the index.
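The RCG objective above combines an L2 reconstruction term with a softmax classification term weighted by the balance factor η. A minimal sketch of that combination follows; the function names and the exact weighting form are assumptions, since the patent's formula appears only as an image:

```python
import math

def l2_sq(x, x_r):
    """Squared L2 distance between original vector x and reconstruction x_r."""
    return sum((a - b) ** 2 for a, b in zip(x, x_r))

def softmax_nll(logits, label):
    """Normalized-exponential (softmax) classification loss for one sample."""
    m = max(logits)  # subtract max for numerical stability
    z = sum(math.exp(v - m) for v in logits)
    return -(logits[label] - m - math.log(z))

def rcg_style_loss(x, x_r, logits, label, eta=0.5):
    """Classification loss plus eta-weighted reconstruction loss (assumed form)."""
    return softmax_nll(logits, label) + eta * l2_sq(x, x_r)

# Perfect reconstruction and a confident correct classification: near-zero loss.
loss = rcg_style_loss(x=[1.0, 0.0], x_r=[1.0, 0.0],
                      logits=[10.0, -10.0], label=0)
```

The balance factor η trades off label fidelity against staying close to the original sample, which is the role the text assigns to it.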
Further, increasing the difficulty of the original negative sample based on the trained second generative adversarial network to obtain the final difficult negative sample includes:
increasing the difficulty of the original negative sample using an adaptive inverted (reverse) triplet loss to obtain the final difficult negative sample (the formulas are given only as images in the source), wherein:
the HTG loss function combines the adaptive inverted triplet loss, the reconstruction loss, the HTD classification loss, and the HTG classification loss; η is the balance factor between the normalized exponential function and the reconstruction loss, and μ is the reconstruction-loss balance parameter;
the HTG generates a candidate sample, a positive sample, and a negative sample; a is the original candidate sample and p is the original positive sample, and the reconstruction loss is measured between the generated samples and the originals;
in the classification loss, the HTG's generated samples are taken collectively, l_i is the class label, and C_HTG is the HTG loss function;
in the inverted triplet loss, a′ is the candidate (anchor) input, n′ is the negative input, p′ is the positive input, the distances are L2 distances, [.]_+ denotes truncation at 0, and τ_r is the inverted-triplet-loss margin hyperparameter;
ν is a constant with value range 0 to positive infinity, and β is a constant with value range 0 to positive infinity.
The HTD is obtained using a loss function of the following form (given only as a formula image in the source):
wherein the HTD loss function value is computed over C, the number of original classes, and the HTD output is evaluated both with the generated sample as input and with the original sample as input, HTD(x_i) denoting the result of the HTD with the original sample x_i as input.
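As background for the inverted (reverse) triplet loss above, the sketch below shows an ordinary hinge triplet loss with the [.]_+ truncation, alongside an inverted variant that swaps the roles of the positive and negative distances so a generator is rewarded for producing negatives close to the anchor. The patent's exact formula is an image, so both forms here are assumed illustrations only:

```python
import math

def l2(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(a, p, n, margin=0.2):
    """Standard hinge triplet loss: [d(a,p) - d(a,n) + margin]_+ ."""
    return max(l2(a, p) - l2(a, n) + margin, 0.0)

def inverted_triplet_loss(a, p, n, margin=0.2):
    """Inverted variant (assumed form): [d(a,n) - d(a,p) + margin]_+ ,
    large when the negative is far away, pushing a generator to harden it."""
    return max(l2(a, n) - l2(a, p) + margin, 0.0)

# An easy triplet gives zero standard loss but a large inverted loss.
a, p, n = [0.0, 0.0], [0.1, 0.0], [2.0, 0.0]
easy = triplet_loss(a, p, n)
hard_for_generator = inverted_triplet_loss(a, p, n)
```

The truncation at 0 (the [.]_+ in the text) means a triplet already satisfying the margin contributes no gradient.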
In a second aspect, an embodiment of the present invention provides a retrieval apparatus based on difficult-to-segment sample generation, including:
the extraction module, configured to extract the features of the image to be retrieved;
the processing module, configured to take the features of the image to be retrieved as the input of a retrieval model and to obtain, through the retrieval model, retrieval results related to the image to be retrieved together with distance scores between the features of the image to be retrieved and the features of the images in the retrieval model database; the retrieval model is trained on an original triplet image sample set and on the final difficult triplets produced by the two-stage hard sample generation framework THSG; the final difficult triplets are obtained by increasing, in the first stage of the THSG, the difficulty of the original positive sample pair in an original triplet; adjusting the labels of the difficult positive sample pair to be consistent with the labels of the original positive sample pair and outputting the adjusted difficult positive sample pair and the original negative sample of the triplet to the second stage of the THSG; increasing, in the second stage of the THSG, the difficulty of the original negative sample to obtain the final difficult negative sample and the final difficult positive sample pair; and combining the final difficult positive sample pair and the final difficult negative sample into the final difficult triplet;
and the sorting module, configured to sort the retrieval results related to the image to be retrieved according to the distance scores to obtain the retrieval result most relevant to the image to be retrieved.
In a third aspect, an embodiment of the present invention provides an electronic device including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the steps of the method of any one of the first aspect when executing a program stored in the memory.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, which stores instructions that, when executed on a computer, cause the computer to perform the method of any one of the above first aspects.
The embodiment of the invention has the following beneficial effects:
Compared with the prior art, the retrieval method and device based on segmentation-based difficult sample generation use all samples in the original triplet image sample set and increase the difficulty of each original triplet in that set. In the first stage of the THSG, the difficulty of the positive sample pair is increased to obtain a difficult positive sample pair while its labels are kept consistent with those of the original positive sample pair; in the second stage, the difficulty of the original negative sample is increased to obtain the final difficult negative sample and the final difficult positive sample pair, improving the effective usability of the sample set. Furthermore, the final difficult triplets can supplement effective difficult samples for small training sets, so that the model can be trained better, and training with the difficult sample pairs yields a more robust feature-extraction retrieval model.
Of course, it is not necessary for any product or method to achieve all of the above-described advantages at the same time for practicing the invention.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a first flowchart of a retrieval method based on segmentation difficulty sample generation according to an embodiment of the present invention;
fig. 2 is a second flowchart of a retrieval method based on generation of a segmentation-difficult sample according to an embodiment of the present invention;
fig. 3 is a third flow chart of the retrieval method based on the generation of the segmentation-difficult sample according to the embodiment of the present invention;
FIG. 4 is a diagram illustrating a fourth flowchart of a retrieval method based on segmentation difficulty sample generation according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a retrieval apparatus based on segmentation difficulty sample generation according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
First, to aid understanding of the embodiments of the invention, the following terms are introduced: "image to be retrieved", "original triplet", "original candidate sample", "original positive sample pair", "original negative sample", "difficult candidate sample", "difficult positive sample pair", "difficult negative sample", "final difficult triplet", "adjusted difficult positive sample pair", "final difficult positive sample pair", "retrieval results related to the image to be retrieved", and "retrieval result most relevant to the image to be retrieved".
The "image to be retrieved" and the "images in the retrieval model database" are distinguished as follows. The image to be retrieved is a candidate image that has not yet been retrieved against; how it is acquired is not limited, as it may be captured on the spot or stored in advance, and it is passed through the model so that the content to be retrieved can be identified. The images in the retrieval model database serve as the basis for generating the final difficult triplets, and each carries a label indicating its specific type. Based on the original triplet image set used as the sample set and on the final difficult triplets obtained through the Two-Stage Hard Sample Generation framework (THSG), a feature extraction model, that is, a trained Convolutional Neural Network (CNN), is obtained, so that the model learns the features of the images in the retrieval model database. Such a trained CNN may be referred to as the retrieval model; correspondingly, the untrained convolutional neural network may be referred to as the retrieval model to be trained.
The qualifier "original" in "original candidate sample", "original positive sample", "original negative sample", and "original positive sample pair", the qualifier "difficult" in "difficult candidate sample", "difficult positive sample pair", "difficult negative sample", and "adjusted difficult positive sample pair", and the qualifier "final difficult" in "final difficult positive sample pair" distinguish the respective samples. A "pair" means two samples with the same type of label. The original triplet, original candidate sample, original positive sample pair, and original negative sample may collectively be called the original samples; the difficult candidate samples, difficult positive sample pairs, difficult negative samples, final difficult triplets, adjusted difficult positive sample pairs, and final difficult positive sample pairs may collectively be called the difficult samples.
Each original triplet includes an original positive sample pair, consisting of the original candidate sample and an original positive sample sharing its label, together with an original negative sample whose label differs from that of the original candidate sample. Each difficult triplet likewise includes a difficult positive sample pair, consisting of a difficult candidate sample and a difficult positive sample sharing its label, together with a difficult negative sample whose label differs from that of the difficult candidate sample.
The phrases "retrieval results related to the image to be retrieved" and "retrieval result most relevant to the image to be retrieved" distinguish the two kinds of results: the most relevant result is determined from among the related results.
To grasp the idea of the embodiments as a whole, the overall flow is briefly introduced. As shown in Fig. 1, the path drawn with a thin black line trains the CNN: a sample set is acquired, the final difficult triplets are generated from it by the two-stage hard sample generation framework THSG, and these assist in training the CNN. The trained CNN then serves as the retrieval model. On the other path, drawn with a thick black line in Fig. 1, the image to be retrieved is acquired, its features are extracted, and the model finally outputs the retrieval result most relevant to that image. The details are described below.
In the prior art, small-scale data sets limit the number of ternary image groups (triplets) that can be constructed, so a model cannot be trained effectively and retrieval effectiveness drops. To address this, the embodiment of the invention provides a retrieval method and device based on segmentation-difficult sample generation: all samples in the sample set of original ternary image groups are used, and the difficulty of every original ternary image group in the set is increased. In the first stage of the THSG, the difficulty of the positive sample pair is increased to obtain a difficult positive sample pair while keeping its label consistent with that of the original positive sample pair; in the second stage, the difficulty of the original negative sample is increased to obtain a difficult negative sample, improving the effective usability of the sample set. Furthermore, using the generated difficult ternary sample groups as a sample set supplements smaller training sets with effective difficult samples, so the model can be trained better. Training with the difficult sample pairs also yields a more robust feature extraction retrieval model.
First, the retrieval method based on segmentation-difficult sample generation provided by the embodiment of the present invention is described below.
The retrieval method based on segmentation-difficult sample generation provided by the embodiment of the invention applies to scenes such as person images or animal images. Furthermore, deep metric learning (DML) applies to multimedia task scenes such as visual product retrieval, zero-shot image retrieval, highlight image detection, and face image retrieval. All of these serve the purpose of DML: keeping similar examples close together and dissimilar examples far apart.
As shown in fig. 2, a retrieval method based on generation of a segmentation-difficult sample according to an embodiment of the present invention may include the following steps:
Step 110: extract the features of the image to be retrieved. In this way, features can be extracted from the image to be retrieved for the retrieval model to process.
Step 120: use the features of the image to be retrieved as the input of a retrieval model, and obtain, through the retrieval model, the retrieval results related to the image to be retrieved and the distance scores between the features of the image to be retrieved and the features of the images in the retrieval model database. The retrieval model is trained on original ternary image groups serving as a sample set together with the final difficult ternary sample groups obtained through the THSG. The final difficult ternary sample group is obtained as follows: in the first stage of the THSG, the difficulty of an original positive sample pair in an original ternary sample group is increased; the label of the resulting difficult positive sample pair is adjusted to be consistent with the label of the original positive sample pair, and the adjusted difficult positive sample pair and the original negative sample in the original ternary image group are output to the second stage of the THSG; in the second stage of the THSG, the difficulty of the original negative sample is increased to obtain a final difficult negative sample and a final difficult positive sample pair; and the final difficult positive sample pair and the final difficult negative sample are combined into the final difficult ternary sample set. This is based on the constraint of equation 13 below: the closer the generated negative sample is to the candidate, the better; that is, the harder the better.
And step 130, sorting the retrieval results related to the images to be retrieved according to the distance scores to obtain the retrieval result most related to the images to be retrieved.
To facilitate ranking and obtain the retrieval result most relevant to the image to be retrieved, the retrieval results related to the image can be sorted by distance score from high to low, taking the first N results as the most relevant; or sorted by distance score from low to high, taking the last N results as the most relevant. N may be determined according to user requirements; typically N is 10, though this is not limited here.
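The ranking and top-N selection described above can be sketched as follows; the function and variable names (`top_n_results`, `distance_scores`) are illustrative, not from the patent.

```python
import numpy as np

def top_n_results(result_ids, distance_scores, n=10, lower_is_better=True):
    """Sort retrieval results by distance score and keep the N most relevant.

    When the score is a distance (lower = more similar), sort ascending and
    take the front N; when it is a similarity, sort descending instead.
    """
    order = np.argsort(distance_scores)
    if not lower_is_better:
        order = order[::-1]
    return [result_ids[i] for i in order[:n]]

# Toy example: three database images and their distance scores.
ids = ["img_a", "img_b", "img_c"]
scores = np.array([0.9, 0.1, 0.5])
most_relevant = top_n_results(ids, scores, n=2)  # lowest distance first
```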
Based on the above description: in wildlife research, pictures of similar animals are retrieved from a taken picture to study the animals' living habits, movement tracks, regional distribution, and so on. Especially in the study and protection of rare wild animals such as wild pandas, or even endangered ones such as antelopes, the animals need to be identified or retrieved, so deep learning effectively improves research efficiency. Better picture feature expression can thus be obtained when performing tasks such as image retrieval on a wildlife picture library, achieving higher retrieval performance. The embodiment of the invention is described taking retrieval of wildlife images as an example: extract the features of the animal image to be retrieved; use those features as the input of a retrieval model to obtain, through the model, the retrieval results related to the animal image and the distance scores between the features of the animal image and the features of the images in the retrieval model database; and sort the related retrieval results by distance score to obtain the animal retrieval result most relevant to the animal image to be retrieved.
Here the retrieval model is trained on original ternary sample groups, as a sample set, built from wildlife images. The training process is the same as that described with steps 110 to 130; apart from the processed objects (wildlife images versus generic images), the specific training process is identical and is not repeated here.
In the embodiment of the invention, all samples in the sample set of original ternary image groups are used, and the difficulty of every original ternary image group in the set is increased. In the first stage of the THSG, the difficulty of the positive sample pair is increased to obtain a difficult positive sample pair while keeping its label consistent with that of the original positive sample pair; in the second stage, the difficulty of the original negative sample is increased to obtain a difficult negative sample, improving the effective usability of the sample set. Furthermore, using the generated difficult ternary sample groups as a sample set supplements smaller training sets with effective difficult samples, so the model can be trained better. Training with the difficult sample pairs also yields a more robust feature extraction retrieval model.
It should be noted that training a generative adversarial neural network comprises the following steps. First step: obtain an input sample of the generator and, by forward propagation through the generator's neural network, obtain a generated sample; use the generator's input sample and generated sample as input data of the discriminator, and through classification training of the discriminator's neural network, identify whether the source of the discriminator's input data is the generator's input sample or the generator's generated sample. If the discriminator's input data is identified as a generated sample, return the gradient information produced by the discriminator for that generated sample to the generator; adjust the generator's neural network using this gradient information, update the generator's neural network with the adjusted network, and return to the first step. Continue until the discriminator's classification training identifies the source of the discriminator's input data as the generator's input sample, then output the current generated sample.
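The adversarial loop described above can be caricatured in a few lines; this is a minimal 1-D sketch, not the patent's networks. The "generator" here is a learnable offset, the "discriminator" scores closeness to a real-data mean, and the gradient returned by the discriminator updates the generator until the discriminator can no longer tell the generated sample apart. All names and the setup are illustrative assumptions.

```python
import numpy as np

real_mean = 5.0  # stand-in for the real-data distribution

def generator(x, offset):
    # "Generator": forward pass is just adding a learnable offset.
    return x + offset

def discriminator(sample):
    # Probability-like score that the sample is "real" (near real_mean).
    return float(np.exp(-(sample - real_mean) ** 2))

offset, lr, x_in = 0.0, 0.5, 0.0
for step in range(100):
    fake = generator(x_in, offset)
    if discriminator(fake) > 0.99:   # discriminator fooled: stop, output sample
        break
    # Gradient information "returned" to the generator: move the generated
    # sample toward the region the discriminator scores as real.
    offset += lr * (real_mean - fake)

final_sample = generator(x_in, offset)
```

In a real GAN both networks are trained jointly and the generator gradient flows through the discriminator; the loop above only mirrors the control flow of the description (generate, discriminate, return gradient, update, repeat until fooled).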
Based on the foregoing generative adversarial neural network, there are various ways to obtain the adjusted difficult positive sample pairs when building the retrieval model. One possible implementation: based on one generative adversarial neural network, keep the labels of the adjusted difficult positive sample pairs consistent with the labels of the original positive sample pairs, output the adjusted difficult positive sample pairs, and output the original negative samples to the second stage of the THSG. To diversify the output, another implementation uses two generative adversarial neural networks: the labels of the difficult positive sample pairs are kept consistent with the labels of the original positive sample pairs, the adjusted difficult positive sample pairs are output, and the original negative samples are passed to the second stage of the THSG, where one of the two networks optimizes the output of the other, yielding diversity in the adjusted difficult positive sample pairs.
In combination with the above-mentioned manner of obtaining the adjusted difficult positive sample pairs, how to obtain the search model is described in detail below.
Referring to fig. 4, step 1, an original triplet set is obtained as a sample set.
Step 1 includes: taking the original ternary image groups as input images of a feature extraction network F, where each input image has a label; a labeled input image is called an input sample. The features of the input samples are extracted by equation 1 below, so that they can serve as the input of the first stage of the THSG, where the original positive sample pairs are stretched by Piecewise Linear Manipulation (PLM). The feature extraction network F may be a CNN. Equation 1 is:

θ_m^* = argmin_{θ_m} Σ_{i=1}^{n} J(F(I_i; θ_m), l_i)    (1)

where θ_m^* is the optimal choice of parameters, J is the global loss function, θ_m are the parameters of the feature extraction network F, and l_i is the label corresponding to training input sample x_i. The input samples form the image set I = [I_1, ..., I_n] with label set L = [l_1, ..., l_i, ..., l_n], where l_i ∈ [1, ..., C]; I_1 is the first input image, I_n the n-th input image, l_1 the class of the first input image, l_n the class of the n-th input image, n the serial number, l_i the class of the i-th input image, and C the total number of classes of the input images. The feature extraction network F extracts the feature of an input image as F(I_i) ∈ R^N, where N is the feature space dimension and R the real numbers. The features F(I_i) of the input images are mapped to X = [X_1, ..., X_n], where X_1 is the extracted feature of the 1st input image and X_n that of the n-th. The last layer of the feature extraction network F is the fully connected layer H_e that performs the spatial mapping. A distance loss function is used so that a distance metric can be learned in the feature space that reflects actual semantic distance. To ensure label consistency between the difficult samples and the original samples, a fully connected layer H_c is added after the final layer H_e of F for classification, trained with a normalized softmax loss function. When generating samples, the H_c layer is reused, and the distance metric loss trains θ_m as in equation 1 above.
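The two-layer head described above (embedding layer H_e followed by the added classification layer H_c) can be sketched as follows; the dimensions and random weights are illustrative assumptions, standing in for a trained backbone.

```python
import numpy as np

rng = np.random.default_rng(1)
feat_dim, embed_dim, num_classes = 8, 4, 3       # stand-ins for N and C

W_e = rng.normal(size=(feat_dim, embed_dim))     # H_e: spatial mapping
W_c = rng.normal(size=(embed_dim, num_classes))  # H_c: classification head

def embed(backbone_feature):
    # H_e maps backbone features into the metric space used for distances.
    return backbone_feature @ W_e

def classify(embedding):
    # H_c produces class logits; its normalized softmax loss is what
    # enforces label consistency for generated hard samples.
    logits = embedding @ W_c
    e = np.exp(logits - logits.max())
    return e / e.sum()

x = rng.normal(size=feat_dim)    # stands in for a backbone feature F(I_i)
probs = classify(embed(x))
```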
Based on the foregoing, the following continues with how adversarially generated difficult samples enhance the training process. Embodiments of the present invention train the generators and the distance metric network simultaneously, in a competing manner. To obtain more effective difficult samples, the generation process is divided into two stages: the main purpose of the first stage of the THSG is to generate difficult candidate-positive pairs, referred to below as difficult positive sample pairs; the main purpose of the second stage of the THSG is to further improve difficult sample generation. In the final training phase, the losses of all stages are combined to ensure the performance of deep metric learning, as shown in fig. 1.
Step 2, in a first stage 11 of the two-stage difficult sample generation frame THSG, stretching an original positive sample pair in a piecewise linear stretching PLM mode to increase the difficulty degree and obtain a difficult positive sample pair; wherein the difficult positive sample pairs include: difficult candidate samples and difficult positive samples.
To obtain the difficult positive sample pairs, the embedded features of a positive sample pair are linearly stretched, as shown in fig. 3, so that each deviates away from the pair's center point, generating a difficult candidate sample and difficult positive sample a^*, p^*, as in equation 2. In step 2, one possible implementation stretches the original positive sample pair with the piecewise linear operation of the PLM to increase the difficulty and obtain a difficult positive sample pair:

a^* = a + λ(a − p),  p^* = p + λ(p − a)    (2)

During stretching, if the value of λ is too large, the positive sample pair may be stretched into other categories; even though the pair distance increases, such generated samples can only play a negative role in the training process. To ensure that the generated samples keep the labels of the original samples, the embodiment of the present invention limits the range and size of λ:

λ = α · e^{−(d(a,p) − d_0)},  if d(a,p) > d_0    (3)
λ = α + γ · (d_0 − d(a,p)),  if d(a,p) ≤ d_0    (4)

where a^* is the difficult candidate sample, a the original candidate sample, λ the stretch distance coefficient, p the original positive sample, p^* the difficult positive sample, α the bias hyperparameter, d_0 the segmentation coefficient, d(a,p) the distance between the original candidate sample a and the original positive sample p, and γ the linear hyperparameter.

When d(a,p) is already sufficiently large, i.e., larger than d_0, the embodiment of the present invention uses the exponential function of equation 3; in this case λ has a maximum value of α and a minimum value of 0. When d(a,p) is less than d_0, the embodiment of the present invention uses the linear function of equation 4; here the maximum value of λ is α + d_0·γ and the minimum value is α. However, d_0 is related to the distance distribution of the samples in the different data sets, and is therefore difficult to adjust manually across data sets. To better mine the optimal hyperparameter d_0 in each data set, another possible implementation stretches the original positive sample pair with a preferred piecewise linear operation that sets:

d_0 = d_{epoch−1}    (5)

where d_{epoch−1} is the average distance of the positive sample pairs computed during the previous training epoch; in the first training epoch these are the original positive sample pairs, and in later epochs the difficult positive sample pairs. After the piecewise linear manipulation above, the embodiment of the present invention obtains difficult positive sample pairs. Next, a generator is used to make the stretched positive sample pairs more effective.
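One reading of the piecewise stretching above, as a runnable sketch: the exact functional forms of λ are reconstructed from the stated maxima and minima (max α at d = d_0 decaying toward 0 above it; max α + γ·d_0 at d = 0 below it), so treat them as an assumption rather than the patent's exact formulas.

```python
import numpy as np

def plm_stretch(a, p, alpha=0.1, gamma=0.2, d0=1.0):
    """Stretch a positive pair (a, p) away from each other to raise difficulty.

    lambda decays exponentially above the segmentation coefficient d0 (so
    already-hard pairs are barely moved) and grows linearly below d0, which
    also bounds lambda to keep the pair inside its original class.
    """
    d = float(np.linalg.norm(a - p))
    if d > d0:
        lam = alpha * np.exp(-(d - d0))   # max alpha at d=d0, -> 0 as d grows
    else:
        lam = alpha + gamma * (d0 - d)    # max alpha + gamma*d0 at d=0
    a_star = a + lam * (a - p)            # each sample moves away from the
    p_star = p + lam * (p - a)            # other, i.e. away from the center
    return a_star, p_star, lam

a = np.array([0.0, 0.0])
p = np.array([0.6, 0.0])
a_star, p_star, lam = plm_stretch(a, p)   # d = 0.6 < d0: linear branch
```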
Step 3: based on a trained first generative adversarial neural network, adjust the labels of the difficult positive sample pairs to be consistent with the labels of the original positive sample pairs, and output the adjusted difficult positive sample pairs and the original negative samples to the second stage 12 of the THSG. The trained first generative adversarial neural network comprises: a Hard Positive sample pair Generator (Hard Anchor-Positive Generator, HAPG) and its corresponding discriminator, the Hard Positive sample pair Discriminator (Hard Anchor-Positive Discriminator, HAPD). Step A1: use a difficult positive sample pair as the input sample of the HAPG and obtain an HAPG-generated sample by forward propagation through the HAPG's neural network; use the HAPG's input sample and the HAPG-generated sample as input data of the HAPD, and through two-class training of the HAPD neural network, identify whether the source of the HAPD's input data is the HAPG-generated sample or the HAPG's input sample. Step B1: if the HAPD's input data is identified as an HAPG-generated sample, return the first gradient information produced by the HAPD for that generated sample to the HAPG; adjust the HAPG's neural network using the first gradient information, update the HAPG's neural network with the adjusted network, and return to step A1. Continue until the two-class training of the HAPD neural network identifies the source of the HAPD's input data as the HAPG's input sample, then take the current HAPG-generated sample as the adjusted difficult positive sample pair.
After piecewise linear stretching of the positive sample pairs, to ensure that the generated pairs remain in the same class domain as the original samples, the embodiment of the present invention requires the samples to satisfy label consistency, avoiding the generation of invalid embedded features. However, a simple constraint may lead to the mode collapse problem. The embodiment of the present invention therefore introduces the HAPG in step 3 above to further adjust the samples. After passing through the HAPG, x^* is mapped to x', where x^* may refer to a^* and p^* in fig. 4, and x' may refer to a' and p' in fig. 4.
To better train the HAPG, the embodiment of the present invention establishes a corresponding discriminator, the HAPD. The HAPD is a two-class classifier that identifies whether a given embedded feature is a true feature or a generated feature; it is trained with the loss shown in equation 7 below. The Reconstruction Condition Generator (RCG) introduced later has its own discriminator, the Reconstruction Condition Discriminator (RCD), which differs from the HAPD in its inputs and outputs; both the RCD and the HAPD follow the generic discriminator form:

L_R = −Σ_i [ log R(x_i) + log(1 − R(x̃_i)) ]    (6)

where L_R is the discriminator loss function, R(·) is the output of the discriminator on its input data, x̃_i is the i-th generated sample, and x_i is the i-th input sample.

The HAPD is obtained with the following loss function:

L_HAPD = −Σ_i [ log HAPD(x_i) + log(1 − HAPD(x'_i)) ]    (7)

where L_HAPD is the loss function of the HAPD, HAPD(x_i) is the HAPD judging its input data to be a true sample, and HAPD(x'_i) is the HAPD judging its input data to be a false (generated) sample.
In addition, to ensure the consistency of the classification results, the H_c layer mentioned above is reused as C_HAPG to classify the generated pairs. The embodiment of the present invention then trains the HAPG using the loss function of equation 8.

The HAPG is obtained with the following loss function:

L_HAPG = L_adv + L_cls,
L_adv = −Σ_i log HAPD(x'_i),
L_cls = −Σ_i log sm(C_HAPG(x'_i))[l_i]    (8)

where L_HAPG is the loss function of the HAPG, L_adv is the adversarial loss term against the HAPD, L_cls is the HAPG classification loss term (cls abbreviates class), sm is the normalized exponential function softmax, HAPD(x'_i) is the output of the generated difficult positive sample through the HAPD, C_HAPG(x') is the classification into which the output of the HAPG falls, l_i is the corresponding class label, and x'_i is the i-th generated difficult sample.
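The two HAPD/HAPG loss terms can be sketched with numpy as follows; the equal weighting of the adversarial and classification terms is an assumption, and the scores and logits are toy values.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def hapd_loss(scores_real, scores_fake):
    """Two-class discriminator loss: real embedded features should score
    near 1, HAPG-generated features near 0 (equation 7 style)."""
    return float(-(np.log(scores_real) + np.log(1.0 - scores_fake)).mean())

def hapg_loss(scores_fake, class_logits, label):
    """Generator loss: an adversarial term for fooling the HAPD plus the
    softmax loss of the reused classification layer (C_HAPG), keeping the
    generated pair's label consistent with the original (equation 8 style)."""
    adv = float(-np.log(scores_fake).mean())
    cls = float(-np.log(softmax(class_logits)[label]))
    return adv + cls

d_real = np.array([0.9, 0.8])       # HAPD scores on real features
d_fake = np.array([0.2, 0.3])       # HAPD scores on generated features
logits = np.array([2.0, 0.5, 0.1])  # C_HAPG logits for one generated sample
loss_d = hapd_loss(d_real, d_fake)
loss_g = hapg_loss(d_fake, logits, label=0)
```

A more confident discriminator (real scores near 1, fake scores near 0) drives `hapd_loss` toward 0, while the generator lowers `hapg_loss` by raising its fake scores and keeping the generated pair classifiable as its original label.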
However, when generating positive sample pairs, a label consistency constraint alone is not sufficient. To increase the diversity of the adjusted difficult positive samples, step 3 further comprises: based on the trained first generative adversarial neural network and a trained third generative adversarial neural network, keeping the labels of the adjusted difficult positive sample pairs consistent with the labels of the original positive sample pairs, and outputting the adjusted difficult positive sample pairs and the original negative samples to the second stage of the THSG. The trained third generative adversarial neural network comprises: a Reconstruction Condition Generator (RCG) and its corresponding discriminator, the RCD. Step A3: use the original ternary image group as the input sample of the RCG and obtain an RCG-generated sample by forward propagation through the RCG's neural network; use the RCG's input sample and the RCG-generated sample as input data of the RCD, and through two-class training of the RCD neural network, identify whether the source of the RCD's input data is the RCG-generated sample or the RCG's input sample. Step B3: if the RCD's input data is identified as an RCG-generated sample, return the third gradient information produced by the RCD for that generated sample to the RCG; adjust the RCG's neural network using the third gradient information, update the RCG's neural network with the adjusted network, and return to step A3. Continue until the two-class training of the RCD neural network identifies the source of the RCD's input data as the RCG's input sample, then take the current RCG-generated sample as the adjusted difficult positive sample pair.
In the above embodiment, regardless of where the HAPG's input x^* lies in its class space, the HAPG could spoof the HAPD by generating random embeddings from the class space. In that case, the embedded features generated by the HAPG would be independent of their input and not necessarily difficult samples. Therefore, the embodiment of the present invention introduces the reconstruction condition generator RCG, which maps x' back to x_r under a reconstruction loss between x_r and x, where x may refer to a and p in fig. 4 and x_r may refer to a_r and p_r in fig. 4. This solves the mode collapse problem that can occur during HAPG generation. A reconstruction condition discriminator RCD is likewise provided for the RCG, as shown in equation 9, and the loss function of the RCG is shown in equation 10.
Specifically, the RCD is obtained with the following loss function:

L_RCD = −Σ_i [ log RCD(x_i) + log(1 − RCD(x_i^r)) ]    (9)

where L_RCD is the RCD loss function, RCD(x_i) is the RCD judging its input data to be a true sample, and RCD(x_i^r) is the RCD judging its input data to be a false (reconstructed) sample.
The RCG is determined with the following loss formula:

L_RCG = Σ_i [ ‖x_i^r − x_i‖_2 + η · L_sm(C_RCG(x_i^r), l_i) ]    (10)

where L_RCG is the value of the RCG loss function, ‖x_i^r − x_i‖_2 is the L2 distance between the sample before reconstruction and the sample after reconstruction, η is the balance factor between the normalized exponential function (softmax) loss and the reconstruction loss, cls denotes class characterization, C_RCG is the reconstruction condition generator's classifier, x_r is the vector reconstructed by the RCG from the HAPG's generated sample, x is the original vector, L_sm is the normalized exponential function class loss (sm abbreviates softmax), x_i^r is the specific vector after the i-th reconstruction, l_i is the i-th class label, and i is the serial number.
In order to closely correlate the HAPG generated samples with the original samples, embodiments of the present invention allow point-to-point reconstruction of the generated samples with the original samples as completely as possible. Meanwhile, in order to train the RCG better, the embodiment of the present invention adds the same softmax loss function as the HAPG to ensure the consistency of the tags. The softMax function of the RCG is therefore also composed of two parts.
After the HAPG is adjusted, the difficult positive sample pair of the embodiments of the present invention satisfies the requirement of label consistency. At the same time, the RCG also ensures that the generated pairs do not cause pattern collapse due to random generation. Next, embodiments of the present invention will use the difficult positive sample pairs to generate the difficult negative samples and compose the final difficult sample.
Step 4: in the second stage of the THSG, based on a trained second generative adversarial neural network, increase the difficulty of the original negative sample to obtain the final difficult negative sample, and output the final difficult positive sample pair. The trained second generative adversarial neural network comprises: a Hard Triplet Generator (HTG) and its corresponding discriminator, the Hard Triplet Discriminator (HTD). Step A2: use the adjusted difficult positive sample pair and the original negative sample as input samples of the HTG; with the HTG's neural network, obtain a first distance between the difficult candidate sample and the original negative sample and a second distance between the difficult candidate sample and the difficult positive sample, and obtain an HTG-generated sample by forward propagation through the HTG's neural network; use the HTG's input samples and the HTG-generated sample as input data of the HTD, and through (C+1)-class training of the HTD neural network, identify whether the source of the HTD's input data is the HTG-generated sample or the HTG's input sample. Step B2: if the source of the HTD's input data is identified as an HTG-generated sample, return the second gradient information produced by the HTD for that generated sample to the HTG; adjust the HTG's neural network using the second gradient information, under the condition that the first distance is smaller than the second distance, update the HTG's neural network with the adjusted network, and return to step A2. Continue until the (C+1)-class training of the HTD neural network identifies the source of the HTD's input data as the HTG's input sample, then take the current HTG-generated sample as the final difficult negative sample.
As for the final difficult positive sample pair, the adjusted difficult positive sample pair is mapped through the second stage of the THSG and output in that second stage.
To prevent the difficult positive sample pairs from being damaged by the reverse triplet loss, the embodiment of the present invention lets the generated positive sample pairs be recovered through a reconstruction loss. Meanwhile, to better train the HTG, the embodiment of the present invention adds the same softmax loss function as for the HAPG to ensure label consistency. In one possible implementation, step 4 increases the difficulty of the original negative sample, based on the trained second generative adversarial neural network, using an Adaptive Reverse Triplet Loss (ART-Loss), giving the HTG loss:

L_HTG = L_art + μ · L_rec + η · L_cls    (11)

where L_HTG is the loss function of the HTG, η is the balance factor for the normalized exponential function (softmax) loss, μ is the reconstruction loss balance parameter, L_art is the adaptive reverse triplet loss, L_rec is the reconstruction loss, and L_cls is the classification loss.

The reconstruction loss ties the candidate sample ã and positive sample p̃ generated by the HTG back to the original candidate sample a and original positive sample p:

L_rec = ‖ã − a‖_2 + ‖p̃ − p‖_2

The classification loss applies the normalized exponential function through the classifier C_HTG to the samples x̃_i generated by the HTG, with class labels l_i:

L_cls = −Σ_i log sm(C_HTG(x̃_i))[l_i]

The reverse triplet loss acts on the candidate input a', the positive input p', the negative input n, and the generated negative ñ:

L_art = [ ‖a' − ñ‖_2^2 − ‖a' − p'‖_2^2 + τ_r ]_+    (12)

where ‖·‖_2 is the L2 distance, [·]_+ truncates at 0, and τ_r is the reverse triplet loss hyperparameter: minimizing equation 12 pushes the generated negative sample closer to the candidate than the positive by the margin τ_r. The hyperparameter τ_r is made adaptive:

τ_r = v / (β + L_HTG)    (13)

where v is a constant with a value range from 0 to plus infinity, β is a constant with a value range from 0 to plus infinity, and L_HTG is the loss function value of the HTG. Thus, as the network trains and the HTG performs better and better, the embodiment of the present invention imposes a tighter limit on the reverse triplet loss: τ_r is set as a parameter that changes with the HTG loss.
As the training of the HTG is better,
Figure BDA0002554167010000192
will become smaller, and tau r The adaptivity is increased and the difficulty of difficult sampling is increased. To ensure tag consistency, a discriminator HTD is provided for the HTG. Unlike HAPD, the input to HTD comes from a different class, and thus, the HTD is a C +1 class discriminator. The HTD is obtained using the following equation:
(The HTD loss formula appears only as an image in the source and is not reproduced here.)

wherein L_HTD is the value of the HTD loss function; C is the original number of classes; HTD(x̃_i) is the result obtained by the HTD with generated sample x̃_i as input, and HTD(x_i) is the result obtained by the HTD with the original sample x_i as input; the generated samples ã, p̃ and ñ are respectively shown in FIG. 4.
This adaptive reverse triplet loss uses the input samples to generate difficult negative samples; during the HTG's generation of negative samples, it does not destroy the positive sample pairs and it ensures label consistency.
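As an illustration only, the adaptive reverse triplet loss described above can be sketched in a few lines of NumPy. The exact formulas appear only as images in the source, so both the loss form ([D(a, n) − D(a, p) + τ_r]_+, i.e. the standard triplet loss with the roles of positive and negative reversed) and the margin schedule τ_r = ν / (β + L_HTG) are assumptions made for this sketch.

```python
import numpy as np

def reverse_triplet_loss(a, p, n, tau_r):
    """Assumed form [ D(a, n) - D(a, p) + tau_r ]_+ with squared-L2 D:
    minimising it pulls the generated negative n toward the anchor a,
    the reverse of the ordinary triplet loss."""
    d_an = float(np.sum((np.asarray(a) - np.asarray(n)) ** 2))
    d_ap = float(np.sum((np.asarray(a) - np.asarray(p)) ** 2))
    return max(d_an - d_ap + tau_r, 0.0)  # [.]_+ truncates at 0

def adaptive_margin(htg_loss, nu=1.0, beta=1.0):
    """Assumed schedule tau_r = nu / (beta + L_HTG): as the HTG trains
    better and its loss shrinks, tau_r grows and the reverse-triplet
    constraint tightens, matching the adaptive behaviour in the text."""
    return nu / (beta + htg_loss)
```

Under these assumptions, a freshly initialised HTG (large L_HTG) faces a loose margin, while a well-trained HTG is pushed to produce ever-harder negatives.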
Step 5: synthesizing the final difficult positive sample pair and the final difficult negative sample to obtain a final difficult ternary sample group.
Step 6: taking the final difficult ternary sample group as a sample set and training a convolutional neural network to obtain the retrieval model.
Based on the above, the embodiment of the present invention completes adversarial deep metric learning. The overall structure of the retrieval method based on segmentation-difficult-sample generation in the embodiment of the present invention mainly comprises three parts: a metric network for obtaining the embedding; a difficulty-aware generator THSG, which increases the difficulty level through two stages; and the corresponding adversarial metric learning. All generators reuse the H_e layer, mapping the generated samples into the same feature space as the original samples. Therefore, after the final difficult ternary sample group is generated, the embodiment of the present invention trains the CNN feature extraction network with the corresponding metric. Compared with conventional deep metric learning methods, the method of the embodiment of the present invention can thus train the CNN network better through difficult samples, as shown in equation 14.
(Equation 14 appears only as an image in the source and is not reproduced here.)

wherein X is the original sample and X̃ denotes the difficult samples in the generated final difficult ternary sample group.
The embodiment of the present invention applies the THSG framework to a deep metric learning framework to improve performance. For the feature extraction network, the final objective loss function is shown in equation 15.
(Equation 15 appears only as an image in the source and is not reproduced here.)

wherein L_F represents the overall loss function (F abbreviates "final") and uses the same predefined parameter as equation 12; L_gen represents the generator loss function; L_t^ori represents the metric function based on the original samples (ori abbreviates "original"); φ is a parameter balance coefficient; L_t^h represents the metric function based on the difficult samples (h denotes hard, i.e. difficult); L_soft represents the softmax loss function (soft abbreviates "softmax"); L_t represents the metric loss function, with t denoting the metric; X represents the original samples, the difficult positive samples X' are the output of the first stage, and X̃ are the difficult negative samples generated by the second stage. Therefore, L_t^ori is a metric function based on the original samples, L_t^h is a metric function based on the generated difficult samples, and φ is a balance parameter between L_t^ori and L_t^h.
In addition, the embodiment of the present invention can train the generation network and the feature extraction network at the same time, and the balance parameter φ alleviates the poor performance of the generation network in the initial training stage. Moreover, after the network training of the embodiment of the present invention is finished, the final feature extraction network does not need any extra computation.
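To make the balance concrete, here is a minimal sketch of the overall objective of equation 15. The real formula is only an image in the source, so the exact combination is an assumption: a weighted sum of the original-sample metric loss, the difficult-sample metric loss (balanced by φ) and a softmax term (weighted by a hypothetical coefficient eta).

```python
def combined_objective(loss_metric_ori, loss_metric_hard, loss_softmax,
                       phi=0.5, eta=0.1):
    """Hedged sketch of the overall loss L_F of equation 15, assumed here
    to be:  L_t^ori + phi * L_t^h + eta * L_soft.
    A small phi downweights the generated difficult samples while the
    generation network is still poorly trained."""
    return loss_metric_ori + phi * loss_metric_hard + eta * loss_softmax
```

In a joint training loop, both the generator loss and this objective would be stepped in each iteration, which is how the two networks can be trained simultaneously.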
Unlike prior-art metric learning methods, which only mine difficult samples or only generate difficult negative samples, the THSG provided by the embodiment of the present invention generates the final difficult samples through a two-stage network; it derives the final difficult samples through two effective independent generators, one for generating the adjusted difficult positive sample pairs and one for generating the final negative samples, and it aims to fully exploit the potential of both the positive and the negative samples. Experimental results on the CUB-200-2011, Cars196 and Stanford datasets indicate that the THSG effectively improves on the performance of existing difficult sample generation methods.
The following provides a description of a retrieval apparatus based on generation of a segmentation-difficult sample according to an embodiment of the present invention.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a retrieval apparatus based on segmentation difficulty sample generation according to an embodiment of the present invention. The retrieval device based on the generation of the segmentation difficult sample provided by the embodiment of the invention can comprise the following modules:
the extraction module 21 is used for extracting the characteristics of the image to be retrieved;
the processing module 22 is configured to take the characteristics of the image to be retrieved as the input of a retrieval model and to obtain, through the retrieval model, retrieval results related to the image to be retrieved and distance scores between the characteristics of the image to be retrieved and the characteristics of the images in the retrieval model database; wherein the retrieval model is trained on an original ternary image group serving as a sample set and on a final difficult ternary sample group obtained by a two-stage difficult sample generation framework THSG; in the first stage of the THSG, the difficulty degree of the original positive sample pairs in the original ternary image group is increased to obtain difficult positive sample pairs; the labels of the difficult positive sample pairs are adjusted to be consistent with the labels of the original positive sample pairs, and the adjusted difficult positive sample pairs and the original negative samples in the original ternary image group are output to the second stage of the THSG; in the second stage of the THSG, the difficulty degree of the original negative samples is increased to obtain final difficult negative samples and final difficult positive sample pairs; and the final difficult positive sample pairs and the final difficult negative samples are synthesized to obtain the final difficult ternary sample group;
and the sorting module 23 is configured to sort the retrieval results related to the image to be retrieved according to the distance scores to obtain the retrieval result most related to the image to be retrieved.
In a possible implementation manner, the extraction module is configured to:
extracting the characteristics of the animal image to be retrieved;
the processing module is configured to:
taking the characteristics of the animal image to be retrieved as the input of a retrieval model, and obtaining a retrieval result related to the animal image to be retrieved and a distance score between the characteristics of the animal image to be retrieved and the characteristics of the images in the retrieval model database through the retrieval model;
the sorting module is configured to:
and sequencing the retrieval results related to the images to be retrieved according to the distance scores to obtain the animal retrieval result most related to the animal images to be retrieved.
In one possible implementation, the apparatus further includes: a generating module, configured to obtain the search model through the following steps:
acquiring an original ternary image group serving as a sample set;
in the first stage of the two-stage difficult sample generation framework THSG, stretching the original positive sample pair in a piecewise linear stretching PLM manner to increase the difficulty degree and obtain a difficult positive sample pair; wherein the difficult positive sample pair comprises: a difficult candidate sample and a difficult positive sample;
adjusting the labels of the difficult positive sample pairs to be consistent with the labels of the original positive sample pairs based on a trained first generative adversarial network, and outputting the adjusted difficult positive sample pairs and the original negative samples to the second stage of the THSG; wherein the trained first generative adversarial network comprises: a difficult positive sample pair generator HAPG and a discriminator HAPD corresponding to the HAPG;
in the second stage of the THSG, increasing the difficulty degree of the original negative sample based on a trained second generative adversarial network to obtain a final difficult negative sample, and outputting a final difficult positive sample pair; wherein the trained second generative adversarial network comprises: a difficult ternary sample generator HTG and a discriminator HTD corresponding to the HTG;
synthesizing the final difficult positive sample pair and the final difficult negative sample to obtain a final difficult ternary sample group;
and taking the final difficult ternary sample group as a sample set, and training a convolutional neural network to obtain the retrieval model.
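The steps above can be sketched as a data-level pipeline. This is an illustration only: `plm_stretch`, `hapg` and `htg` are hypothetical callables standing in for the PLM stretching step, the label-adjusting HAPG stage and the negative-hardening HTG stage; the real stages are trained networks.

```python
def build_final_triplets(triplets, plm_stretch, hapg, htg):
    """Illustrative THSG pipeline over a list of (anchor, positive,
    negative) triplets: stage 1 stretches and label-adjusts the positive
    pair, stage 2 hardens the negative, yielding final difficult
    triplets for training the retrieval CNN."""
    final = []
    for a, p, n in triplets:
        a_h, p_h = plm_stretch(a, p)           # stage 1: harder positive pair
        a_adj, p_adj = hapg(a_h, p_h)          # stage 1: label adjustment
        a_f, p_f, n_f = htg(a_adj, p_adj, n)   # stage 2: harder negative
        final.append((a_f, p_f, n_f))
    return final
```

The returned triplets would then serve as the sample set for training the convolutional neural network.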
In one possible implementation manner, the generating module is configured to:
adjusting the labels of the difficult positive sample pairs to be consistent with the labels of the original positive sample pairs based on a trained first generative adversarial network and a trained third generative adversarial network, and outputting the adjusted difficult positive sample pairs and the original negative samples to the second stage of the THSG; wherein the trained third generative adversarial network comprises: a reconstruction condition generator RCG and a discriminator RCD corresponding to the RCG.
In one possible implementation manner, the generating module is configured to:
stretching the original positive sample pair by adopting a piecewise linear operation formula in the piecewise linear stretching PLM manner to increase the difficulty degree and obtain a difficult positive sample pair, wherein the piecewise linear operation formula is:
a* = a + λ(a − p)
p* = p + λ(p − a)

(The piecewise definition of λ appears only as an image in the source and is not reproduced here.)

wherein a* is the difficult candidate sample, a is the original candidate sample, λ is the stretch distance coefficient, p is the original positive sample, p* is the difficult positive sample, α is the bias hyperparameter, d_0 is the segmentation coefficient, d(a, p) is the distance between the original candidate sample a and the original positive sample p, and γ is the linear hyperparameter;
or, stretching the original positive sample pair by adopting a preferred piecewise linear operation formula in the piecewise linear stretching PLM manner to increase the difficulty degree and obtain a difficult positive sample pair, wherein the preferred piecewise linear operation formula is:
(The preferred piecewise linear operation formula appears only as an image in the source and is not reproduced here.)

wherein d_{epoch-1} is the average distance of the positive sample pairs calculated in the previous training pass; the positive sample pairs are the original positive sample pairs when the previous pass is the first training pass, and the difficult positive sample pairs when it is not.
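The two stretching equations given in the text can be implemented directly. In this sketch the stretch coefficient λ is left to the caller, since its piecewise definition (in terms of α, γ, d_0 and d(a, p)) appears only as an image in the source.

```python
import numpy as np

def plm_stretch(a, p, lam):
    """Piecewise-linear stretching of an original positive pair (a, p):
        a* = a + lam * (a - p)
        p* = p + lam * (p - a)
    lam is the stretch distance coefficient supplied by the caller."""
    a = np.asarray(a, dtype=float)
    p = np.asarray(p, dtype=float)
    return a + lam * (a - p), p + lam * (p - a)
```

Note that stretching with lam > 0 moves a and p apart along their connecting line, multiplying the pair distance by (1 + 2·lam); this is how the difficulty degree of the positive pair is increased.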
In one possible implementation, the input and output of the RCD differ from those of the HAPD, and the RCD and the HAPD are determined using the following equation:
(The discriminator loss formula appears only as an image in the source and is not reproduced here.)

wherein R(x̃'_i) is the feature of a generated sample after passing through the discriminator, R(x'_i) is the output of the input data after passing through the discriminator, x̃'_i is the i-th generated sample, x'_i is the i-th input sample, L_sm is the normalized softmax class loss, and L_D is the loss function of the discriminator.
In one possible implementation, the HAPG is obtained by the following function:
(The HAPG loss formula appears only as an image in the source and is not reproduced here.)

wherein L_HAPG is the loss function of the HAPG, L_cls^HAPD is the HAPD class loss, and L_cls^HAPG is the HAPG class loss (cls abbreviates "class"); in the normalized exponential function class loss, HAPD(x'_i) is the output of the generated difficult positive samples through the HAPD, C_HAPG(x'_i) is the category into which the output of the HAPG is classified, and x'_i is the i-th generated difficult sample.
In one possible implementation, the RCG is determined using the following RCG loss equation:
(The RCG loss formula appears only as an image in the source and is not reproduced here.)

wherein L_RCG is the value of the RCG loss function; the reconstruction term is the L2 distance between the sample before reconstruction and the sample after reconstruction; η is the balance factor between the normalized exponential function and the reconstruction loss; L_cls^RCD is the reconstruction conditional discriminator loss (cls denotes class) and C_RCG is the reconstruction condition generator loss; in its concrete form, x_r is the reconstructed vector obtained by passing the generated sample of the HAPG through the RCG, and x is the original vector; L_sm is the class loss of the normalized exponential function (sm abbreviates the normalized exponential function, i.e. softmax); x_{r,i} is the i-th reconstructed vector, RCD(x_{r,i}) is the output of the discriminator for the reconstructed vector, l_i is the i-th class label, and i is the index.
In one possible implementation manner, the generating module is configured to:
increase, based on the trained second generative adversarial network, the difficulty degree of the original negative sample by adopting the adaptive reverse triplet loss formula to obtain a final difficult negative sample, wherein the adaptive reverse triplet loss formula is as follows:
(The ART-Loss formula appears only as an image in the source and is not reproduced here.)

wherein L_HTG is the loss function of the HTG; η is the balance factor between the normalized exponential (softmax) function and the reconstruction loss; μ is the reconstruction-loss balance parameter; L_art is the adaptive reverse triplet loss; L_rec is the reconstruction loss; L_cls^HTD is the classification loss of the HTD; L_cls^HTG is the classification loss of the HTG; ã, p̃ and ñ are, respectively, the candidate sample, the positive sample and the negative sample generated by the HTG; a is the original candidate sample and p is the original positive sample; in the classification loss, x̃_i denotes the i-th sample generated by the HTG, l_i is its class label, and C_HTG is the HTG classifier;
in the reverse triplet loss (whose formula likewise appears only as an image), a' is the anchor (candidate) input, p' is the positive-sample input, n' is the negative-sample input, ||·||_2 is the L2 distance, [·]_+ denotes truncation at 0, and τ_r is the reverse-triplet-loss hyperparameter, defined in terms of two constants ν and β, each ranging over (0, +∞), and of the HTG loss L_HTG;
obtaining the HTD by adopting the following formula:
(The HTD loss formula appears only as an image in the source and is not reproduced here.)

wherein L_HTD is the value of the HTD loss function, C is the original number of classes, HTD(x̃_i) is the result obtained by the HTD with generated sample x̃_i as input, and HTD(x_i) is the result obtained by the HTD with the original sample x_i as input.
The following continues to describe the electronic device provided by the embodiment of the present invention.
Referring to fig. 6, fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. The embodiment of the present invention further provides an electronic device, which includes a processor 31, a communication interface 32, a memory 33, and a communication bus 34, wherein the processor 31, the communication interface 32, and the memory 33 communicate with one another through the communication bus 34;
a memory 33 for storing a computer program;
the processor 31 is configured to implement the steps of the above-mentioned search method based on the difficult-to-segment sample generation when executing the program stored in the memory 33, and in one possible implementation manner of the present invention, the following steps may be implemented:
extracting the characteristics of an image to be retrieved;
taking the characteristics of the image to be retrieved as the input of a retrieval model, and obtaining, through the retrieval model, retrieval results related to the image to be retrieved and distance scores between the characteristics of the image to be retrieved and the characteristics of the images in the retrieval model database; wherein the retrieval model is trained on an original ternary image group serving as a sample set and on a final difficult ternary sample group obtained by a two-stage difficult sample generation framework THSG; in the first stage of the THSG, the difficulty degree of the original positive sample pairs in the original ternary image group is increased to obtain difficult positive sample pairs; the labels of the difficult positive sample pairs are adjusted to be consistent with the labels of the original positive sample pairs, and the adjusted difficult positive sample pairs and the original negative samples in the original ternary image group are output to the second stage of the THSG; in the second stage of the THSG, the difficulty degree of the original negative samples is increased to obtain final difficult negative samples and final difficult positive sample pairs; and the final difficult positive sample pairs and the final difficult negative samples are synthesized to obtain the final difficult ternary sample group;
and sequencing the retrieval results related to the images to be retrieved according to the distance scores to obtain the retrieval result most related to the images to be retrieved.
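The retrieval and sorting steps above amount to ranking database features by their distance score to the query feature. A minimal sketch, with hypothetical helper and parameter names (`db_feats`, `db_ids`, `top_k` are illustrative, not from the source):

```python
import numpy as np

def retrieve(query_feat, db_feats, db_ids, top_k=3):
    """Rank database images by the L2 distance score between the query
    feature and each database feature, returning the most related
    results (smallest distance) first."""
    q = np.asarray(query_feat, dtype=float)
    dists = np.linalg.norm(np.asarray(db_feats, dtype=float) - q, axis=1)
    order = np.argsort(dists)[:top_k]
    return [(db_ids[i], float(dists[i])) for i in order]
```

The first element of the returned list corresponds to the retrieval result most related to the image to be retrieved.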
The communication bus mentioned in the electronic device may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this is not intended to represent only one bus or type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a RAM (Random Access Memory) or an NVM (Non-Volatile Memory), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; but also a DSP (Digital Signal Processing), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component.
The method provided by the embodiment of the invention can be applied to electronic equipment. Specifically, the electronic device may be: desktop computers, laptop computers, intelligent mobile terminals, servers, and the like. Without limitation, any electronic device that can implement the embodiments of the present invention is within the scope of the present invention.
An embodiment of the present invention provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the above-mentioned retrieval method based on the generation of the segmentation-difficult samples.
Embodiments of the present invention provide a computer program product comprising instructions which, when run on a computer, cause the computer to perform the steps of the above-described retrieval method based on segmentation-difficult-sample generation.
Embodiments of the present invention provide a computer program which, when run on a computer, causes the computer to perform the steps of the above-described retrieval method based on segmentation-difficult-sample generation.
It should be noted that, in this document, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus/electronic device/storage medium/computer program product/computer program embodiment comprising instructions, the description is relatively simple as it is substantially similar to the method embodiment, and reference may be made to some descriptions of the method embodiment for relevant points.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement and the like made within the spirit and original scope of the present invention are included in the protection scope of the present invention.

Claims (9)

1. A retrieval method based on segmentation difficult sample generation is characterized by comprising the following steps:
extracting the characteristics of an image to be retrieved;
taking the characteristics of the image to be retrieved as the input of a retrieval model, and obtaining, through the retrieval model, retrieval results related to the image to be retrieved and distance scores between the characteristics of the image to be retrieved and the characteristics of the images in the retrieval model database; wherein the retrieval model is trained on an original ternary image group serving as a sample set and on a final difficult ternary sample group obtained by a two-stage difficult sample generation framework THSG; in the first stage of the THSG, the difficulty degree of the original positive sample pairs in the original ternary image group is increased to obtain difficult positive sample pairs; the labels of the difficult positive sample pairs are adjusted to be consistent with the labels of the original positive sample pairs, and the adjusted difficult positive sample pairs and the original negative samples in the original ternary image group are output to the second stage of the THSG; in the second stage of the THSG, the difficulty degree of the original negative samples is increased to obtain final difficult negative samples and final difficult positive sample pairs; and the final difficult positive sample pairs and the final difficult negative samples are synthesized to obtain the final difficult ternary sample group;
sorting the retrieval results related to the images to be retrieved according to the distance scores to obtain the retrieval result most related to the images to be retrieved;
obtaining the retrieval model through the following steps:
acquiring an original ternary image group serving as a sample set;
in the first stage of the two-stage difficult sample generation framework THSG, stretching the original positive sample pair in a piecewise linear stretching PLM manner to increase the difficulty degree and obtain a difficult positive sample pair; wherein the difficult positive sample pair comprises: a difficult candidate sample and a difficult positive sample;
adjusting the labels of the difficult positive sample pairs to be consistent with the labels of the original positive sample pairs based on a trained first generative adversarial network, and outputting the adjusted difficult positive sample pairs and the original negative samples to the second stage of the THSG; wherein the trained first generative adversarial network comprises: a difficult positive sample pair generator HAPG and a discriminator HAPD corresponding to the HAPG;
in the second stage of the THSG, increasing the difficulty degree of the original negative sample based on a trained second generative adversarial network to obtain a final difficult negative sample, and outputting a final difficult positive sample pair; wherein the trained second generative adversarial network comprises: a difficult ternary sample generator HTG and a discriminator HTD corresponding to the HTG;
synthesizing the final difficult positive sample pair and the final difficult negative sample to obtain a final difficult ternary sample group;
and taking the final difficult ternary sample group as a sample set, and training a convolutional neural network to obtain the retrieval model.
2. The method as claimed in claim 1, wherein said extracting features of the image to be retrieved comprises:
extracting the characteristics of the animal image to be retrieved;
the method for obtaining the retrieval result related to the image to be retrieved and the distance score between the feature of the image to be retrieved and the feature of the image in the retrieval model database by taking the feature of the image to be retrieved as the input of the retrieval model comprises the following steps:
taking the characteristics of the animal image to be retrieved as the input of a retrieval model, and obtaining a retrieval result related to the animal image to be retrieved and a distance score between the characteristics of the animal image to be retrieved and the characteristics of the images in the retrieval model database through the retrieval model;
the sorting the retrieval results related to the images to be retrieved according to the distance scores to obtain the most related retrieval result of the images to be retrieved comprises the following steps:
and sequencing the retrieval results related to the images to be retrieved according to the distance scores to obtain the animal retrieval result most related to the animal images to be retrieved.
3. The method of claim 1, wherein adjusting the labels of the difficult positive sample pairs to be consistent with the labels of the original positive sample pairs based on the trained first generative adversarial network, and outputting the adjusted difficult positive sample pairs and the original negative samples to the second stage of the THSG, comprises:
adjusting the labels of the difficult positive sample pairs to be consistent with the labels of the original positive sample pairs based on a trained first generative adversarial network and a trained third generative adversarial network, and outputting the adjusted difficult positive sample pairs and the original negative samples to the second stage of the THSG; wherein the trained third generative adversarial network comprises: a reconstruction condition generator RCG and a discriminator RCD corresponding to the RCG.
4. The method of claim 1, wherein, in the first stage of the two-stage difficult sample generation framework THSG, stretching the original positive sample pair in the piecewise linear stretching PLM manner to increase the difficulty degree and obtain a difficult positive sample pair comprises:
stretching the original positive sample pair by adopting a piecewise linear operation formula in the piecewise linear stretching PLM manner to increase the difficulty degree and obtain a difficult positive sample pair, wherein the piecewise linear operation formula is:
a*=a+λ(a-p)
p*=p+λ(p-a)
Figure FDA0003812859190000031
wherein, a * Is a difficult candidate sample, a is the original candidate sample, and λ isCoefficient of stretch distance, p is the original positive sample, p * For difficult positive samples, α is the bias hyperparameter, d 0 Is a segmentation coefficient, d (a, p) is the distance between the original candidate sample a and the original positive sample p, and gamma is a linear hyperparameter;
or, stretching the original positive sample pair by an optimal piecewise linear operation formula in the piecewise linear stretching PLM mode to increase the difficulty level and obtain a difficult positive sample pair, wherein the optimal piecewise linear operation formula is as follows:
[optimal piecewise expression for λ, rendered as an image in the original publication]
wherein d_epoch−1 is the average distance of the positive sample pairs calculated in the previous training pass; the positive sample pairs are the original positive sample pairs in the first training pass and the difficult positive sample pairs in subsequent training passes.
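As a rough illustration (not part of the claims), the piecewise linear stretching of claim 4 can be sketched in Python. The formulas a* = a + λ(a − p) and p* = p + λ(p − a) are taken directly from the claim; the two-segment rule for λ below is an assumption, since the exact piecewise expression is rendered as an image in the original publication:

```python
import numpy as np

def plm_stretch(a, p, alpha, gamma, d0):
    """Stretch an original positive pair (a, p) apart to make it harder.

    a, p  : feature vectors of the candidate and the positive sample
    alpha : bias hyperparameter
    gamma : linear hyperparameter
    d0    : segmentation coefficient gating the two linear segments

    The two-segment rule for lambda is an assumption standing in for the
    image-rendered piecewise expression in the original publication.
    """
    d = np.linalg.norm(a - p)                       # d(a, p)
    lam = alpha if d <= d0 else gamma * d + alpha   # assumed piecewise rule
    a_star = a + lam * (a - p)                      # difficult candidate sample
    p_star = p + lam * (p - a)                      # difficult positive sample
    return a_star, p_star

a = np.array([1.0, 0.0])
p = np.array([0.0, 0.0])
a_star, p_star = plm_stretch(a, p, alpha=0.1, gamma=0.05, d0=0.5)
# Stretching moves the pair apart: the new distance exceeds the old one.
assert np.linalg.norm(a_star - p_star) > np.linalg.norm(a - p)
```

Whatever the exact rule for λ, the effect of the two update formulas is the same: both samples move away from each other along the line connecting them, increasing the pair's difficulty.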
5. The method of claim 3, wherein the RCD differs from the HAPD in its inputs and outputs, and the RCD and the HAPD are determined using the following equation:
[equation rendered as an image in the original publication]
wherein an image-rendered symbol denotes the features of the generated sample after passing through the discriminator, and R(x_i') is the output of the input data after passing through the discriminator; a further image-rendered symbol denotes the i-th generated sample, and x_i' is the i-th input sample; the remaining image-rendered symbols denote the normalized exponential function (softmax) classification loss and the discriminator loss function.
6. The method of claim 3, wherein the HAPG is obtained by the following function:
[equation rendered as an image in the original publication]
wherein image-rendered symbols denote, respectively, the HAPG loss function, the HAPD classification loss, the HAPG classification loss (cls denoting a class), and the normalized exponential function (softmax) classification loss; HAPD(x_i') is the output of the generated difficult positive sample through the HAPD, C_HAPG(x_i') is the classification output of the HAPG, and x_i' is the i-th generated difficult sample.
7. The method of claim 3, wherein the RCG is determined using the following RCG loss equation:
[equation rendered as an image in the original publication]
wherein image-rendered symbols denote, respectively: the value of the RCG loss function; the L2 distance between the sample before reconstruction and the sample after reconstruction, η being a balance factor between the normalized exponential function loss and the reconstruction loss; the reconstruction condition discriminator loss, cls denoting the class characterization and C_RCG the reconstruction condition generator loss; the specific form of the RCG reconstruction condition generator loss, x_r being the vector obtained by passing the HAPG-generated sample through the RCG for reconstruction and x the original vector; the normalized exponential function (softmax) classification loss, sm abbreviating the normalized exponential function; the i-th reconstructed vector; and the output of the discriminator for the reconstructed vector, l_i being the i-th class label and i the index.
8. The method of claim 1, wherein increasing the difficulty level of the original negative sample based on the trained second generative adversarial neural network to obtain a final difficult negative sample comprises:
increasing the difficulty level of the original negative sample by an adaptive reversed triplet loss formula based on the trained second generative adversarial neural network, to obtain the final difficult negative sample, wherein the adaptive reversed triplet loss formula is as follows:
[equation rendered as an image in the original publication]
wherein image-rendered symbols denote, respectively: the HTG loss function, η being a balance factor between the normalized exponential function loss and the reconstruction loss, and μ a reconstruction-loss balance parameter; the adaptive reversal loss; the reconstruction loss; the HTD classification loss; the HTG classification loss; the candidate sample generated by the HTG; the positive sample generated by the HTG; and the negative sample generated by the HTG, a being the original candidate sample and p the original positive sample; further image-rendered symbols denote the classification loss; the HTG-generated samples collectively, l_i being the class label and C_HTG the HTG loss function; and the reversed triplet loss, in which a' is the candidate sample input, n is the negative sample input, p' is the positive sample input, ||·||_2 is the L2 distance, [·]_+ denotes truncation at 0, and τ_r is the reversed triplet loss hyperparameter; v and β are constants whose value range is from 0 to plus infinity;
the HTD is obtained by adopting the following formula:
[equation rendered as an image in the original publication]
wherein image-rendered symbols denote the value of the HTD loss function and the result obtained by inputting a generated sample into the HTD; C is the number of original classes, and HTD(x_i) is the result of the HTD with the original sample x_i as input.
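For orientation only, a reversed triplet loss along the lines of claim 8 can be sketched in Python. The exact adaptive expression is rendered as an image in the original publication, so the form below (L2 distances, margin τ_r, truncation at 0) is an assumption: it rewards generated triplets in which the negative is close to the candidate and the positive is far, i.e. it drives the generator toward harder triplets:

```python
import numpy as np

def reversed_triplet_loss(a_, p_, n_, tau_r):
    """Assumed reversed triplet loss for hard-triplet generation.

    a_, p_, n_ : candidate, positive, and negative sample inputs
    tau_r      : reversed triplet loss hyperparameter (margin)

    The loss is small when the negative is already closer to the
    candidate than the positive is; [x]_+ truncates at zero as in
    claim 8.
    """
    d_ap = np.linalg.norm(a_ - p_)  # L2 distance candidate-positive
    d_an = np.linalg.norm(a_ - n_)  # L2 distance candidate-negative
    return max(d_an - d_ap + tau_r, 0.0)
```

Note the sign reversal relative to the ordinary triplet loss [d(a, p) − d(a, n) + τ]_+: the generator is penalized when its triplets are easy (negative far, positive near), which is the opposite objective of the retrieval network it feeds.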
9. A retrieval apparatus based on segmentation-difficult sample generation, comprising:
the extraction module is used for extracting the characteristics of the image to be retrieved;
the processing module is used for taking the characteristics of the image to be retrieved as the input of a retrieval model, and obtaining, through the retrieval model, a retrieval result related to the image to be retrieved and a distance score between the characteristics of the image to be retrieved and the characteristics of the images in the retrieval model database; the retrieval model is obtained by training based on an original ternary image group serving as a sample set and a final difficult ternary sample group obtained by a two-stage difficult sample generation framework THSG; the final difficult ternary sample group is obtained by increasing the difficulty level of an original positive sample pair in the original ternary sample group in the first stage of the THSG; adjusting the labels of the difficult positive sample pairs to be consistent with the labels of the original positive sample pairs, and outputting the adjusted difficult positive sample pairs and the original negative samples in the original ternary image group to the second stage of the THSG; in the second stage of the THSG, increasing the difficulty level of the original negative samples to obtain final difficult negative samples and final difficult positive sample pairs; and synthesizing the final difficult positive sample pairs and the final difficult negative samples to obtain the final difficult ternary sample group;
the sorting module is used for sorting the retrieval results related to the images to be retrieved according to the distance scores to obtain the retrieval result most related to the images to be retrieved;
a generating module, configured to obtain the search model through the following steps:
acquiring an original ternary image group serving as a sample set;
in the first stage of the two-stage difficult sample generation framework THSG, stretching the original positive sample pair in a piecewise linear stretching PLM mode to increase the difficulty level and obtain a difficult positive sample pair; wherein the difficult positive sample pair comprises: a difficult candidate sample and a difficult positive sample;
adjusting the labels of the difficult positive sample pairs to be consistent with the labels of the original positive sample pairs based on the trained first generative adversarial neural network, and outputting the adjusted difficult positive sample pairs and the original negative samples to the second stage of the THSG; wherein the trained first generative adversarial neural network comprises: a difficult positive sample pair generator HAPG and a discriminator HAPD corresponding to the HAPG;
in the second stage of the THSG, increasing the difficulty level of the original negative sample based on the trained second generative adversarial neural network to obtain a final difficult negative sample, and outputting a final difficult positive sample pair; wherein the trained second generative adversarial neural network comprises: a difficult ternary sample generator HTG and a discriminator HTD corresponding to the HTG;
synthesizing the final difficult positive sample pair and the final difficult negative sample to obtain a final difficult ternary sample group;
and taking the final difficult ternary sample group as a sample set, and training a convolutional neural network to obtain the retrieval model.
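The steps carried out by the generating module can be sketched end-to-end as follows. Every callable here is a hypothetical stand-in for a claimed component: plm performs the piecewise linear stretching, (hapg, hapd) and (htg, htd) are the two trained generator/discriminator pairs, and train_cnn trains the convolutional neural network; the discriminators are only consumed during their own adversarial training, which is omitted from this sketch:

```python
def build_retrieval_model(original_triplets, plm, hapg, hapd, htg, htd, train_cnn):
    """Sketch of the two-stage difficult sample generation (THSG) flow.

    original_triplets : iterable of (candidate, positive, negative) samples
    All other arguments are hypothetical stand-ins for the claimed
    components; their names and signatures are assumptions.
    """
    hard_triplets = []
    for anchor, positive, negative in original_triplets:
        # Stage 1: stretch the original positive pair to raise its difficulty.
        hard_a, hard_p = plm(anchor, positive)
        # Adjust the hard pair's labels to match the original pair via HAPG.
        hard_a, hard_p = hapg(hard_a, hard_p, label_of=(anchor, positive))
        # Stage 2: raise the difficulty of the original negative via HTG,
        # yielding the final difficult negative sample.
        hard_n = htg(negative, context=(hard_a, hard_p))
        # Synthesize the final difficult ternary sample group.
        hard_triplets.append((hard_a, hard_p, hard_n))
    # Train the retrieval CNN on the synthesized hard triplets.
    return train_cnn(hard_triplets)
```

The point of the structure is that the CNN never trains on the easy original triplets alone: every triplet it sees has been pushed toward the decision boundary by the two generation stages first.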
CN202010586972.9A 2020-06-24 2020-06-24 Retrieval method and device based on segmentation difficult sample generation Active CN111858999B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010586972.9A CN111858999B (en) 2020-06-24 2020-06-24 Retrieval method and device based on segmentation difficult sample generation


Publications (2)

Publication Number Publication Date
CN111858999A CN111858999A (en) 2020-10-30
CN111858999B true CN111858999B (en) 2022-10-25

Family

ID=72988602

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010586972.9A Active CN111858999B (en) 2020-06-24 2020-06-24 Retrieval method and device based on segmentation difficult sample generation

Country Status (1)

Country Link
CN (1) CN111858999B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113780461B (en) * 2021-09-23 2022-08-05 中国人民解放军国防科技大学 Robust neural network training method based on feature matching

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108304864A (en) * 2018-01-17 2018-07-20 清华大学 Depth fights metric learning method and device
CN110533106A (en) * 2019-08-30 2019-12-03 腾讯科技(深圳)有限公司 Image classification processing method, device and storage medium
WO2020000961A1 (en) * 2018-06-29 2020-01-02 北京达佳互联信息技术有限公司 Method, device, and server for image tag identification
CN110674692A (en) * 2019-08-23 2020-01-10 北京大学 Target accurate retrieval method and system based on difficult sample generation
CN110929649A (en) * 2019-11-24 2020-03-27 华南理工大学 Network and difficult sample mining method for small target detection


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Object detection based on difficult sample mining under residual networks; Zhang Chao et al.; Laser & Optoelectronics Progress; 2018-05-11 (No. 10); full text *

Also Published As

Publication number Publication date
CN111858999A (en) 2020-10-30

Similar Documents

Publication Publication Date Title
CN109308318B (en) Training method, device, equipment and medium for cross-domain text emotion classification model
CN110147457B (en) Image-text matching method, device, storage medium and equipment
CN108399158B (en) Attribute emotion classification method based on dependency tree and attention mechanism
US11631248B2 (en) Video watermark identification method and apparatus, device, and storage medium
CN109766557B (en) Emotion analysis method and device, storage medium and terminal equipment
CN112749274B (en) Chinese text classification method based on attention mechanism and interference word deletion
CN107945210B (en) Target tracking method based on deep learning and environment self-adaption
WO2023134082A1 (en) Training method and apparatus for image caption statement generation module, and electronic device
CN111522908A (en) Multi-label text classification method based on BiGRU and attention mechanism
CN113011357A (en) Depth fake face video positioning method based on space-time fusion
CN110968725B (en) Image content description information generation method, electronic device and storage medium
CN112667813B (en) Method for identifying sensitive identity information of referee document
CN113627151B (en) Cross-modal data matching method, device, equipment and medium
CN111898704B (en) Method and device for clustering content samples
CN114925205B (en) GCN-GRU text classification method based on contrast learning
CN112749737A (en) Image classification method and device, electronic equipment and storage medium
Sharma et al. Automatic identification of bird species using audio/video processing
CN113806564B (en) Multi-mode informative text detection method and system
CN111858999B (en) Retrieval method and device based on segmentation difficult sample generation
CN114358249A (en) Target recognition model training method, target recognition method and device
CN113870863A (en) Voiceprint recognition method and device, storage medium and electronic equipment
CN109101984B (en) Image identification method and device based on convolutional neural network
Wang et al. Fine-grained multi-modal self-supervised learning
CN113297387A (en) News detection method for image-text mismatching based on NKD-GNN
CN115374943A (en) Data cognition calculation method and system based on domain confrontation migration network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant