CN111858999B - Retrieval method and device based on segmentation difficult sample generation - Google Patents


Info

Publication number
CN111858999B
CN111858999B (application CN202010586972.9A)
Authority
CN
China
Prior art keywords
sample
difficult
original
positive
positive sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010586972.9A
Other languages
Chinese (zh)
Other versions
CN111858999A (en)
Inventor
祝闯
董慧慧
齐勇刚
刘军
刘芳
Current Assignee
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN202010586972.9A
Publication of CN111858999A
Application granted
Publication of CN111858999B
Legal status: Active
Anticipated expiration

Classifications

    • G06F16/583: Information retrieval of still image data; retrieval characterised by metadata automatically derived from the content
    • G06F16/538: Information retrieval of still image data; querying; presentation of query results
    • G06F16/55: Information retrieval of still image data; clustering; classification
    • G06F18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/241: Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N3/045: Neural networks; combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Library & Information Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the invention provides a retrieval method and device based on segmentation-based difficult sample generation. The method uses all samples in the original triplet image sample set and increases the difficulty of each original triplet in that set. In the first stage of the two-stage hard sample generation framework (THSG), the difficulty of the positive sample pair is increased to obtain a difficult positive sample pair while the labels of the difficult positive sample pair are kept consistent with those of the original positive sample pair; in the second stage, the difficulty of the original negative sample is increased to obtain the final difficult negative sample and the final difficult positive sample pair, improving the effective usability of the sample set. Furthermore, the final difficult triplets can supplement effective difficult samples for small training sets, so that the model can be trained better, and training with the difficult sample pairs yields a more robust feature-extraction retrieval model.

Description

Retrieval method and device based on segmentation difficult sample generation
Technical Field
The invention relates to the technical field of image processing, and in particular to a retrieval method and device based on segmentation-based difficult sample generation.
Background
Deep Metric Learning (DML) aims to learn a powerful metric that accurately and robustly measures the similarity between data. DML is now widely applied in fields such as image retrieval, person re-identification, and clustering, among other multimedia tasks.
Image retrieval is taken as an example. There are currently various DML-based image retrieval methods, chiefly model construction based on metric learning, in which multiple triplet image samples serve as the input of the model under construction. Each triplet consists of a positive sample pair sharing the same label and a negative sample whose label differs from that of the positive pair. However, on some small-scale datasets the number of triplet samples that can be constructed is limited. For example, when retrieving images of wild animals, the image data for some rare species is scarce, so too few triplet samples can be constructed for them; the model therefore cannot be trained effectively, which in turn reduces retrieval effectiveness.
In short, on some small-scale datasets the limited number of constructible triplet image samples prevents effective model training and reduces retrieval effectiveness.
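As an illustration of the triplet structure described above, the sketch below builds (anchor, positive, negative) triplets from a labeled image set. The function name and the simple exhaustive pairing strategy are assumptions for illustration, not part of the patent:

```python
from itertools import combinations

def build_triplets(samples):
    """Build (anchor, positive, negative) triplets from (id, label) pairs.

    Each triplet pairs two samples with the same label (the positive pair)
    with one sample carrying a different label (the negative sample).
    """
    triplets = []
    for (a_id, a_lbl), (p_id, p_lbl) in combinations(samples, 2):
        if a_lbl != p_lbl:
            continue  # not a positive pair
        for n_id, n_lbl in samples:
            if n_lbl != a_lbl:
                triplets.append((a_id, p_id, n_id))
    return triplets

# With only two images of a rare class, very few triplets can be formed,
# which is the data-scarcity problem the patent addresses.
data = [("img0", "tiger"), ("img1", "tiger"),
        ("img2", "deer"), ("img3", "deer"), ("img4", "deer")]
triplets = build_triplets(data)
```

With one "tiger" pair and three "deer" pairs, only nine triplets exist in total, illustrating how quickly the pool shrinks for rare classes.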
Disclosure of Invention
The embodiment of the invention aims to provide a retrieval method and a retrieval device based on difficult segmented sample generation, which are used for solving the technical problems that in the prior art, in some small-scale data sets, the number of samples of a ternary image group which can be constructed is limited, so that a model cannot be trained effectively, and the retrieval effectiveness is reduced. The specific technical scheme is as follows:
in a first aspect, an embodiment of the present invention provides a retrieval method based on generation of a segmentation-difficult sample, including:
extracting the features of the image to be retrieved;
taking the features of the image to be retrieved as the input of a retrieval model, and obtaining, through the retrieval model, retrieval results related to the image to be retrieved together with distance scores between the features of the image to be retrieved and the features of the images in the retrieval model database; the retrieval model is trained on an original triplet image sample set and on the final difficult triplets produced by the two-stage hard sample generation framework THSG; in the first stage of the THSG, the difficulty of the original positive sample pair in an original triplet is increased to obtain a difficult positive sample pair; the labels of the difficult positive sample pair are adjusted to be consistent with those of the original positive sample pair, and the adjusted difficult positive sample pair and the original negative sample of the triplet are output to the second stage of the THSG; in the second stage of the THSG, the difficulty of the original negative sample is increased, yielding the final difficult negative sample and the final difficult positive sample pair; the final difficult positive sample pair and the final difficult negative sample are combined into the final difficult triplet;
and sorting the retrieval results related to the image to be retrieved according to the distance scores to obtain the retrieval result most relevant to the image to be retrieved.
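The retrieve-score-sort flow above can be sketched as follows; the Euclidean distance and the function names are illustrative assumptions, not the patent's specified implementation:

```python
import math

def l2_distance(u, v):
    """Euclidean (L2) distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def retrieve(query_feature, database):
    """Rank database entries by distance to the query feature.

    `database` maps image ids to feature vectors; the smallest distance
    score corresponds to the most relevant retrieval result.
    """
    scores = {img_id: l2_distance(query_feature, feat)
              for img_id, feat in database.items()}
    # Sort ascending: best match (smallest distance) first.
    return sorted(scores.items(), key=lambda kv: kv[1])

db = {"cat_a": [1.0, 0.0], "cat_b": [0.9, 0.1], "dog_a": [0.0, 1.0]}
ranked = retrieve([1.0, 0.05], db)  # list of (image_id, score), best first
```

The first element of `ranked` plays the role of the "retrieval result most relevant to the image to be retrieved".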
Further, extracting the features of the image to be retrieved includes:
extracting the features of the animal image to be retrieved;
the step of taking the features of the image to be retrieved as the input of a retrieval model and obtaining, through the retrieval model, retrieval results related to the image to be retrieved and distance scores between the features of the image to be retrieved and the features of the images in the retrieval model database includes:
taking the features of the animal image to be retrieved as the input of a retrieval model, and obtaining, through the retrieval model, retrieval results related to the animal image to be retrieved and distance scores between the features of the animal image to be retrieved and the features of the images in the retrieval model database;
the step of sorting the retrieval results related to the image to be retrieved according to the distance scores to obtain the retrieval result most relevant to the image to be retrieved includes:
sorting the retrieval results related to the animal image to be retrieved according to the distance scores to obtain the animal retrieval result most relevant to the animal image to be retrieved.
Further, the retrieval model is obtained through the following steps:
acquiring an original triplet image set serving as the sample set;
in the first stage of the two-stage hard sample generation framework THSG, stretching the original positive sample pair by piecewise linear stretching (PLM) to increase its difficulty and obtain a difficult positive sample pair, wherein the difficult positive sample pair comprises a difficult candidate sample and a difficult positive sample;
adjusting, based on a trained first generative adversarial network, the labels of the difficult positive sample pair to be consistent with the labels of the original positive sample pair, and outputting the adjusted difficult positive sample pair and the original negative sample to the second stage of the THSG, wherein the trained first generative adversarial network comprises a difficult positive-pair generator HAPG and its corresponding discriminator HAPD;
in the second stage of the THSG, increasing the difficulty of the original negative sample based on a trained second generative adversarial network to obtain the final difficult negative sample, and outputting the final difficult positive sample pair, wherein the trained second generative adversarial network comprises a difficult triplet generator HTG and its corresponding discriminator HTD;
combining the final difficult positive sample pair and the final difficult negative sample into the final difficult triplet;
and training a convolutional neural network with the final difficult triplets as the sample set to obtain the retrieval model.
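The training steps above can be laid out as a pipeline skeleton. Every function here is a hypothetical stand-in for the corresponding stage (PLM stretch, HAPG/HAPD label adjustment, HTG/HTD negative hardening, CNN training), not the patent's implementation:

```python
def thsg_pipeline(original_triplets, plm_stretch, adjust_labels,
                  harden_negative, train_cnn):
    """Two-stage hard sample generation (THSG) training skeleton."""
    final_triplets = []
    for anchor, positive, negative in original_triplets:
        # Stage 1: piecewise-linear stretch of the positive pair,
        # then label adjustment via the first adversarial network (HAPG/HAPD).
        hard_a, hard_p = plm_stretch(anchor, positive)
        hard_a, hard_p = adjust_labels(hard_a, hard_p)
        # Stage 2: harden the negative via the second network (HTG/HTD).
        hard_n = harden_negative(negative, hard_a, hard_p)
        final_triplets.append((hard_a, hard_p, hard_n))
    # Train the retrieval CNN on the synthesized difficult triplets.
    return train_cnn(final_triplets)

# Toy stand-ins just to exercise the control flow.
model = thsg_pipeline(
    [(0.0, 1.0, 5.0)],
    plm_stretch=lambda a, p: (a - 0.1, p + 0.1),
    adjust_labels=lambda a, p: (a, p),
    harden_negative=lambda n, a, p: n - 1.0,
    train_cnn=lambda t: t,
)
```

The point of the structure is that the stage-2 hardening sees the stage-1 output, matching the hand-off the claim describes.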
Further, adjusting, based on the trained first generative adversarial network, the labels of the difficult positive sample pair to be consistent with the labels of the original positive sample pair, and outputting the adjusted difficult positive sample pair and the original negative sample to the second stage of the THSG, includes:
adjusting the labels of the difficult positive sample pair to be consistent with the labels of the original positive sample pair based on the trained first generative adversarial network and a trained third generative adversarial network, and outputting the adjusted difficult positive sample pair and the original negative sample to the second stage of the THSG, wherein the trained third generative adversarial network comprises a reconstruction condition generator RCG and its corresponding discriminator RCD.
Further, in the first stage of the two-stage hard sample generation framework THSG, stretching the original positive sample pair by piecewise linear stretching (PLM) to increase its difficulty and obtain a difficult positive sample pair includes:
stretching the original positive sample pair by the following piecewise linear operation to increase its difficulty and obtain the difficult positive sample pair:
a* = a + λ(a - p)
p* = p + λ(p - a)
[the piecewise definition of λ in terms of d(a, p), α, d0, and γ is given only as a formula image in the source]
wherein a* is the difficult candidate sample, a is the original candidate sample, λ is the stretch distance coefficient, p is the original positive sample, p* is the difficult positive sample, α is the bias hyperparameter, d0 is the segmentation coefficient, d(a, p) is the distance between the original candidate sample a and the original positive sample p, and γ is the linear hyperparameter;
or stretching the original positive sample pair by a preferred piecewise linear operation to increase its difficulty and obtain the difficult positive sample pair:
[the preferred formula is given only as a formula image in the source]
wherein d_(epoch-1) is the average distance of the positive sample pairs computed in the previous training pass, these pairs being the original positive samples in the first training pass and the difficult positive sample pairs in subsequent passes.
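A minimal numerical sketch of the stretching step a* = a + λ(a - p), p* = p + λ(p - a). Since the source gives λ's piecewise definition only as an image, a simple constant-vs-scaled rule keyed on the segmentation coefficient d0 is assumed here purely for illustration:

```python
import math

def plm_stretch(a, p, alpha=0.5, gamma=0.1, d0=1.0):
    """Piecewise-linear stretch of a positive pair (illustrative lambda rule).

    Moves a and p apart along the line joining them, making the pair
    harder while preserving its direction in feature space.
    """
    d = math.sqrt(sum((ai - pi) ** 2 for ai, pi in zip(a, p)))
    # Assumed piecewise rule: stronger stretch for close pairs (d <= d0),
    # gentler distance-scaled stretch otherwise. Not the patent's formula.
    lam = alpha if d <= d0 else gamma * d
    a_star = [ai + lam * (ai - pi) for ai, pi in zip(a, p)]
    p_star = [pi + lam * (pi - ai) for ai, pi in zip(a, p)]
    return a_star, p_star

a_star, p_star = plm_stretch([1.0, 0.0], [0.0, 0.0])
```

After the stretch the pair is farther apart than before, i.e. harder for the model to match, which is the intent of PLM.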
Further, the RCD and the HAPD are determined using a discriminator loss of the following form, the RCD differing from the HAPD only in its input and output (the formula is given only as an image in the source):
wherein R(x̂_i) is the label of a generated sample after passing through the discriminator, R(x′_i) is the output of the input data after passing through the discriminator, x̂_i is the i-th generated sample, x′_i is the i-th input sample, and the loss combines the normalized softmax classification loss with the discriminator loss function.
Further, the HAPG is obtained from a loss function of the following form (given only as a formula image in the source):
wherein the HAPG loss function combines the HAPD classification loss and the HAPG classification loss; cls denotes the class; the normalized exponential (softmax) classification loss is used; HAPD(x′_i) is the output of the generated difficult positive sample after passing through the HAPD; C_HAPG(x′_i) is the class into which the output of the HAPG is classified; and x′_i is the i-th generated difficult sample.
Further, the RCG is determined using an RCG loss formula of the following form (given only as a formula image in the source):
wherein the RCG loss function value combines the L2 distance between the sample before reconstruction and the sample after reconstruction with a normalized exponential (softmax) classification loss, η being the balance factor between the normalized exponential function and the reconstruction loss; the reconstruction condition discriminator loss also enters the objective; cls denotes the class, and C_RCG is the reconstruction condition generator loss; x_r is the vector obtained by passing the HAPG's generated sample through the RCG for reconstruction, and x is the original vector; sm is short for the normalized exponential (softmax) function; x̂_i is the i-th reconstructed vector, R(x̂_i) is the output of the discriminator for the reconstructed vector, l_i is the i-th class label, and i is the index.
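The RCG objective above combines an L2 reconstruction term with a softmax classification term weighted by the balance factor η. A minimal sketch of that combination follows; the function names and the exact weighting form are assumptions, since the patent's formula appears only as an image:

```python
import math

def l2_sq(x, x_r):
    """Squared L2 distance between original vector x and reconstruction x_r."""
    return sum((a - b) ** 2 for a, b in zip(x, x_r))

def softmax_nll(logits, label):
    """Normalized-exponential (softmax) classification loss for one sample."""
    m = max(logits)  # subtract max for numerical stability
    z = sum(math.exp(v - m) for v in logits)
    return -(logits[label] - m - math.log(z))

def rcg_style_loss(x, x_r, logits, label, eta=0.5):
    """Classification loss plus eta-weighted reconstruction loss (assumed form)."""
    return softmax_nll(logits, label) + eta * l2_sq(x, x_r)

# Perfect reconstruction and a confident correct classification: near-zero loss.
loss = rcg_style_loss(x=[1.0, 0.0], x_r=[1.0, 0.0],
                      logits=[10.0, -10.0], label=0)
```

The balance factor η trades off label fidelity against staying close to the original sample, which is the role the text assigns to it.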
Further, increasing the difficulty of the original negative sample based on the trained second generative adversarial network to obtain the final difficult negative sample includes:
increasing the difficulty of the original negative sample using an adaptive inverted (reverse) triplet loss to obtain the final difficult negative sample (the formulas are given only as images in the source), wherein:
the HTG loss function combines the adaptive inverted triplet loss, the reconstruction loss, the HTD classification loss, and the HTG classification loss; η is the balance factor between the normalized exponential function and the reconstruction loss, and μ is the reconstruction-loss balance parameter;
the HTG generates a candidate sample, a positive sample, and a negative sample; a is the original candidate sample and p is the original positive sample, and the reconstruction loss is measured between the generated samples and the originals;
in the classification loss, the HTG's generated samples are taken collectively, l_i is the class label, and C_HTG is the HTG loss function;
in the inverted triplet loss, a′ is the candidate (anchor) input, n′ is the negative input, p′ is the positive input, the distances are L2 distances, [.]_+ denotes truncation at 0, and τ_r is the inverted-triplet-loss margin hyperparameter;
ν is a constant with value range 0 to positive infinity, and β is a constant with value range 0 to positive infinity.
The HTD is obtained using a loss function of the following form (given only as a formula image in the source):
wherein the HTD loss function value is computed over C, the number of original classes, and the HTD output is evaluated both with the generated sample as input and with the original sample as input, HTD(x_i) denoting the result of the HTD with the original sample x_i as input.
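As background for the inverted (reverse) triplet loss above, the sketch below shows an ordinary hinge triplet loss with the [.]_+ truncation, alongside an inverted variant that swaps the roles of the positive and negative distances so a generator is rewarded for producing negatives close to the anchor. The patent's exact formula is an image, so both forms here are assumed illustrations only:

```python
import math

def l2(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(a, p, n, margin=0.2):
    """Standard hinge triplet loss: [d(a,p) - d(a,n) + margin]_+ ."""
    return max(l2(a, p) - l2(a, n) + margin, 0.0)

def inverted_triplet_loss(a, p, n, margin=0.2):
    """Inverted variant (assumed form): [d(a,n) - d(a,p) + margin]_+ ,
    large when the negative is far away, pushing a generator to harden it."""
    return max(l2(a, n) - l2(a, p) + margin, 0.0)

# An easy triplet gives zero standard loss but a large inverted loss.
a, p, n = [0.0, 0.0], [0.1, 0.0], [2.0, 0.0]
easy = triplet_loss(a, p, n)
hard_for_generator = inverted_triplet_loss(a, p, n)
```

The truncation at 0 (the [.]_+ in the text) means a triplet already satisfying the margin contributes no gradient.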
In a second aspect, an embodiment of the present invention provides a retrieval apparatus based on difficult-to-segment sample generation, including:
the extraction module, configured to extract the features of the image to be retrieved;
the processing module, configured to take the features of the image to be retrieved as the input of a retrieval model and to obtain, through the retrieval model, retrieval results related to the image to be retrieved together with distance scores between the features of the image to be retrieved and the features of the images in the retrieval model database; the retrieval model is trained on an original triplet image sample set and on the final difficult triplets produced by the two-stage hard sample generation framework THSG; the final difficult triplets are obtained by increasing, in the first stage of the THSG, the difficulty of the original positive sample pair in an original triplet; adjusting the labels of the difficult positive sample pair to be consistent with the labels of the original positive sample pair and outputting the adjusted difficult positive sample pair and the original negative sample of the triplet to the second stage of the THSG; increasing, in the second stage of the THSG, the difficulty of the original negative sample to obtain the final difficult negative sample and the final difficult positive sample pair; and combining the final difficult positive sample pair and the final difficult negative sample into the final difficult triplet;
and the sorting module, configured to sort the retrieval results related to the image to be retrieved according to the distance scores to obtain the retrieval result most relevant to the image to be retrieved.
In a third aspect, an embodiment of the present invention provides an electronic device including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the steps of the method of any one of the first aspect when executing a program stored in the memory.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, which stores instructions that, when executed on a computer, cause the computer to perform the method of any one of the above first aspects.
The embodiment of the invention has the following beneficial effects:
Compared with the prior art, the retrieval method and device based on segmentation-based difficult sample generation use all samples in the original triplet image sample set and increase the difficulty of each original triplet in that set. In the first stage of the THSG, the difficulty of the positive sample pair is increased to obtain a difficult positive sample pair while its labels are kept consistent with those of the original positive sample pair; in the second stage, the difficulty of the original negative sample is increased to obtain the final difficult negative sample and the final difficult positive sample pair, improving the effective usability of the sample set. Furthermore, the final difficult triplets can supplement effective difficult samples for small training sets, so that the model can be trained better, and training with the difficult sample pairs yields a more robust feature-extraction retrieval model.
Of course, it is not necessary for any product or method to achieve all of the above-described advantages at the same time for practicing the invention.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a first flowchart of a retrieval method based on segmentation difficulty sample generation according to an embodiment of the present invention;
fig. 2 is a second flowchart of a retrieval method based on generation of a segmentation-difficult sample according to an embodiment of the present invention;
fig. 3 is a third flow chart of the retrieval method based on the generation of the segmentation-difficult sample according to the embodiment of the present invention;
FIG. 4 is a diagram illustrating a fourth flowchart of a retrieval method based on segmentation difficulty sample generation according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a retrieval apparatus based on segmentation difficulty sample generation according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
First, to aid understanding of the embodiments of the invention, the following terms are introduced: "image to be retrieved", "original triplet", "original candidate sample", "original positive sample pair", "original negative sample", "difficult candidate sample", "difficult positive sample pair", "difficult negative sample", "final difficult triplet", "adjusted difficult positive sample pair", "final difficult positive sample pair", "retrieval results related to the image to be retrieved", and "retrieval result most relevant to the image to be retrieved".
The "image to be retrieved" and the "images in the retrieval model database" are distinguished as follows. The image to be retrieved is a candidate image that has not yet been retrieved against; how it is acquired is not limited, as it may be captured on the spot or stored in advance, and it is passed through the model so that the content to be retrieved can be identified. The images in the retrieval model database serve as the basis for generating the final difficult triplets, and each carries a label indicating its specific type. Based on the original triplet image set used as the sample set and on the final difficult triplets obtained through the Two-Stage Hard Sample Generation framework (THSG), a feature extraction model, that is, a trained Convolutional Neural Network (CNN), is obtained, so that the model learns the features of the images in the retrieval model database. Such a trained CNN may be referred to as the retrieval model; correspondingly, the untrained convolutional neural network may be referred to as the retrieval model to be trained.
The qualifier "original" in "original candidate sample", "original positive sample", "original negative sample", and "original positive sample pair", the qualifier "difficult" in "difficult candidate sample", "difficult positive sample pair", "difficult negative sample", and "adjusted difficult positive sample pair", and the qualifier "final difficult" in "final difficult positive sample pair" distinguish the respective samples. A "pair" means two samples with the same type of label. The original triplet, original candidate sample, original positive sample pair, and original negative sample may collectively be called the original samples; the difficult candidate samples, difficult positive sample pairs, difficult negative samples, final difficult triplets, adjusted difficult positive sample pairs, and final difficult positive sample pairs may collectively be called the difficult samples.
Each original triplet includes an original positive sample pair, consisting of the original candidate sample and an original positive sample sharing its label, together with an original negative sample whose label differs from that of the original candidate sample. Each difficult triplet likewise includes a difficult positive sample pair, consisting of a difficult candidate sample and a difficult positive sample sharing its label, together with a difficult negative sample whose label differs from that of the difficult candidate sample.
The phrases "retrieval results related to the image to be retrieved" and "retrieval result most relevant to the image to be retrieved" distinguish the two kinds of results: the most relevant result is determined from among the related results.
To grasp the idea of the embodiments as a whole, the overall flow is briefly introduced. As shown in Fig. 1, the path drawn with a thin black line trains the CNN: a sample set is acquired, the final difficult triplets are generated from it by the two-stage hard sample generation framework THSG, and these assist in training the CNN. The trained CNN then serves as the retrieval model. On the other path, drawn with a thick black line in Fig. 1, the image to be retrieved is acquired, its features are extracted, and the model finally outputs the retrieval result most relevant to that image. The details are described below.
In the prior art, small-scale data sets limit the number of ternary image groups (triplets) that can be constructed, so a model cannot be trained effectively and retrieval effectiveness drops. To address this, the embodiment of the invention provides a retrieval method and device based on segmentation-difficult sample generation: all samples in the sample set of original ternary image groups are used, and the difficulty of every original ternary image group in the set is increased. In the first stage of the THSG, the difficulty of the positive sample pair is increased to obtain a difficult positive sample pair while keeping its label consistent with that of the original positive sample pair; in the second stage, the difficulty of the original negative sample is increased to obtain a difficult negative sample, improving the effective usability of the sample set. Furthermore, using the generated difficult ternary sample groups as a sample set supplements smaller training sets with effective difficult samples, so the model can be trained better. Training with the difficult sample pairs also yields a more robust feature extraction retrieval model.
First, the retrieval method based on segmentation-difficult sample generation provided by the embodiment of the present invention is described below.
The retrieval method based on segmentation-difficult sample generation provided by the embodiment of the invention applies to scenes such as person images or animal images. Furthermore, deep metric learning (DML) applies to multimedia task scenes such as visual product retrieval, zero-shot image retrieval, highlight image detection, and face image retrieval. All of these serve the purpose of DML: keeping similar examples close together and dissimilar examples far apart.
As shown in fig. 2, a retrieval method based on generation of a segmentation-difficult sample according to an embodiment of the present invention may include the following steps:
Step 110: extract the features of the image to be retrieved. In this way, features can be extracted from the image to be retrieved for the retrieval model to process.
Step 120: use the features of the image to be retrieved as the input of a retrieval model, and obtain, through the retrieval model, the retrieval results related to the image to be retrieved and the distance scores between the features of the image to be retrieved and the features of the images in the retrieval model database. The retrieval model is trained on original ternary image groups serving as a sample set together with the final difficult ternary sample groups obtained through the THSG. The final difficult ternary sample group is obtained as follows: in the first stage of the THSG, the difficulty of an original positive sample pair in an original ternary sample group is increased; the label of the resulting difficult positive sample pair is adjusted to be consistent with the label of the original positive sample pair, and the adjusted difficult positive sample pair and the original negative sample in the original ternary image group are output to the second stage of the THSG; in the second stage of the THSG, the difficulty of the original negative sample is increased to obtain a final difficult negative sample and a final difficult positive sample pair; and the final difficult positive sample pair and the final difficult negative sample are combined into the final difficult ternary sample set. This is based on the constraint of equation 13 below: the closer the generated negative sample is to the candidate, the better; that is, the harder the better.
And step 130, sorting the retrieval results related to the images to be retrieved according to the distance scores to obtain the retrieval result most related to the images to be retrieved.
To facilitate ranking and obtain the retrieval result most relevant to the image to be retrieved, the retrieval results related to the image can be sorted by distance score from high to low, taking the first N results as the most relevant; or sorted by distance score from low to high, taking the last N results as the most relevant. N may be determined according to user requirements; typically N is 10, though this is not limited here.
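The ranking and top-N selection described above can be sketched as follows; the function and variable names (`top_n_results`, `distance_scores`) are illustrative, not from the patent.

```python
import numpy as np

def top_n_results(result_ids, distance_scores, n=10, lower_is_better=True):
    """Sort retrieval results by distance score and keep the N most relevant.

    When the score is a distance (lower = more similar), sort ascending and
    take the front N; when it is a similarity, sort descending instead.
    """
    order = np.argsort(distance_scores)
    if not lower_is_better:
        order = order[::-1]
    return [result_ids[i] for i in order[:n]]

# Toy example: three database images and their distance scores.
ids = ["img_a", "img_b", "img_c"]
scores = np.array([0.9, 0.1, 0.5])
most_relevant = top_n_results(ids, scores, n=2)  # lowest distance first
```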
Based on the above description: in wildlife research, pictures of similar animals are retrieved from a taken picture to study the animals' living habits, movement tracks, regional distribution, and so on. Especially in the study and protection of rare wild animals such as wild pandas, or even endangered ones such as antelopes, the animals need to be identified or retrieved, so deep learning effectively improves research efficiency. Better picture feature expression can thus be obtained when performing tasks such as image retrieval on a wildlife picture library, achieving higher retrieval performance. The embodiment of the invention is described taking retrieval of wildlife images as an example: extract the features of the animal image to be retrieved; use those features as the input of a retrieval model to obtain, through the model, the retrieval results related to the animal image and the distance scores between the features of the animal image and the features of the images in the retrieval model database; and sort the related retrieval results by distance score to obtain the animal retrieval result most relevant to the animal image to be retrieved.
Here the retrieval model is trained on original ternary sample groups, as a sample set, built from wildlife images. The training process is the same as that described with steps 110 to 130; apart from the processed objects (wildlife images versus generic images), the specific training process is identical and is not repeated here.
In the embodiment of the invention, all samples in the sample set of original ternary image groups are used, and the difficulty of every original ternary image group in the set is increased. In the first stage of the THSG, the difficulty of the positive sample pair is increased to obtain a difficult positive sample pair while keeping its label consistent with that of the original positive sample pair; in the second stage, the difficulty of the original negative sample is increased to obtain a difficult negative sample, improving the effective usability of the sample set. Furthermore, using the generated difficult ternary sample groups as a sample set supplements smaller training sets with effective difficult samples, so the model can be trained better. Training with the difficult sample pairs also yields a more robust feature extraction retrieval model.
It should be noted that training a generative adversarial neural network comprises the following steps. First step: obtain an input sample of the generator and, by forward propagation through the generator's neural network, obtain a generated sample; use the generator's input sample and generated sample as input data of the discriminator, and through classification training of the discriminator's neural network, identify whether the source of the discriminator's input data is the generator's input sample or the generator's generated sample. If the discriminator's input data is identified as a generated sample, return the gradient information produced by the discriminator for that generated sample to the generator; adjust the generator's neural network using this gradient information, update the generator's neural network with the adjusted network, and return to the first step. Continue until the discriminator's classification training identifies the source of the discriminator's input data as the generator's input sample, then output the current generated sample.
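The adversarial loop described above can be caricatured in a few lines; this is a minimal 1-D sketch, not the patent's networks. The "generator" here is a learnable offset, the "discriminator" scores closeness to a real-data mean, and the gradient returned by the discriminator updates the generator until the discriminator can no longer tell the generated sample apart. All names and the setup are illustrative assumptions.

```python
import numpy as np

real_mean = 5.0  # stand-in for the real-data distribution

def generator(x, offset):
    # "Generator": forward pass is just adding a learnable offset.
    return x + offset

def discriminator(sample):
    # Probability-like score that the sample is "real" (near real_mean).
    return float(np.exp(-(sample - real_mean) ** 2))

offset, lr, x_in = 0.0, 0.5, 0.0
for step in range(100):
    fake = generator(x_in, offset)
    if discriminator(fake) > 0.99:   # discriminator fooled: stop, output sample
        break
    # Gradient information "returned" to the generator: move the generated
    # sample toward the region the discriminator scores as real.
    offset += lr * (real_mean - fake)

final_sample = generator(x_in, offset)
```

In a real GAN both networks are trained jointly and the generator gradient flows through the discriminator; the loop above only mirrors the control flow of the description (generate, discriminate, return gradient, update, repeat until fooled).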
Based on the foregoing generative adversarial neural network, there are various ways to obtain the adjusted difficult positive sample pairs when building the retrieval model. One possible implementation: based on one generative adversarial neural network, keep the labels of the adjusted difficult positive sample pairs consistent with the labels of the original positive sample pairs, output the adjusted difficult positive sample pairs, and output the original negative samples to the second stage of the THSG. To diversify the output, another implementation uses two generative adversarial neural networks: the labels of the difficult positive sample pairs are kept consistent with the labels of the original positive sample pairs, the adjusted difficult positive sample pairs are output, and the original negative samples are passed to the second stage of the THSG, where one of the two networks optimizes the output of the other, yielding diversity in the adjusted difficult positive sample pairs.
In combination with the above-mentioned manner of obtaining the adjusted difficult positive sample pairs, how to obtain the search model is described in detail below.
Referring to fig. 4, step 1, an original triplet set is obtained as a sample set.
Step 1 includes: taking the original ternary image groups as input images of a feature extraction network F, where each input image has a label; a labeled input image is called an input sample. The features of the input samples are extracted by equation 1 below, so that they can serve as the input of the first stage of the THSG, where the original positive sample pairs are stretched by Piecewise Linear Manipulation (PLM). The feature extraction network F may be a CNN. Equation 1 is:

θ_m^* = argmin_{θ_m} Σ_{i=1}^{n} J(F(I_i; θ_m), l_i)    (1)

where θ_m^* is the optimal choice of parameters, J is the global loss function, θ_m are the parameters of the feature extraction network F, and l_i is the label corresponding to training input sample x_i. The input samples form the image set I = [I_1, ..., I_n] with label set L = [l_1, ..., l_i, ..., l_n], where l_i ∈ [1, ..., C]; I_1 is the first input image, I_n the n-th input image, l_1 the class of the first input image, l_n the class of the n-th input image, n the serial number, l_i the class of the i-th input image, and C the total number of classes of the input images. The feature extraction network F extracts the feature of an input image as F(I_i) ∈ R^N, where N is the feature space dimension and R the real numbers. The features F(I_i) of the input images are mapped to X = [X_1, ..., X_n], where X_1 is the extracted feature of the 1st input image and X_n that of the n-th. The last layer of the feature extraction network F is the fully connected layer H_e that performs the spatial mapping. A distance loss function is used so that a distance metric can be learned in the feature space that reflects actual semantic distance. To ensure label consistency between the difficult samples and the original samples, a fully connected layer H_c is added after the final layer H_e of F for classification, trained with a normalized softmax loss function. When generating samples, the H_c layer is reused, and the distance metric loss trains θ_m as in equation 1 above.
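The two-layer head described above (embedding layer H_e followed by the added classification layer H_c) can be sketched as follows; the dimensions and random weights are illustrative assumptions, standing in for a trained backbone.

```python
import numpy as np

rng = np.random.default_rng(1)
feat_dim, embed_dim, num_classes = 8, 4, 3       # stand-ins for N and C

W_e = rng.normal(size=(feat_dim, embed_dim))     # H_e: spatial mapping
W_c = rng.normal(size=(embed_dim, num_classes))  # H_c: classification head

def embed(backbone_feature):
    # H_e maps backbone features into the metric space used for distances.
    return backbone_feature @ W_e

def classify(embedding):
    # H_c produces class logits; its normalized softmax loss is what
    # enforces label consistency for generated hard samples.
    logits = embedding @ W_c
    e = np.exp(logits - logits.max())
    return e / e.sum()

x = rng.normal(size=feat_dim)    # stands in for a backbone feature F(I_i)
probs = classify(embed(x))
```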
Based on the foregoing, the following continues with how adversarially generated difficult samples enhance the training process. Embodiments of the present invention train the generators and the distance metric network simultaneously, in a competing manner. To obtain more effective difficult samples, the generation process is divided into two stages: the main purpose of the first stage of the THSG is to generate difficult candidate-positive pairs, referred to below as difficult positive sample pairs; the main purpose of the second stage of the THSG is to further improve difficult sample generation. In the final training phase, the losses of all stages are combined to ensure the performance of deep metric learning, as shown in fig. 1.
Step 2, in a first stage 11 of the two-stage difficult sample generation frame THSG, stretching an original positive sample pair in a piecewise linear stretching PLM mode to increase the difficulty degree and obtain a difficult positive sample pair; wherein the difficult positive sample pairs include: difficult candidate samples and difficult positive samples.
To obtain the difficult positive sample pairs, the embedded features of a positive sample pair are linearly stretched, as shown in fig. 3, so that each deviates away from the pair's center point, generating a difficult candidate sample and difficult positive sample a^*, p^*, as in equation 2. In step 2, one possible implementation stretches the original positive sample pair with the piecewise linear operation of the PLM to increase the difficulty and obtain a difficult positive sample pair:

a^* = a + λ(a − p),  p^* = p + λ(p − a)    (2)

During stretching, if the value of λ is too large, the positive sample pair may be stretched into other categories; even though the pair distance increases, such generated samples can only play a negative role in the training process. To ensure that the generated samples keep the labels of the original samples, the embodiment of the present invention limits the range and size of λ:

λ = α · e^{−(d(a,p) − d_0)},  if d(a,p) > d_0    (3)
λ = α + γ · (d_0 − d(a,p)),  if d(a,p) ≤ d_0    (4)

where a^* is the difficult candidate sample, a the original candidate sample, λ the stretch distance coefficient, p the original positive sample, p^* the difficult positive sample, α the bias hyperparameter, d_0 the segmentation coefficient, d(a,p) the distance between the original candidate sample a and the original positive sample p, and γ the linear hyperparameter.

When d(a,p) is already sufficiently large, i.e., larger than d_0, the embodiment of the present invention uses the exponential function of equation 3; in this case λ has a maximum value of α and a minimum value of 0. When d(a,p) is less than d_0, the embodiment of the present invention uses the linear function of equation 4; here the maximum value of λ is α + d_0·γ and the minimum value is α. However, d_0 is related to the distance distribution of the samples in the different data sets, and is therefore difficult to adjust manually across data sets. To better mine the optimal hyperparameter d_0 in each data set, another possible implementation stretches the original positive sample pair with a preferred piecewise linear operation that sets:

d_0 = d_{epoch−1}    (5)

where d_{epoch−1} is the average distance of the positive sample pairs computed during the previous training epoch; in the first training epoch these are the original positive sample pairs, and in later epochs the difficult positive sample pairs. After the piecewise linear manipulation above, the embodiment of the present invention obtains difficult positive sample pairs. Next, a generator is used to make the stretched positive sample pairs more effective.
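One reading of the piecewise stretching above, as a runnable sketch: the exact functional forms of λ are reconstructed from the stated maxima and minima (max α at d = d_0 decaying toward 0 above it; max α + γ·d_0 at d = 0 below it), so treat them as an assumption rather than the patent's exact formulas.

```python
import numpy as np

def plm_stretch(a, p, alpha=0.1, gamma=0.2, d0=1.0):
    """Stretch a positive pair (a, p) away from each other to raise difficulty.

    lambda decays exponentially above the segmentation coefficient d0 (so
    already-hard pairs are barely moved) and grows linearly below d0, which
    also bounds lambda to keep the pair inside its original class.
    """
    d = float(np.linalg.norm(a - p))
    if d > d0:
        lam = alpha * np.exp(-(d - d0))   # max alpha at d=d0, -> 0 as d grows
    else:
        lam = alpha + gamma * (d0 - d)    # max alpha + gamma*d0 at d=0
    a_star = a + lam * (a - p)            # each sample moves away from the
    p_star = p + lam * (p - a)            # other, i.e. away from the center
    return a_star, p_star, lam

a = np.array([0.0, 0.0])
p = np.array([0.6, 0.0])
a_star, p_star, lam = plm_stretch(a, p)   # d = 0.6 < d0: linear branch
```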
Step 3: based on a trained first generative adversarial neural network, adjust the labels of the difficult positive sample pairs to be consistent with the labels of the original positive sample pairs, and output the adjusted difficult positive sample pairs and the original negative samples to the second stage 12 of the THSG. The trained first generative adversarial neural network comprises: a Hard Positive sample pair Generator (Hard Anchor-Positive Generator, HAPG) and its corresponding discriminator, the Hard Positive sample pair Discriminator (Hard Anchor-Positive Discriminator, HAPD). Step A1: use a difficult positive sample pair as the input sample of the HAPG and obtain an HAPG-generated sample by forward propagation through the HAPG's neural network; use the HAPG's input sample and the HAPG-generated sample as input data of the HAPD, and through two-class training of the HAPD neural network, identify whether the source of the HAPD's input data is the HAPG-generated sample or the HAPG's input sample. Step B1: if the HAPD's input data is identified as an HAPG-generated sample, return the first gradient information produced by the HAPD for that generated sample to the HAPG; adjust the HAPG's neural network using the first gradient information, update the HAPG's neural network with the adjusted network, and return to step A1. Continue until the two-class training of the HAPD neural network identifies the source of the HAPD's input data as the HAPG's input sample, then take the current HAPG-generated sample as the adjusted difficult positive sample pair.
After piecewise linear stretching of the positive sample pairs, to ensure that the generated pairs remain in the same class domain as the original samples, the embodiment of the present invention requires the samples to satisfy label consistency, avoiding the generation of invalid embedded features. However, a simple constraint may lead to the mode collapse problem. The embodiment of the present invention therefore introduces the HAPG in step 3 above to further adjust the samples. After passing through the HAPG, x^* is mapped to x', where x^* may refer to a^* and p^* in fig. 4, and x' may refer to a' and p' in fig. 4.
To better train the HAPG, the embodiment of the present invention establishes a corresponding discriminator, the HAPD. The HAPD is a two-class classifier that identifies whether a given embedded feature is a true feature or a generated feature; it is trained with the loss shown in equation 7 below. The Reconstruction Condition Generator (RCG) introduced later has its own discriminator, the Reconstruction Condition Discriminator (RCD), which differs from the HAPD in its inputs and outputs; both the RCD and the HAPD follow the generic discriminator form:

L_R = −Σ_i [ log R(x_i) + log(1 − R(x̃_i)) ]    (6)

where L_R is the discriminator loss function, R(·) is the output of the discriminator on its input data, x̃_i is the i-th generated sample, and x_i is the i-th input sample.

The HAPD is obtained with the following loss function:

L_HAPD = −Σ_i [ log HAPD(x_i) + log(1 − HAPD(x'_i)) ]    (7)

where L_HAPD is the loss function of the HAPD, HAPD(x_i) is the HAPD judging its input data to be a true sample, and HAPD(x'_i) is the HAPD judging its input data to be a false (generated) sample.
In addition, to ensure the consistency of the classification results, the H_c layer mentioned above is reused as C_HAPG to classify the generated pairs. The embodiment of the present invention then trains the HAPG using the loss function of equation 8.

The HAPG is obtained with the following loss function:

L_HAPG = L_adv + L_cls,
L_adv = −Σ_i log HAPD(x'_i),
L_cls = −Σ_i log sm(C_HAPG(x'_i))[l_i]    (8)

where L_HAPG is the loss function of the HAPG, L_adv is the adversarial loss term against the HAPD, L_cls is the HAPG classification loss term (cls abbreviates class), sm is the normalized exponential function softmax, HAPD(x'_i) is the output of the generated difficult positive sample through the HAPD, C_HAPG(x') is the classification into which the output of the HAPG falls, l_i is the corresponding class label, and x'_i is the i-th generated difficult sample.
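The two HAPD/HAPG loss terms can be sketched with numpy as follows; the equal weighting of the adversarial and classification terms is an assumption, and the scores and logits are toy values.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def hapd_loss(scores_real, scores_fake):
    """Two-class discriminator loss: real embedded features should score
    near 1, HAPG-generated features near 0 (equation 7 style)."""
    return float(-(np.log(scores_real) + np.log(1.0 - scores_fake)).mean())

def hapg_loss(scores_fake, class_logits, label):
    """Generator loss: an adversarial term for fooling the HAPD plus the
    softmax loss of the reused classification layer (C_HAPG), keeping the
    generated pair's label consistent with the original (equation 8 style)."""
    adv = float(-np.log(scores_fake).mean())
    cls = float(-np.log(softmax(class_logits)[label]))
    return adv + cls

d_real = np.array([0.9, 0.8])       # HAPD scores on real features
d_fake = np.array([0.2, 0.3])       # HAPD scores on generated features
logits = np.array([2.0, 0.5, 0.1])  # C_HAPG logits for one generated sample
loss_d = hapd_loss(d_real, d_fake)
loss_g = hapg_loss(d_fake, logits, label=0)
```

A more confident discriminator (real scores near 1, fake scores near 0) drives `hapd_loss` toward 0, while the generator lowers `hapg_loss` by raising its fake scores and keeping the generated pair classifiable as its original label.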
However, when generating positive sample pairs, a label consistency constraint alone is not sufficient. To increase the diversity of the adjusted difficult positive samples, step 3 further comprises: based on the trained first generative adversarial neural network and a trained third generative adversarial neural network, keeping the labels of the adjusted difficult positive sample pairs consistent with the labels of the original positive sample pairs, and outputting the adjusted difficult positive sample pairs and the original negative samples to the second stage of the THSG. The trained third generative adversarial neural network comprises: a Reconstruction Condition Generator (RCG) and its corresponding discriminator, the RCD. Step A3: use the original ternary image group as the input sample of the RCG and obtain an RCG-generated sample by forward propagation through the RCG's neural network; use the RCG's input sample and the RCG-generated sample as input data of the RCD, and through two-class training of the RCD neural network, identify whether the source of the RCD's input data is the RCG-generated sample or the RCG's input sample. Step B3: if the RCD's input data is identified as an RCG-generated sample, return the third gradient information produced by the RCD for that generated sample to the RCG; adjust the RCG's neural network using the third gradient information, update the RCG's neural network with the adjusted network, and return to step A3. Continue until the two-class training of the RCD neural network identifies the source of the RCD's input data as the RCG's input sample, then take the current RCG-generated sample as the adjusted difficult positive sample pair.
In the above embodiment, regardless of where the HAPG's input x^* lies in its class space, the HAPG could spoof the HAPD by generating random embeddings from the class space. In that case, the embedded features generated by the HAPG would be independent of their input and not necessarily difficult samples. Therefore, the embodiment of the present invention introduces the reconstruction condition generator RCG, which maps x' back to x_r under a reconstruction loss between x_r and x, where x may refer to a and p in fig. 4 and x_r may refer to a_r and p_r in fig. 4. This solves the mode collapse problem that can occur during HAPG generation. A reconstruction condition discriminator RCD is likewise provided for the RCG, as shown in equation 9, and the loss function of the RCG is shown in equation 10.
Specifically, the RCD is obtained with the following loss function:

L_RCD = −Σ_i [ log RCD(x_i) + log(1 − RCD(x_i^r)) ]    (9)

where L_RCD is the RCD loss function, RCD(x_i) is the RCD judging its input data to be a true sample, and RCD(x_i^r) is the RCD judging its input data to be a false (reconstructed) sample.
The RCG is determined with the following loss formula:

L_RCG = Σ_i [ ‖x_i^r − x_i‖_2 + η · L_sm(C_RCG(x_i^r), l_i) ]    (10)

where L_RCG is the value of the RCG loss function, ‖x_i^r − x_i‖_2 is the L2 distance between the sample before reconstruction and the sample after reconstruction, η is the balance factor between the normalized exponential function (softmax) loss and the reconstruction loss, cls denotes class characterization, C_RCG is the reconstruction condition generator's classifier, x_r is the vector reconstructed by the RCG from the HAPG's generated sample, x is the original vector, L_sm is the normalized exponential function class loss (sm abbreviates softmax), x_i^r is the specific vector after the i-th reconstruction, l_i is the i-th class label, and i is the serial number.
In order to closely correlate the HAPG generated samples with the original samples, embodiments of the present invention allow point-to-point reconstruction of the generated samples with the original samples as completely as possible. Meanwhile, in order to train the RCG better, the embodiment of the present invention adds the same softmax loss function as the HAPG to ensure the consistency of the tags. The softMax function of the RCG is therefore also composed of two parts.
After the HAPG is adjusted, the difficult positive sample pair of the embodiments of the present invention satisfies the requirement of label consistency. At the same time, the RCG also ensures that the generated pairs do not cause pattern collapse due to random generation. Next, embodiments of the present invention will use the difficult positive sample pairs to generate the difficult negative samples and compose the final difficult sample.
Step 4: in the second stage of the THSG, based on a trained second generative adversarial neural network, increase the difficulty of the original negative sample to obtain the final difficult negative sample, and output the final difficult positive sample pair. The trained second generative adversarial neural network comprises: a Hard Triplet Generator (HTG) and its corresponding discriminator, the Hard Triplet Discriminator (HTD). Step A2: use the adjusted difficult positive sample pair and the original negative sample as input samples of the HTG; with the HTG's neural network, obtain a first distance between the difficult candidate sample and the original negative sample and a second distance between the difficult candidate sample and the difficult positive sample, and obtain an HTG-generated sample by forward propagation through the HTG's neural network; use the HTG's input samples and the HTG-generated sample as input data of the HTD, and through (C+1)-class training of the HTD neural network, identify whether the source of the HTD's input data is the HTG-generated sample or the HTG's input sample. Step B2: if the source of the HTD's input data is identified as an HTG-generated sample, return the second gradient information produced by the HTD for that generated sample to the HTG; adjust the HTG's neural network using the second gradient information, under the condition that the first distance is smaller than the second distance, update the HTG's neural network with the adjusted network, and return to step A2. Continue until the (C+1)-class training of the HTD neural network identifies the source of the HTD's input data as the HTG's input sample, then take the current HTG-generated sample as the final difficult negative sample.
As for the final difficult positive sample pair, the adjusted difficult positive sample pair is mapped through the second stage of the THSG and output in that second stage.
To prevent the difficult positive sample pairs from being damaged by the reverse triplet loss, the embodiment of the present invention lets the generated positive sample pairs be recovered through a reconstruction loss. Meanwhile, to better train the HTG, the embodiment of the present invention adds the same softmax loss function as for the HAPG to ensure label consistency. In one possible implementation, step 4 increases the difficulty of the original negative sample, based on the trained second generative adversarial neural network, using an Adaptive Reverse Triplet Loss (ART-Loss), giving the HTG loss:

L_HTG = L_art + μ · L_rec + η · L_cls    (11)

where L_HTG is the loss function of the HTG, η is the balance factor for the normalized exponential function (softmax) loss, μ is the reconstruction loss balance parameter, L_art is the adaptive reverse triplet loss, L_rec is the reconstruction loss, and L_cls is the classification loss.

The reconstruction loss ties the candidate sample ã and positive sample p̃ generated by the HTG back to the original candidate sample a and original positive sample p:

L_rec = ‖ã − a‖_2 + ‖p̃ − p‖_2

The classification loss applies the normalized exponential function through the classifier C_HTG to the samples x̃_i generated by the HTG, with class labels l_i:

L_cls = −Σ_i log sm(C_HTG(x̃_i))[l_i]

The reverse triplet loss acts on the candidate input a', the positive input p', the negative input n, and the generated negative ñ:

L_art = [ ‖a' − ñ‖_2^2 − ‖a' − p'‖_2^2 + τ_r ]_+    (12)

where ‖·‖_2 is the L2 distance, [·]_+ truncates at 0, and τ_r is the reverse triplet loss hyperparameter: minimizing equation 12 pushes the generated negative sample closer to the candidate than the positive by the margin τ_r. The hyperparameter τ_r is made adaptive:

τ_r = v / (β + L_HTG)    (13)

where v is a constant with a value range from 0 to plus infinity, β is a constant with a value range from 0 to plus infinity, and L_HTG is the loss function value of the HTG. Thus, as the network trains and the HTG performs better and better, the embodiment of the present invention imposes a tighter limit on the reverse triplet loss: τ_r is set as a parameter that changes with the HTG loss.
As the training of the HTG is better,
Figure BDA0002554167010000192
will become smaller, and tau r The adaptivity is increased and the difficulty of difficult sampling is increased. To ensure tag consistency, a discriminator HTD is provided for the HTG. Unlike HAPD, the input to HTD comes from a different class, and thus, the HTD is a C +1 class discriminator. The HTD is obtained using the following equation:
(The HTD loss formula appears only as an image in the source and is not reproduced here.)

wherein L_HTD is the value of the HTD loss function; C is the original number of classes; HTD(x̃_i) is the result obtained by the HTD with generated sample x̃_i as input, and HTD(x_i) is the result obtained by the HTD with the original sample x_i as input; the generated samples ã, p̃ and ñ are respectively shown in FIG. 4.
This adaptive reverse triplet loss uses the input samples to generate difficult negative samples; during the HTG's generation of negative samples, it does not destroy the positive sample pairs and it ensures label consistency.
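As an illustration only, the adaptive reverse triplet loss described above can be sketched in a few lines of NumPy. The exact formulas appear only as images in the source, so both the loss form ([D(a, n) − D(a, p) + τ_r]_+, i.e. the standard triplet loss with the roles of positive and negative reversed) and the margin schedule τ_r = ν / (β + L_HTG) are assumptions made for this sketch.

```python
import numpy as np

def reverse_triplet_loss(a, p, n, tau_r):
    """Assumed form [ D(a, n) - D(a, p) + tau_r ]_+ with squared-L2 D:
    minimising it pulls the generated negative n toward the anchor a,
    the reverse of the ordinary triplet loss."""
    d_an = float(np.sum((np.asarray(a) - np.asarray(n)) ** 2))
    d_ap = float(np.sum((np.asarray(a) - np.asarray(p)) ** 2))
    return max(d_an - d_ap + tau_r, 0.0)  # [.]_+ truncates at 0

def adaptive_margin(htg_loss, nu=1.0, beta=1.0):
    """Assumed schedule tau_r = nu / (beta + L_HTG): as the HTG trains
    better and its loss shrinks, tau_r grows and the reverse-triplet
    constraint tightens, matching the adaptive behaviour in the text."""
    return nu / (beta + htg_loss)
```

Under these assumptions, a freshly initialised HTG (large L_HTG) faces a loose margin, while a well-trained HTG is pushed to produce ever-harder negatives.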
Step 5: synthesizing the final difficult positive sample pair and the final difficult negative sample to obtain a final difficult ternary sample group.
Step 6: taking the final difficult ternary sample group as a sample set and training a convolutional neural network to obtain the retrieval model.
Based on the above, the embodiment of the present invention completes adversarial deep metric learning. The overall structure of the retrieval method based on segmentation-difficult-sample generation in the embodiment of the present invention mainly comprises three parts: a metric network for obtaining the embedding; a difficulty-aware generator THSG, which increases the difficulty level through two stages; and the corresponding adversarial metric learning. All generators reuse the H_e layer, mapping the generated samples into the same feature space as the original samples. Therefore, after the final difficult ternary sample group is generated, the embodiment of the present invention trains the CNN feature extraction network with the corresponding metric. Compared with conventional deep metric learning methods, the method of the embodiment of the present invention can thus train the CNN network better through difficult samples, as shown in equation 14.
(Equation 14 appears only as an image in the source and is not reproduced here.)

wherein X is the original sample and X̃ denotes the difficult samples in the generated final difficult ternary sample group.
The embodiment of the present invention applies the THSG framework to a deep metric learning framework to improve performance. For the feature extraction network, the final objective loss function is shown in equation 15.
(Equation 15 appears only as an image in the source and is not reproduced here.)

wherein L_F represents the overall loss function (F abbreviates "final") and uses the same predefined parameter as equation 12; L_gen represents the generator loss function; L_t^ori represents the metric function based on the original samples (ori abbreviates "original"); φ is a parameter balance coefficient; L_t^h represents the metric function based on the difficult samples (h denotes hard, i.e. difficult); L_soft represents the softmax loss function (soft abbreviates "softmax"); L_t represents the metric loss function, with t denoting the metric; X represents the original samples, the difficult positive samples X' are the output of the first stage, and X̃ are the difficult negative samples generated by the second stage. Therefore, L_t^ori is a metric function based on the original samples, L_t^h is a metric function based on the generated difficult samples, and φ is a balance parameter between L_t^ori and L_t^h.
In addition, the embodiment of the present invention can train the generation network and the feature extraction network at the same time, and the balance parameter φ alleviates the poor performance of the generation network in the initial training stage. Moreover, after the network training of the embodiment of the present invention is finished, the final feature extraction network does not need any extra computation.
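To make the balance concrete, here is a minimal sketch of the overall objective of equation 15. The real formula is only an image in the source, so the exact combination is an assumption: a weighted sum of the original-sample metric loss, the difficult-sample metric loss (balanced by φ) and a softmax term (weighted by a hypothetical coefficient eta).

```python
def combined_objective(loss_metric_ori, loss_metric_hard, loss_softmax,
                       phi=0.5, eta=0.1):
    """Hedged sketch of the overall loss L_F of equation 15, assumed here
    to be:  L_t^ori + phi * L_t^h + eta * L_soft.
    A small phi downweights the generated difficult samples while the
    generation network is still poorly trained."""
    return loss_metric_ori + phi * loss_metric_hard + eta * loss_softmax
```

In a joint training loop, both the generator loss and this objective would be stepped in each iteration, which is how the two networks can be trained simultaneously.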
Unlike prior-art metric learning methods, which only mine difficult samples or only generate difficult negative samples, the THSG provided by the embodiment of the present invention generates the final difficult samples through a two-stage network; it derives the final difficult samples through two effective independent generators, one for generating the adjusted difficult positive sample pairs and one for generating the final negative samples, and it aims to fully exploit the potential of both the positive and the negative samples. Experimental results on the CUB-200-2011, Cars196 and Stanford datasets indicate that the THSG effectively improves on the performance of existing difficult sample generation methods.
The following provides a description of a retrieval apparatus based on generation of a segmentation-difficult sample according to an embodiment of the present invention.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a retrieval apparatus based on segmentation difficulty sample generation according to an embodiment of the present invention. The retrieval device based on the generation of the segmentation difficult sample provided by the embodiment of the invention can comprise the following modules:
the extraction module 21 is used for extracting the characteristics of the image to be retrieved;
the processing module 22 is configured to take the characteristics of the image to be retrieved as the input of a retrieval model and to obtain, through the retrieval model, retrieval results related to the image to be retrieved and distance scores between the characteristics of the image to be retrieved and the characteristics of the images in the retrieval model database; wherein the retrieval model is trained on an original ternary image group serving as a sample set and on a final difficult ternary sample group obtained by a two-stage difficult sample generation framework THSG; in the first stage of the THSG, the difficulty degree of the original positive sample pairs in the original ternary image group is increased to obtain difficult positive sample pairs; the labels of the difficult positive sample pairs are adjusted to be consistent with the labels of the original positive sample pairs, and the adjusted difficult positive sample pairs and the original negative samples in the original ternary image group are output to the second stage of the THSG; in the second stage of the THSG, the difficulty degree of the original negative samples is increased to obtain final difficult negative samples and final difficult positive sample pairs; and the final difficult positive sample pairs and the final difficult negative samples are synthesized to obtain the final difficult ternary sample group;
and the sorting module 23 is configured to sort the retrieval results related to the image to be retrieved according to the distance scores to obtain the retrieval result most related to the image to be retrieved.
In a possible implementation manner, the extraction module is configured to:
extracting the characteristics of the animal image to be retrieved;
the processing module is configured to:
taking the characteristics of the animal image to be retrieved as the input of a retrieval model, and obtaining a retrieval result related to the animal image to be retrieved and a distance score between the characteristics of the animal image to be retrieved and the characteristics of the images in the retrieval model database through the retrieval model;
the sorting module is configured to:
and sequencing the retrieval results related to the images to be retrieved according to the distance scores to obtain the animal retrieval result most related to the animal images to be retrieved.
In one possible implementation, the apparatus further includes: a generating module, configured to obtain the search model through the following steps:
acquiring an original ternary image group serving as a sample set;
in the first stage of the two-stage difficult sample generation framework THSG, stretching the original positive sample pair in a piecewise linear stretching PLM manner to increase the difficulty degree and obtain a difficult positive sample pair; wherein the difficult positive sample pair comprises: a difficult candidate sample and a difficult positive sample;
adjusting the labels of the difficult positive sample pairs to be consistent with the labels of the original positive sample pairs based on a trained first generative adversarial network, and outputting the adjusted difficult positive sample pairs and the original negative samples to the second stage of the THSG; wherein the trained first generative adversarial network comprises: a difficult positive sample pair generator HAPG and a discriminator HAPD corresponding to the HAPG;
in the second stage of the THSG, increasing the difficulty degree of the original negative sample based on a trained second generative adversarial network to obtain a final difficult negative sample, and outputting a final difficult positive sample pair; wherein the trained second generative adversarial network comprises: a difficult ternary sample generator HTG and a discriminator HTD corresponding to the HTG;
synthesizing the final difficult positive sample pair and the final difficult negative sample to obtain a final difficult ternary sample group;
and taking the final difficult ternary sample group as a sample set, and training a convolutional neural network to obtain the retrieval model.
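The steps above can be sketched as a data-level pipeline. This is an illustration only: `plm_stretch`, `hapg` and `htg` are hypothetical callables standing in for the PLM stretching step, the label-adjusting HAPG stage and the negative-hardening HTG stage; the real stages are trained networks.

```python
def build_final_triplets(triplets, plm_stretch, hapg, htg):
    """Illustrative THSG pipeline over a list of (anchor, positive,
    negative) triplets: stage 1 stretches and label-adjusts the positive
    pair, stage 2 hardens the negative, yielding final difficult
    triplets for training the retrieval CNN."""
    final = []
    for a, p, n in triplets:
        a_h, p_h = plm_stretch(a, p)           # stage 1: harder positive pair
        a_adj, p_adj = hapg(a_h, p_h)          # stage 1: label adjustment
        a_f, p_f, n_f = htg(a_adj, p_adj, n)   # stage 2: harder negative
        final.append((a_f, p_f, n_f))
    return final
```

The returned triplets would then serve as the sample set for training the convolutional neural network.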
In one possible implementation manner, the generating module is configured to:
adjusting the labels of the difficult positive sample pairs to be consistent with the labels of the original positive sample pairs based on a trained first generative adversarial network and a trained third generative adversarial network, and outputting the adjusted difficult positive sample pairs and the original negative samples to the second stage of the THSG; wherein the trained third generative adversarial network comprises: a reconstruction condition generator RCG and a discriminator RCD corresponding to the RCG.
In one possible implementation manner, the generating module is configured to:
stretching the original positive sample pair by adopting a piecewise linear operation formula in the piecewise linear stretching PLM manner to increase the difficulty degree and obtain a difficult positive sample pair, wherein the piecewise linear operation formula is:
a* = a + λ(a − p)
p* = p + λ(p − a)

(The piecewise definition of λ appears only as an image in the source and is not reproduced here.)

wherein a* is the difficult candidate sample, a is the original candidate sample, λ is the stretch distance coefficient, p is the original positive sample, p* is the difficult positive sample, α is the bias hyperparameter, d_0 is the segmentation coefficient, d(a, p) is the distance between the original candidate sample a and the original positive sample p, and γ is the linear hyperparameter;
or, stretching the original positive sample pair by adopting a preferred piecewise linear operation formula in the piecewise linear stretching PLM manner to increase the difficulty degree and obtain a difficult positive sample pair, wherein the preferred piecewise linear operation formula is:
(The preferred piecewise linear operation formula appears only as an image in the source and is not reproduced here.)

wherein d_{epoch-1} is the average distance of the positive sample pairs calculated in the previous training pass; the positive sample pairs are the original positive sample pairs when the previous pass is the first training pass, and the difficult positive sample pairs when it is not.
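The two stretching equations given in the text can be implemented directly. In this sketch the stretch coefficient λ is left to the caller, since its piecewise definition (in terms of α, γ, d_0 and d(a, p)) appears only as an image in the source.

```python
import numpy as np

def plm_stretch(a, p, lam):
    """Piecewise-linear stretching of an original positive pair (a, p):
        a* = a + lam * (a - p)
        p* = p + lam * (p - a)
    lam is the stretch distance coefficient supplied by the caller."""
    a = np.asarray(a, dtype=float)
    p = np.asarray(p, dtype=float)
    return a + lam * (a - p), p + lam * (p - a)
```

Note that stretching with lam > 0 moves a and p apart along their connecting line, multiplying the pair distance by (1 + 2·lam); this is how the difficulty degree of the positive pair is increased.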
In one possible implementation, the input and output of the RCD differ from those of the HAPD, and the RCD and the HAPD are determined using the following equation:
(The discriminator loss formula appears only as an image in the source and is not reproduced here.)

wherein R(x̃'_i) is the feature of a generated sample after passing through the discriminator, R(x'_i) is the output of the input data after passing through the discriminator, x̃'_i is the i-th generated sample, x'_i is the i-th input sample, L_sm is the normalized softmax class loss, and L_D is the loss function of the discriminator.
In one possible implementation, the HAPG is obtained by the following function:
(The HAPG loss formula appears only as an image in the source and is not reproduced here.)

wherein L_HAPG is the loss function of the HAPG, L_cls^HAPD is the HAPD class loss, and L_cls^HAPG is the HAPG class loss (cls abbreviates "class"); in the normalized exponential function class loss, HAPD(x'_i) is the output of the generated difficult positive samples through the HAPD, C_HAPG(x'_i) is the category into which the output of the HAPG is classified, and x'_i is the i-th generated difficult sample.
In one possible implementation, the RCG is determined using the following RCG loss equation:
(The RCG loss formula appears only as an image in the source and is not reproduced here.)

wherein L_RCG is the value of the RCG loss function; the reconstruction term is the L2 distance between the sample before reconstruction and the sample after reconstruction; η is the balance factor between the normalized exponential function and the reconstruction loss; L_cls^RCD is the reconstruction conditional discriminator loss (cls denotes class) and C_RCG is the reconstruction condition generator loss; in its concrete form, x_r is the reconstructed vector obtained by passing the generated sample of the HAPG through the RCG, and x is the original vector; L_sm is the class loss of the normalized exponential function (sm abbreviates the normalized exponential function, i.e. softmax); x_{r,i} is the i-th reconstructed vector, RCD(x_{r,i}) is the output of the discriminator for the reconstructed vector, l_i is the i-th class label, and i is the index.
In one possible implementation manner, the generating module is configured to:
increase, based on the trained second generative adversarial network, the difficulty degree of the original negative sample by adopting the adaptive reverse triplet loss formula to obtain a final difficult negative sample, wherein the adaptive reverse triplet loss formula is as follows:
(The ART-Loss formula appears only as an image in the source and is not reproduced here.)

wherein L_HTG is the loss function of the HTG; η is the balance factor between the normalized exponential (softmax) function and the reconstruction loss; μ is the reconstruction-loss balance parameter; L_art is the adaptive reverse triplet loss; L_rec is the reconstruction loss; L_cls^HTD is the classification loss of the HTD; L_cls^HTG is the classification loss of the HTG; ã, p̃ and ñ are, respectively, the candidate sample, the positive sample and the negative sample generated by the HTG; a is the original candidate sample and p is the original positive sample; in the classification loss, x̃_i denotes the i-th sample generated by the HTG, l_i is its class label, and C_HTG is the HTG classifier;
in the reverse triplet loss (whose formula likewise appears only as an image), a' is the anchor (candidate) input, p' is the positive-sample input, n' is the negative-sample input, ||·||_2 is the L2 distance, [·]_+ denotes truncation at 0, and τ_r is the reverse-triplet-loss hyperparameter, defined in terms of two constants ν and β, each ranging over (0, +∞), and of the HTG loss L_HTG;
obtaining the HTD by adopting the following formula:
(The HTD loss formula appears only as an image in the source and is not reproduced here.)

wherein L_HTD is the value of the HTD loss function, C is the original number of classes, HTD(x̃_i) is the result obtained by the HTD with generated sample x̃_i as input, and HTD(x_i) is the result obtained by the HTD with the original sample x_i as input.
The following continues to describe the electronic device provided by the embodiment of the present invention.
Referring to fig. 6, fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. The embodiment of the present invention further provides an electronic device, which includes a processor 31, a communication interface 32, a memory 33, and a communication bus 34, wherein the processor 31, the communication interface 32, and the memory 33 communicate with one another through the communication bus 34;
a memory 33 for storing a computer program;
the processor 31 is configured to implement the steps of the above-mentioned search method based on the difficult-to-segment sample generation when executing the program stored in the memory 33, and in one possible implementation manner of the present invention, the following steps may be implemented:
extracting the characteristics of an image to be retrieved;
taking the characteristics of the image to be retrieved as the input of a retrieval model, and obtaining, through the retrieval model, retrieval results related to the image to be retrieved and distance scores between the characteristics of the image to be retrieved and the characteristics of the images in the retrieval model database; wherein the retrieval model is trained on an original ternary image group serving as a sample set and on a final difficult ternary sample group obtained by a two-stage difficult sample generation framework THSG; in the first stage of the THSG, the difficulty degree of the original positive sample pairs in the original ternary image group is increased to obtain difficult positive sample pairs; the labels of the difficult positive sample pairs are adjusted to be consistent with the labels of the original positive sample pairs, and the adjusted difficult positive sample pairs and the original negative samples in the original ternary image group are output to the second stage of the THSG; in the second stage of the THSG, the difficulty degree of the original negative samples is increased to obtain final difficult negative samples and final difficult positive sample pairs; and the final difficult positive sample pairs and the final difficult negative samples are synthesized to obtain the final difficult ternary sample group;
and sequencing the retrieval results related to the images to be retrieved according to the distance scores to obtain the retrieval result most related to the images to be retrieved.
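The retrieval and sorting steps above amount to ranking database features by their distance score to the query feature. A minimal sketch, with hypothetical helper and parameter names (`db_feats`, `db_ids`, `top_k` are illustrative, not from the source):

```python
import numpy as np

def retrieve(query_feat, db_feats, db_ids, top_k=3):
    """Rank database images by the L2 distance score between the query
    feature and each database feature, returning the most related
    results (smallest distance) first."""
    q = np.asarray(query_feat, dtype=float)
    dists = np.linalg.norm(np.asarray(db_feats, dtype=float) - q, axis=1)
    order = np.argsort(dists)[:top_k]
    return [(db_ids[i], float(dists[i])) for i in order]
```

The first element of the returned list corresponds to the retrieval result most related to the image to be retrieved.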
The communication bus mentioned in the electronic device may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this is not intended to represent only one bus or type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a RAM (Random Access Memory) or an NVM (Non-Volatile Memory), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; but also a DSP (Digital Signal Processing), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component.
The method provided by the embodiment of the invention can be applied to electronic equipment. Specifically, the electronic device may be: desktop computers, laptop computers, intelligent mobile terminals, servers, and the like. Without limitation, any electronic device that can implement the embodiments of the present invention is within the scope of the present invention.
An embodiment of the present invention provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the above-mentioned retrieval method based on the generation of the segmentation-difficult samples.
Embodiments of the present invention provide a computer program product comprising instructions which, when run on a computer, cause the computer to perform the steps of the above-described retrieval method based on segmentation-difficult-sample generation.
Embodiments of the present invention provide a computer program which, when run on a computer, causes the computer to perform the steps of the above-described retrieval method based on segmentation-difficult-sample generation.
It should be noted that, in this document, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus/electronic device/storage medium/computer program product/computer program embodiment comprising instructions, the description is relatively simple as it is substantially similar to the method embodiment, and reference may be made to some descriptions of the method embodiment for relevant points.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement and the like made within the spirit and original scope of the present invention are included in the protection scope of the present invention.

Claims (9)

1. A retrieval method based on segmentation difficult sample generation is characterized by comprising the following steps:
extracting the characteristics of an image to be retrieved;
taking the characteristics of the image to be retrieved as the input of a retrieval model, and obtaining, through the retrieval model, retrieval results related to the image to be retrieved and distance scores between the characteristics of the image to be retrieved and the characteristics of the images in the retrieval model database; wherein the retrieval model is trained on an original ternary image group serving as a sample set and on a final difficult ternary sample group obtained by a two-stage difficult sample generation framework THSG; in the first stage of the THSG, the difficulty degree of the original positive sample pairs in the original ternary image group is increased to obtain difficult positive sample pairs; the labels of the difficult positive sample pairs are adjusted to be consistent with the labels of the original positive sample pairs, and the adjusted difficult positive sample pairs and the original negative samples in the original ternary image group are output to the second stage of the THSG; in the second stage of the THSG, the difficulty degree of the original negative samples is increased to obtain final difficult negative samples and final difficult positive sample pairs; and the final difficult positive sample pairs and the final difficult negative samples are synthesized to obtain the final difficult ternary sample group;
sorting the retrieval results related to the images to be retrieved according to the distance scores to obtain the retrieval result most related to the images to be retrieved;
obtaining the retrieval model through the following steps:
acquiring an original ternary image group serving as a sample set;
in the first stage of the two-stage difficult sample generation framework THSG, stretching the original positive sample pair in a piecewise linear stretching PLM manner to increase the difficulty degree and obtain a difficult positive sample pair; wherein the difficult positive sample pair comprises: a difficult candidate sample and a difficult positive sample;
adjusting the labels of the difficult positive sample pairs to be consistent with the labels of the original positive sample pairs based on a trained first generative adversarial network, and outputting the adjusted difficult positive sample pairs and the original negative samples to the second stage of the THSG; wherein the trained first generative adversarial network comprises: a difficult positive sample pair generator HAPG and a discriminator HAPD corresponding to the HAPG;
in the second stage of the THSG, increasing the difficulty degree of the original negative sample based on a trained second generative adversarial network to obtain a final difficult negative sample, and outputting a final difficult positive sample pair; wherein the trained second generative adversarial network comprises: a difficult ternary sample generator HTG and a discriminator HTD corresponding to the HTG;
synthesizing the final difficult positive sample pair and the final difficult negative sample to obtain a final difficult ternary sample group;
and taking the final difficult ternary sample group as a sample set, and training a convolutional neural network to obtain the retrieval model.
2. The method as claimed in claim 1, wherein said extracting features of the image to be retrieved comprises:
extracting the characteristics of the animal image to be retrieved;
the method for obtaining the retrieval result related to the image to be retrieved and the distance score between the feature of the image to be retrieved and the feature of the image in the retrieval model database by taking the feature of the image to be retrieved as the input of the retrieval model comprises the following steps:
taking the characteristics of the animal image to be retrieved as the input of a retrieval model, and obtaining a retrieval result related to the animal image to be retrieved and a distance score between the characteristics of the animal image to be retrieved and the characteristics of the images in the retrieval model database through the retrieval model;
the sorting the retrieval results related to the images to be retrieved according to the distance scores to obtain the most related retrieval result of the images to be retrieved comprises the following steps:
and sequencing the retrieval results related to the images to be retrieved according to the distance scores to obtain the animal retrieval result most related to the animal images to be retrieved.
3. The method of claim 1, wherein adjusting the labels of the difficult positive sample pairs to be consistent with the labels of the original positive sample pairs based on the trained first generative adversarial network, and outputting the adjusted difficult positive sample pairs and the original negative samples to the second stage of the THSG, comprises:
adjusting the labels of the difficult positive sample pairs to be consistent with the labels of the original positive sample pairs based on a trained first generative adversarial network and a trained third generative adversarial network, and outputting the adjusted difficult positive sample pairs and the original negative samples to the second stage of the THSG; wherein the trained third generative adversarial network comprises: a reconstruction condition generator RCG and a discriminator RCD corresponding to the RCG.
4. The method of claim 1, wherein, in the first stage of the two-stage difficult sample generation framework THSG, stretching the original positive sample pair in the piecewise linear stretching PLM manner to increase the difficulty degree and obtain a difficult positive sample pair comprises:
stretching the original positive sample pair by adopting a piecewise linear operation formula in the piecewise linear stretching PLM manner to increase the difficulty degree and obtain a difficult positive sample pair, wherein the piecewise linear operation formula is:
a*=a+λ(a-p)
p*=p+λ(p-a)
Figure FDA0003812859190000031
wherein, a * Is a difficult candidate sample, a is the original candidate sample, and λ isCoefficient of stretch distance, p is the original positive sample, p * For difficult positive samples, α is the bias hyperparameter, d 0 Is a segmentation coefficient, d (a, p) is the distance between the original candidate sample a and the original positive sample p, and gamma is a linear hyperparameter;
or, stretching the original positive sample pair by an optimal piecewise linear operation formula in the piecewise linear stretching PLM mode to increase the difficulty level and obtain a difficult positive sample pair, wherein the optimal piecewise linear operation formula is as follows:
[optimal piecewise expression for λ, rendered as an image in the original publication]
wherein d_epoch−1 is the average distance of the positive sample pairs calculated in the previous training pass; the positive sample pairs are the original positive sample pairs in the first training pass and the difficult positive sample pairs in subsequent training passes.
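As a rough illustration (not part of the claims), the piecewise linear stretching of claim 4 can be sketched in Python. The formulas a* = a + λ(a − p) and p* = p + λ(p − a) are taken directly from the claim; the two-segment rule for λ below is an assumption, since the exact piecewise expression is rendered as an image in the original publication:

```python
import numpy as np

def plm_stretch(a, p, alpha, gamma, d0):
    """Stretch an original positive pair (a, p) apart to make it harder.

    a, p  : feature vectors of the candidate and the positive sample
    alpha : bias hyperparameter
    gamma : linear hyperparameter
    d0    : segmentation coefficient gating the two linear segments

    The two-segment rule for lambda is an assumption standing in for the
    image-rendered piecewise expression in the original publication.
    """
    d = np.linalg.norm(a - p)                       # d(a, p)
    lam = alpha if d <= d0 else gamma * d + alpha   # assumed piecewise rule
    a_star = a + lam * (a - p)                      # difficult candidate sample
    p_star = p + lam * (p - a)                      # difficult positive sample
    return a_star, p_star

a = np.array([1.0, 0.0])
p = np.array([0.0, 0.0])
a_star, p_star = plm_stretch(a, p, alpha=0.1, gamma=0.05, d0=0.5)
# Stretching moves the pair apart: the new distance exceeds the old one.
assert np.linalg.norm(a_star - p_star) > np.linalg.norm(a - p)
```

Whatever the exact rule for λ, the effect of the two update formulas is the same: both samples move away from each other along the line connecting them, increasing the pair's difficulty.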
5. The method of claim 3, wherein the RCD differs from the HAPD in its inputs and outputs, and the RCD and the HAPD are determined using the following equation:
[equation rendered as an image in the original publication]
wherein an image-rendered symbol denotes the features of the generated sample after passing through the discriminator, and R(x_i') is the output of the input data after passing through the discriminator; a further image-rendered symbol denotes the i-th generated sample, and x_i' is the i-th input sample; the remaining image-rendered symbols denote the normalized exponential function (softmax) classification loss and the discriminator loss function.
6. The method of claim 3, wherein the HAPG is obtained by the following function:
[equation rendered as an image in the original publication]
wherein image-rendered symbols denote, respectively, the HAPG loss function, the HAPD classification loss, the HAPG classification loss (cls denoting a class), and the normalized exponential function (softmax) classification loss; HAPD(x_i') is the output of the generated difficult positive sample through the HAPD, C_HAPG(x_i') is the classification output of the HAPG, and x_i' is the i-th generated difficult sample.
7. The method of claim 3, wherein the RCG is determined using the following RCG loss equation:
[equation rendered as an image in the original publication]
wherein image-rendered symbols denote, respectively: the value of the RCG loss function; the L2 distance between the sample before reconstruction and the sample after reconstruction, η being a balance factor between the normalized exponential function loss and the reconstruction loss; the reconstruction condition discriminator loss, cls denoting the class characterization and C_RCG the reconstruction condition generator loss; the specific form of the RCG reconstruction condition generator loss, x_r being the vector obtained by passing the HAPG-generated sample through the RCG for reconstruction and x the original vector; the normalized exponential function (softmax) classification loss, sm abbreviating the normalized exponential function; the i-th reconstructed vector; and the output of the discriminator for the reconstructed vector, l_i being the i-th class label and i the index.
8. The method of claim 1, wherein increasing the difficulty level of the original negative sample based on the trained second generative adversarial neural network to obtain a final difficult negative sample comprises:
increasing the difficulty level of the original negative sample by an adaptive reversed triplet loss formula based on the trained second generative adversarial neural network, to obtain the final difficult negative sample, wherein the adaptive reversed triplet loss formula is as follows:
[equation rendered as an image in the original publication]
wherein image-rendered symbols denote, respectively: the HTG loss function, η being a balance factor between the normalized exponential function loss and the reconstruction loss, and μ a reconstruction-loss balance parameter; the adaptive reversal loss; the reconstruction loss; the HTD classification loss; the HTG classification loss; the candidate sample generated by the HTG; the positive sample generated by the HTG; and the negative sample generated by the HTG, a being the original candidate sample and p the original positive sample; further image-rendered symbols denote the classification loss; the HTG-generated samples collectively, l_i being the class label and C_HTG the HTG loss function; and the reversed triplet loss, in which a' is the candidate sample input, n is the negative sample input, p' is the positive sample input, ||·||_2 is the L2 distance, [·]_+ denotes truncation at 0, and τ_r is the reversed triplet loss hyperparameter; v and β are constants whose value range is from 0 to plus infinity;
the HTD is obtained by adopting the following formula:
[equation rendered as an image in the original publication]
wherein image-rendered symbols denote the value of the HTD loss function and the result obtained by inputting a generated sample into the HTD; C is the number of original classes, and HTD(x_i) is the result of the HTD with the original sample x_i as input.
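For orientation only, a reversed triplet loss along the lines of claim 8 can be sketched in Python. The exact adaptive expression is rendered as an image in the original publication, so the form below (L2 distances, margin τ_r, truncation at 0) is an assumption: it rewards generated triplets in which the negative is close to the candidate and the positive is far, i.e. it drives the generator toward harder triplets:

```python
import numpy as np

def reversed_triplet_loss(a_, p_, n_, tau_r):
    """Assumed reversed triplet loss for hard-triplet generation.

    a_, p_, n_ : candidate, positive, and negative sample inputs
    tau_r      : reversed triplet loss hyperparameter (margin)

    The loss is small when the negative is already closer to the
    candidate than the positive is; [x]_+ truncates at zero as in
    claim 8.
    """
    d_ap = np.linalg.norm(a_ - p_)  # L2 distance candidate-positive
    d_an = np.linalg.norm(a_ - n_)  # L2 distance candidate-negative
    return max(d_an - d_ap + tau_r, 0.0)
```

Note the sign reversal relative to the ordinary triplet loss [d(a, p) − d(a, n) + τ]_+: the generator is penalized when its triplets are easy (negative far, positive near), which is the opposite objective of the retrieval network it feeds.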
9. A retrieval apparatus based on segmentation-difficult sample generation, comprising:
the extraction module is used for extracting the characteristics of the image to be retrieved;
the processing module is used for taking the characteristics of the image to be retrieved as the input of a retrieval model, and obtaining, through the retrieval model, a retrieval result related to the image to be retrieved and a distance score between the characteristics of the image to be retrieved and the characteristics of the images in the retrieval model database; the retrieval model is obtained by training based on an original ternary image group serving as a sample set and a final difficult ternary sample group obtained by a two-stage difficult sample generation framework THSG; the final difficult ternary sample group is obtained by increasing the difficulty level of an original positive sample pair in the original ternary sample group in the first stage of the THSG; adjusting the labels of the difficult positive sample pairs to be consistent with the labels of the original positive sample pairs, and outputting the adjusted difficult positive sample pairs and the original negative samples in the original ternary image group to the second stage of the THSG; in the second stage of the THSG, increasing the difficulty level of the original negative samples to obtain final difficult negative samples and final difficult positive sample pairs; and synthesizing the final difficult positive sample pairs and the final difficult negative samples to obtain the final difficult ternary sample group;
the sorting module is used for sorting the retrieval results related to the images to be retrieved according to the distance scores to obtain the retrieval result most related to the images to be retrieved;
a generating module, configured to obtain the search model through the following steps:
acquiring an original ternary image group serving as a sample set;
in the first stage of the two-stage difficult sample generation framework THSG, stretching the original positive sample pair in a piecewise linear stretching PLM mode to increase the difficulty level and obtain a difficult positive sample pair; wherein the difficult positive sample pair comprises: a difficult candidate sample and a difficult positive sample;
adjusting the labels of the difficult positive sample pairs to be consistent with the labels of the original positive sample pairs based on the trained first generative adversarial neural network, and outputting the adjusted difficult positive sample pairs and the original negative samples to the second stage of the THSG; wherein the trained first generative adversarial neural network comprises: a difficult positive sample pair generator HAPG and a discriminator HAPD corresponding to the HAPG;
in the second stage of the THSG, increasing the difficulty level of the original negative sample based on the trained second generative adversarial neural network to obtain a final difficult negative sample, and outputting a final difficult positive sample pair; wherein the trained second generative adversarial neural network comprises: a difficult ternary sample generator HTG and a discriminator HTD corresponding to the HTG;
synthesizing the final difficult positive sample pair and the final difficult negative sample to obtain a final difficult ternary sample group;
and taking the final difficult ternary sample group as a sample set, and training a convolutional neural network to obtain the retrieval model.
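The steps carried out by the generating module can be sketched end-to-end as follows. Every callable here is a hypothetical stand-in for a claimed component: plm performs the piecewise linear stretching, (hapg, hapd) and (htg, htd) are the two trained generator/discriminator pairs, and train_cnn trains the convolutional neural network; the discriminators are only consumed during their own adversarial training, which is omitted from this sketch:

```python
def build_retrieval_model(original_triplets, plm, hapg, hapd, htg, htd, train_cnn):
    """Sketch of the two-stage difficult sample generation (THSG) flow.

    original_triplets : iterable of (candidate, positive, negative) samples
    All other arguments are hypothetical stand-ins for the claimed
    components; their names and signatures are assumptions.
    """
    hard_triplets = []
    for anchor, positive, negative in original_triplets:
        # Stage 1: stretch the original positive pair to raise its difficulty.
        hard_a, hard_p = plm(anchor, positive)
        # Adjust the hard pair's labels to match the original pair via HAPG.
        hard_a, hard_p = hapg(hard_a, hard_p, label_of=(anchor, positive))
        # Stage 2: raise the difficulty of the original negative via HTG,
        # yielding the final difficult negative sample.
        hard_n = htg(negative, context=(hard_a, hard_p))
        # Synthesize the final difficult ternary sample group.
        hard_triplets.append((hard_a, hard_p, hard_n))
    # Train the retrieval CNN on the synthesized hard triplets.
    return train_cnn(hard_triplets)
```

The point of the structure is that the CNN never trains on the easy original triplets alone: every triplet it sees has been pushed toward the decision boundary by the two generation stages first.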
CN202010586972.9A 2020-06-24 2020-06-24 Retrieval method and device based on segmentation difficult sample generation Active CN111858999B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010586972.9A CN111858999B (en) 2020-06-24 2020-06-24 Retrieval method and device based on segmentation difficult sample generation


Publications (2)

Publication Number Publication Date
CN111858999A CN111858999A (en) 2020-10-30
CN111858999B true CN111858999B (en) 2022-10-25

Family

ID=72988602

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010586972.9A Active CN111858999B (en) 2020-06-24 2020-06-24 Retrieval method and device based on segmentation difficult sample generation

Country Status (1)

Country Link
CN (1) CN111858999B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113780461B (en) * 2021-09-23 2022-08-05 中国人民解放军国防科技大学 Robust neural network training method based on feature matching

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108304864A (en) * 2018-01-17 2018-07-20 清华大学 Depth fights metric learning method and device
CN110533106A (en) * 2019-08-30 2019-12-03 腾讯科技(深圳)有限公司 Image classification processing method, device and storage medium
WO2020000961A1 (en) * 2018-06-29 2020-01-02 北京达佳互联信息技术有限公司 Method, device, and server for image tag identification
CN110674692A (en) * 2019-08-23 2020-01-10 北京大学 Target accurate retrieval method and system based on difficult sample generation
CN110929649A (en) * 2019-11-24 2020-03-27 华南理工大学 Network and difficult sample mining method for small target detection


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Object detection based on difficult sample mining under residual networks; Zhang Chao et al.; Laser & Optoelectronics Progress; 2018-05-11 (No. 10); full text *

Also Published As

Publication number Publication date
CN111858999A (en) 2020-10-30

Similar Documents

Publication Publication Date Title
CN109308318B (en) Training method, device, equipment and medium for cross-domain text emotion classification model
CN110147457B (en) Image-text matching method, device, storage medium and equipment
CN108399158B (en) Attribute emotion classification method based on dependency tree and attention mechanism
US11631248B2 (en) Video watermark identification method and apparatus, device, and storage medium
CN109766557B (en) Emotion analysis method and device, storage medium and terminal equipment
CN112749274B (en) Chinese text classification method based on attention mechanism and interference word deletion
CN107945210B (en) Target tracking method based on deep learning and environment self-adaption
WO2023134082A1 (en) Training method and apparatus for image caption statement generation module, and electronic device
CN111522908A (en) Multi-label text classification method based on BiGRU and attention mechanism
CN113011357A (en) Depth fake face video positioning method based on space-time fusion
CN110968725B (en) Image content description information generation method, electronic device and storage medium
CN112667813B (en) Method for identifying sensitive identity information of referee document
CN113627151B (en) Cross-modal data matching method, device, equipment and medium
CN111898704B (en) Method and device for clustering content samples
CN114925205B (en) GCN-GRU text classification method based on contrast learning
CN112749737A (en) Image classification method and device, electronic equipment and storage medium
Sharma et al. Automatic identification of bird species using audio/video processing
CN113806564B (en) Multi-mode informative text detection method and system
CN111858999B (en) Retrieval method and device based on segmentation difficult sample generation
CN114358249A (en) Target recognition model training method, target recognition method and device
CN113870863A (en) Voiceprint recognition method and device, storage medium and electronic equipment
CN109101984B (en) Image identification method and device based on convolutional neural network
Wang et al. Fine-grained multi-modal self-supervised learning
CN113297387A (en) News detection method for image-text mismatching based on NKD-GNN
CN115374943A (en) Data cognition calculation method and system based on domain confrontation migration network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant