CN113127672A - Generation method, retrieval method, medium and terminal of quantized image retrieval model - Google Patents


Info

Publication number
CN113127672A
CN113127672A (application CN202110432335.0A)
Authority
CN
China
Prior art keywords
vector
image
preset
text
training
Prior art date
Legal status
Granted
Application number
CN202110432335.0A
Other languages
Chinese (zh)
Other versions
CN113127672B (en)
Inventor
陈斌
王锦鹏
夏树涛
戴涛
李清
Current Assignee
Shenzhen International Graduate School of Tsinghua University
Peng Cheng Laboratory
Original Assignee
Shenzhen International Graduate School of Tsinghua University
Peng Cheng Laboratory
Priority date
Filing date
Publication date
Application filed by Shenzhen International Graduate School of Tsinghua University and Peng Cheng Laboratory
Priority to CN202110432335.0A
Publication of CN113127672A
Application granted
Publication of CN113127672B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5846Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using extracted text
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a generation method, a retrieval method, a medium and a terminal for a quantized image retrieval model, wherein the generation method comprises the following steps: determining a predictive quantization vector corresponding to a training image in a preset sample set by using a preset network model; determining a text vector corresponding to the training image based on the text label of the training image; and training the preset network model based on the text vector and the predictive quantization vector to obtain a quantized image retrieval model. The text labels corresponding to the training images serve as weak supervision labels, and the preset network model is trained on these labels together with the predictive quantization vectors. Deep quantization can therefore be learned from weakly labeled image data, which overcomes the dependence of existing deep quantization on accurately labeled data and reduces both the labeling cost and the overall training cost of the quantized image retrieval model.

Description

Generation method, retrieval method, medium and terminal of quantized image retrieval model
Technical Field
The present application relates to the field of image retrieval technologies, and in particular, to a method for generating a quantized image retrieval model, a retrieval method, a medium, and a terminal.
Background
Currently, quantization techniques based on deep learning (for example, deep quantization using a convolutional neural network (CNN)) are widely applied to large-scale image retrieval and achieve higher retrieval accuracy than conventional quantization coding based on handcrafted features. However, existing deep quantization models are generally trained on image datasets with accurate manual labeling (e.g., the CIFAR-10 and ImageNet image datasets), which requires substantial human resources for data labeling before the models are trained, thereby increasing the training cost of the quantization models.
Thus, the prior art has yet to be improved and enhanced.
Disclosure of Invention
The technical problem to be solved by the present application is to provide, in view of the deficiencies of the prior art, a method for generating a quantized image retrieval model, a retrieval method, a medium, and a terminal.
In order to solve the above technical problem, a first aspect of the embodiments of the present application provides a method for generating a quantized image retrieval model, where the method includes:
determining a predictive quantization vector corresponding to a training image in a preset sample set by using a preset network model;
determining a text vector corresponding to the training image based on the text label of the training image;
and training the preset network model based on the text vector and the prediction quantization vector to obtain a quantization image retrieval model.
The method for generating the quantized image retrieval model, wherein the preset sample set comprises a plurality of training image groups, and each training image group in the plurality of training image groups comprises a training image and a text label corresponding to the training image.
The method for generating the quantized image retrieval model, wherein the preset network model comprises a feature extraction module and an attention module; the determining, by using the preset network model, the predictive quantization vector corresponding to the training image in the preset sample set specifically includes:
inputting the training images in the preset sample set into the feature extraction module, and determining feature vectors corresponding to the training images through the feature extraction module;
and inputting the feature vector into the attention module, and determining a predictive quantization vector corresponding to the training image through the attention module.
The method for generating the quantized image retrieval model, wherein the preset network model is configured with a plurality of preset codebooks; the inputting the feature vector into the attention module, and determining the predictive quantization vector corresponding to the training image by the attention module specifically includes:
dividing the feature vector into a plurality of vector segments, wherein the vector segments correspond to the plurality of preset codebooks one to one;
determining the quantized vector segment corresponding to each vector segment based on the preset codebook corresponding to that vector segment;
and determining the predictive quantization vector corresponding to the training image based on the quantized vector segment corresponding to each vector segment.
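The divide-quantize-concatenate procedure above can be sketched as follows. This is a minimal numpy illustration under assumptions: the hard nearest-codeword assignment shown here stands in for the attention-based soft assignment the patent describes, and the segment count, codebook sizes, and dimensions are arbitrary.

```python
import numpy as np

def quantize_feature(feature, codebooks):
    """Split a feature vector into equal non-overlapping segments, quantize
    each segment against its own codebook, and concatenate the results."""
    d = len(codebooks)                      # number of segments == number of codebooks
    segments = np.split(feature, d)         # equal-length, non-overlapping segments
    quantized = []
    for seg, codebook in zip(segments, codebooks):
        # hard assignment for clarity; the patent's attention module instead
        # forms a soft weighted combination of codewords
        idx = np.argmin(np.linalg.norm(codebook - seg, axis=1))
        quantized.append(codebook[idx])
    return np.concatenate(quantized)

rng = np.random.default_rng(0)
feature = rng.normal(size=8)                              # D = 8
codebooks = [rng.normal(size=(4, 4)) for _ in range(2)]   # d = 2 codebooks, M = 4
q = quantize_feature(feature, codebooks)
```

Each quantized segment is one row of its codebook, so the concatenated result has the same dimension as the input feature vector.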
The method for generating the quantized image retrieval model, wherein the determining the quantized vector segments corresponding to the vector segments based on the preset codebooks corresponding to the vector segments specifically includes:
for each vector segment in the plurality of vector segments, respectively determining the attention weight between the vector segment and each preset codeword in the preset codebook corresponding to that vector segment;
and determining the quantized vector segment corresponding to the vector segment based on each preset codeword and the attention weight corresponding to each preset codeword, so as to obtain the quantized vector segments corresponding to all the vector segments.
The method for generating the quantized image retrieval model, wherein the determining, for each vector segment in the plurality of vector segments, the attention weight between the vector segment and each preset codeword in the corresponding preset codebook specifically includes:
for each vector segment in the plurality of vector segments, respectively calculating a first attention weight between the vector segment and each preset codeword in the corresponding preset codebook, and calculating the sum of all the first attention weights;
and for each preset codeword in the preset codebook, calculating the ratio of the first attention weight corresponding to that preset codeword to the sum, and taking the ratio as the attention weight corresponding to the preset codeword.
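Assuming the first attention weight is an exponentiated inner-product score between the vector segment and a codeword (the claim does not fix the score function, so this is an illustrative choice), the ratio computation above amounts to a softmax normalization:

```python
import numpy as np

def attention_weights(segment, codebook):
    """Normalized attention weight of each codeword for one vector segment:
    first weight = raw score, final weight = score / sum of all scores."""
    scores = np.exp(codebook @ segment)     # first attention weights (assumed form)
    return scores / scores.sum()            # ratios, guaranteed to sum to 1

rng = np.random.default_rng(1)
seg = rng.normal(size=4)                    # one vector segment, M = 4
cb = rng.normal(size=(8, 4))                # codebook with 8 codewords
w = attention_weights(seg, cb)
```

The resulting weights are positive and sum to one, so the weighted sum of codewords they induce is a convex combination, i.e., a soft quantization of the segment.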
The method for generating the quantized image retrieval model, wherein the text label comprises a plurality of text labels; the determining the text vector corresponding to the training image based on the text label of the training image specifically includes:
inputting each text label of the plurality of text labels into a word embedding model, and determining the candidate text vector corresponding to each text label through the word embedding model;
and determining a text vector corresponding to the training image based on the candidate text vector corresponding to each text label.
The method for generating the quantized image retrieval model, wherein the vector dimensions of the candidate text vectors corresponding to the text labels are the same; the determining, based on the candidate text vectors corresponding to the text labels, the text vector corresponding to the training image specifically includes:
and calculating the average text vector of the candidate text vectors corresponding to the text labels, and taking the average text vector as the text vector corresponding to the training image.
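A minimal sketch of the averaging step, using a toy embedding table in place of a trained word embedding model (the labels, dimensions, and embedding values are illustrative):

```python
import numpy as np

def text_vector(labels, embedding):
    """Embed each text label and average the candidate vectors to obtain
    the text vector of the training image."""
    vectors = [embedding[label] for label in labels]   # candidate text vectors
    return np.mean(vectors, axis=0)                    # same dimension as each candidate

# toy embedding table standing in for a trained word-embedding model
embedding = {"nature": np.array([1.0, 0.0]),
             "landscape": np.array([0.0, 1.0])}
t = text_vector(["nature", "landscape"], embedding)
```

Averaging requires that all candidate text vectors share the same dimension, which is exactly the condition stated in the claim above.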
The method for generating the quantized image retrieval model, wherein the training of the preset network model based on the text vector and the predictive quantization vector to obtain the quantized image retrieval model specifically includes:
determining a loss function value corresponding to the training image according to the text vector and the predictive quantization vector;
and training the model parameters of the preset network model and the plurality of preset codebooks configured for it based on the loss function value, so as to obtain the quantized image retrieval model and a plurality of codebooks.
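The claims do not fix the form of the loss function. One plausible choice consistent with aligning the text vector and the predictive quantization vector is a cosine-similarity loss; the sketch below is illustrative only and assumes both vectors have been projected to the same dimension.

```python
import numpy as np

def alignment_loss(text_vec, quant_vec):
    """Illustrative loss: 1 - cosine similarity between the text vector
    (weak supervision signal) and the predictive quantization vector."""
    cos = float(text_vec @ quant_vec /
                (np.linalg.norm(text_vec) * np.linalg.norm(quant_vec)))
    return 1.0 - cos          # 0 when perfectly aligned, 2 when opposed

v = np.array([1.0, 2.0, 3.0])
loss_aligned = alignment_loss(v, 2 * v)     # same direction
loss_opposed = alignment_loss(v, -v)        # opposite direction
```

Minimizing such a loss over the model parameters and codebooks would push the predicted quantization vectors toward the semantics carried by the weak text labels.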
A second aspect of the embodiments of the present application provides an image retrieval method, which applies a quantized image retrieval model determined by the method for generating a quantized image retrieval model as described in any one of the above, where the image retrieval method includes:
inputting a query image into the quantized image retrieval model, and determining a query vector corresponding to the query image through the quantized image retrieval model;
determining similarity between the query vector and each code word in each of a plurality of codebooks;
and retrieving a target image corresponding to the query image in a preset image database based on the determined similarity.
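The codeword similarities can be computed once per query and reused as lookup tables when scoring every database image, in the spirit of asymmetric quantized search. This is a sketch under an assumed inner-product similarity; the claims do not specify the similarity measure or the code layout.

```python
import numpy as np

def search(query_vec, codes, codebooks, top_k=1):
    """Asymmetric search: compute the similarity between each query segment
    and every codeword once, then score each database image by table lookups
    on its stored codeword indices."""
    d = len(codebooks)
    q_segs = np.split(query_vec, d)
    # per-segment similarity of the query to every codeword in that codebook
    tables = [cb @ seg for cb, seg in zip(codebooks, q_segs)]
    scores = np.array([sum(tables[m][code[m]] for m in range(d))
                       for code in codes])
    return np.argsort(-scores)[:top_k]      # indices of the most similar images

codebooks = [np.eye(2), np.eye(2)]          # 2 codebooks, 2 codewords each
codes = np.array([[0, 1], [1, 0]])          # stored codeword indices per image
best = search(np.array([1.0, 0.0, 0.0, 1.0]), codes, codebooks)
```

Because each database image is stored only as codeword indices, scoring it costs d table lookups rather than a full vector comparison.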
Before the query image is input into the quantized image retrieval model and the query vector corresponding to the query image is determined by the quantized image retrieval model, the image retrieval method further includes:
and respectively inputting each image in a preset image database into the quantized image retrieval model, and determining a quantized vector corresponding to each image through the quantized image retrieval model.
The image retrieval method, wherein retrieving, in a preset image database, a target image corresponding to the query image based on the determined similarity specifically includes:
determining candidate similarity of quantization vectors corresponding to the query image and each image in a preset image database based on the determined similarity;
searching a target image corresponding to the query image in a preset image database based on the determined candidate similarity;
if the target image is found, judging that the preset image database contains the query image;
and if the target image is not found, judging that the preset image database does not contain the query image.
A third aspect of the embodiments of the present application provides a computer-readable storage medium storing one or more programs, which are executable by one or more processors to implement the steps in the method for generating a quantized image retrieval model as described in any one of the above, and/or the steps in the image retrieval method as described in any one of the above.
A fourth aspect of the embodiments of the present application provides a terminal device, which includes: a processor, a memory, and a communication bus; the memory stores a computer-readable program executable by the processor;
the communication bus realizes connection communication between the processor and the memory;
the processor, when executing the computer readable program, implements the steps in the method for generating a quantized image retrieval model as described in any one of the above, and/or implements the steps in the method for image retrieval as described in any one of the above.
Advantageous effects: compared with the prior art, the application provides a generation method, a retrieval method, a medium and a terminal for a quantized image retrieval model, wherein the generation method comprises the following steps: determining a predictive quantization vector corresponding to a training image in a preset sample set by using a preset network model; determining a text vector corresponding to the training image based on the text label of the training image; and training the preset network model based on the text vector and the predictive quantization vector to obtain a quantized image retrieval model. The text labels corresponding to the training images serve as weak supervision labels, and the preset network model is trained on these labels together with the predictive quantization vectors. Deep quantization can therefore be learned from weakly labeled image data, which overcomes the dependence of existing deep quantization on accurately labeled data and reduces both the labeling cost and the overall training cost of the quantized image retrieval model.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without any inventive work.
Fig. 1 is a flowchart of a method for generating a quantized image retrieval model according to the present application.
Fig. 2 is a working schematic diagram of a method for generating a quantized image retrieval model according to the present application.
Fig. 3 is a schematic structural diagram of a terminal device provided in the present application.
Detailed Description
The present application provides a generation method, a retrieval method, a medium, and a terminal of a quantized image retrieval model, and in order to make the purpose, technical solution, and effect of the present application clearer and clearer, the present application will be described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
It will be understood by those within the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
In particular implementations, the terminal devices described in the embodiments of the present application include, but are not limited to, portable devices such as mobile phones, laptops, or tablet computers with touch-sensitive surfaces (e.g., touch displays and/or touch pads). It should also be understood that in some embodiments, the device is not a portable communication device but a desktop computer having a touch-sensitive surface (e.g., a touch-sensitive display screen and/or touchpad).
In the discussion that follows, a terminal device that includes a display and a touch-sensitive surface is described. However, it should be understood that the terminal device may also include one or more other physical user interface devices such as a physical keyboard, mouse, and/or joystick.
The terminal device supports various applications, such as one or more of the following: a drawing application, a presentation application, a word processing application, a video conferencing application, a disc burning application, a spreadsheet application, a gaming application, a telephone application, an email application, an instant messaging application, an exercise support application, a photo management application, a digital camera application, a digital video camera application, a web browsing application, a digital music player application, and/or a digital video player application.
Various applications that may be executed on the terminal device may use at least one common physical user interface device, such as a touch-sensitive surface. One or more functions of the touch-sensitive surface and the corresponding information displayed on the terminal may be adjusted and/or changed between applications and/or within respective applications. In this way, a common physical framework (e.g., touch-sensitive surface) of the terminal can support various applications with user interfaces that are intuitive and transparent to the user.
It should be understood that, the sequence numbers and sizes of the steps in this embodiment do not mean the execution sequence, and the execution sequence of each process is determined by its function and inherent logic, and should not constitute any limitation on the implementation process of this embodiment.
The inventors have studied and found that quantization techniques based on deep learning (for example, deep quantization using a convolutional neural network (CNN)) are widely used in large-scale image retrieval and achieve higher retrieval accuracy than conventional quantization coding based on handcrafted features. However, existing deep quantization models are generally trained on image datasets with accurate manual labeling (e.g., the CIFAR-10 and ImageNet image datasets), which requires substantial human resources for data labeling before the models are trained, thereby increasing the training cost of the quantization models.
However, in practical applications, image data with weak annotations is ubiquitous. For example, in a social media application, a user may attach a piece of comment text and select a topic tag when uploading an image, so that the image carries two weak supervision annotations, namely the comment text and the topic tag. Although the text information carried by a picture does not necessarily reflect its content accurately, it can serve as a weak supervision signal containing the visual semantic information of the picture.
Based on this, in the embodiment of the application, a predictive quantization vector corresponding to a training image in a preset sample set is determined by using a preset network model; a text vector corresponding to the training image is determined based on the text label of the training image; and the preset network model is trained based on the text vector and the predictive quantization vector to obtain a quantized image retrieval model. The text labels corresponding to the training images serve as weak supervision labels, and the preset network model is trained on these labels together with the predictive quantization vectors. Deep quantization can therefore be learned from weakly labeled image data, which overcomes the dependence of existing deep quantization on accurately labeled data and reduces both the labeling cost and the overall training cost of the quantized image retrieval model.
The following further describes the content of the application by describing the embodiments with reference to the attached drawings.
The present embodiment provides a method for generating a quantized image retrieval model, as shown in fig. 1 and 2, the method including:
and S10, determining the predictive quantization vector corresponding to the training image in the preset sample set by using the preset network model.
Specifically, the preset sample set is prepared in advance and used for training the preset network model to obtain the quantized image retrieval model. The preset sample set comprises a plurality of training image groups, each training image group comprises a training image and a text label, and the text label serves as a weak supervision label of the training image. The training image may correspond to one text label or to a plurality of text labels; when it corresponds to a plurality of text labels, all of them are used as weak supervision labels of the training image. For example, for a training image that is a landscape photograph of a valley, the corresponding text labels may be "nature", "spectacular", and "landscape".
In an implementation manner of this embodiment, since users of social media generally attach comments and/or hashtags when uploading images, the preset sample set may be obtained as follows: obtain images uploaded by users on social media, extract the comments and/or topic labels carried by each uploaded image, take the extracted text information as the text label corresponding to that image, and finally take each image together with its text label as a training image group to form the preset sample set. Of course, in practical applications, the training images in the preset sample set may also be determined in other manners, for example, by shooting training images through an imaging module and configuring text labels for the shot images. Because this implementation uses images uploaded by social media users as training images, and the comments and/or topic labels they carry as text labels, the preset sample set can be collected quickly, which in turn speeds up the training of the quantized image retrieval model.
As shown in fig. 2, the preset network model includes a feature extraction module and an attention module; the determining, by using the preset network model, the predictive quantization vector corresponding to the training image in the preset sample set specifically includes:
inputting the training images in the preset sample set into the feature extraction module, and determining feature vectors corresponding to the training images through the feature extraction module;
and inputting the feature vector into the attention module, and determining a predictive quantization vector corresponding to the training image through the attention module.
Specifically, the feature extraction module is configured to extract a feature vector corresponding to a training image, where the feature extraction module may include a feature extraction unit and a conversion unit, an input item of the feature extraction unit is the training image, an output item of the feature extraction unit is a feature map corresponding to the training image, an input item of the conversion unit is a feature map, and an output item of the conversion unit is the feature vector corresponding to the training image. As can be understood, the training image is input into the feature extraction unit, and the feature extraction unit outputs the feature map corresponding to the training image; and inputting the feature map into a conversion unit, and outputting the feature vector corresponding to the training image through the conversion unit.
In an implementation manner of this embodiment, the feature extraction unit may employ a convolutional neural network model, where the convolutional neural network model may include an input layer, a plurality of convolutional layers, and a plurality of fully-connected layers that are sequentially cascaded; the input item of the input layer is the training image, and the output item of the last fully-connected layer is the feature map. In one specific implementation, there may be 4 convolutional layers and 2 fully-connected layers. Of course, in practical applications, the number of convolutional layers may be determined according to practical requirements, for example, 5 convolutional layers. Furthermore, the conversion unit is configured to flatten the feature map into the feature vector, where the vector dimension of the feature vector equals the product of the dimensions of the feature map. For example, if the image scale of the feature map is 40 × 40 × 3, then the vector dimension of the feature vector is 40 × 40 × 3 = 4800.
In an implementation manner of this embodiment, the preset network model is configured with a plurality of preset codebooks. Each preset codebook comprises a plurality of codewords, the codewords within a codebook are different from one another, and each codeword can serve as a quantization code for the training image. The number of codewords in each preset codebook may be the same or different across codebooks, and can be determined based on actual requirements.
Based on this, the inputting the feature vector into the attention module, and the determining, by the attention module, the predictive quantization vector corresponding to the training image specifically includes:
dividing the feature vector into a plurality of vector segments;
determining the quantized vector segment corresponding to each vector segment based on the preset codebook corresponding to that vector segment;
and determining the predictive quantization vector corresponding to the training image based on the quantized vector segment corresponding to each vector segment.
Specifically, the vector segments do not overlap, and together they constitute the feature vector; the number of vector segments equals the number of preset codebooks, and the vector segments correspond to the preset codebooks one to one. It can be understood that, when dividing the feature vector into vector segments, the number of preset codebooks may be obtained first and the feature vector divided accordingly. In a specific implementation, the vector dimensions of the vector segments are the same: for example, if the vector dimension of the feature vector is D, the number of vector segments is d, and the vector dimension of each vector segment is M, then D = d × M.
In an implementation manner of this embodiment, the plurality of preset codebooks may be assigned a codebook sequence in advance. After the vector segments divided from the feature vector are sorted according to their order within the feature vector, the resulting vector segment sequence corresponds to the codebook sequence formed by the preset codebooks, where corresponding means that the position of a vector segment in the vector segment sequence is the same as the position of its preset codebook in the codebook sequence. For example, suppose the vector segments comprise a vector segment A and a vector segment B, and the preset codebooks comprise a preset codebook a and a preset codebook b; if the vector segment sequence is <vector segment A, vector segment B> and the codebook sequence is <preset codebook a, preset codebook b>, then the preset codebook corresponding to vector segment A is preset codebook a, and the preset codebook corresponding to vector segment B is preset codebook b.
The quantized vector segment is a representation of the vector segment built from its preset codebook; the vector segment is quantized through its quantized vector segment, so that the feature vector can be quantized through the quantized vector segments corresponding to the vector segments, and the training image corresponding to the feature vector is thereby quantized. In a specific implementation manner of this embodiment, the determining, based on the preset codebook corresponding to each vector segment, the quantized vector segment corresponding to each vector segment specifically includes:
for each vector segment in the plurality of vector segments, respectively determining each preset code word in a preset codebook corresponding to the vector segment and the attention weight of the vector segment;
and determining the quantized vector segment corresponding to the vector segment based on each preset code word and the attention weight corresponding to each preset code word so as to obtain the quantized vector segment corresponding to each vector segment.
Specifically, the attention weight is a weight reflecting a corresponding preset code word in the quantized vector segment, and is used for reflecting the importance degree of the preset code word in the quantized vector segment; the greater the attention weight is, the higher the importance degree of the preset code word corresponding to the attention weight is, and conversely, the smaller the attention weight is, the lower the importance degree of the preset code word corresponding to the attention weight is. The attention weight corresponding to each preset codeword may be preset, or may be calculated based on the preset codeword and the vector segment by using an attention mechanism.
In an implementation manner of this embodiment, for each of the plurality of vector segments, respectively determining each preset codeword in the preset codebook corresponding to the vector segment and the attention weight of the vector segment specifically includes:
for each vector segment in the plurality of vector segments, respectively calculating each preset code word in a preset codebook and a first attention weight of the vector segment, and calculating the sum of all the first attention weights;
and for each preset code word in the preset codebook, calculating the ratio of the first attention weight corresponding to the preset code word to the sum value, and taking the ratio as the attention weight corresponding to the preset code word.
Specifically, the preset codebook is the preset codebook corresponding to the vector segment, and each preset code word in the preset codebook corresponds to a first attention weight, where the first attention weight may be determined based on the cosine similarity between the vector segment and the preset code word, or based on the inner product between the vector segment and the preset code word, and so on. In a specific implementation manner of this embodiment, the calculation formula of the first attention weight may be:

$$ e_{m,k} = \exp\left( v_m^T c_k^m \right) $$

where $v_m$ is the m-th vector segment, $v_m^T$ is the transposed vector of the m-th vector segment, and $c_k^m$ is the k-th preset code word in the m-th preset codebook.
Further, after the first attention weights are obtained, the calculation formula of the attention weight corresponding to the preset code word may be:

$$ a_{m,k} = \frac{\exp\left( v_m^T c_k^m \right)}{\sum_{k'=1}^{K} \exp\left( v_m^T c_{k'}^m \right)} $$

where $v_m$ is the m-th vector segment, $v_m^T$ is the transposed vector of the m-th vector segment, $c_k^m$ is the k-th preset code word in the m-th preset codebook, and $K$ is the number of preset code words in the m-th preset codebook.
After the attention weights corresponding to the preset codewords in the preset codebook are obtained, the quantization vector segments corresponding to the vector segments may be determined based on the attention weights corresponding to the preset codewords, where the quantization vector segments may be preset codewords with the largest attention weight in the preset codewords, or obtained based on the preset codewords in the preset codebook and the attention weights corresponding to the preset codewords.
In an implementation manner of this embodiment, the quantized vector segment corresponding to the vector segment is obtained by weighting each preset code word in the preset codebook corresponding to the vector segment, and the calculation formula of the quantized vector segment may be:

$$ \hat{v}_m = \sum_{k=1}^{K} a_{m,k} \, c_k^m $$

where $K$ is the number of preset code words in the preset codebook, $\hat{v}_m$ is the quantized vector segment corresponding to the m-th vector segment, $a_{m,k}$ is the attention weight corresponding to the k-th preset code word, and $c_k^m$ is the k-th preset code word.
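The attention-weighted quantization of one segment can be sketched as follows; this is a minimal NumPy sketch under the notation above, with illustrative sizes and random data standing in for a learned codebook.

```python
import numpy as np

d, K = 8, 16                       # segment dimension and codebook size (illustrative)
rng = np.random.default_rng(0)
v_m = rng.standard_normal(d)       # the m-th vector segment
C_m = rng.standard_normal((K, d))  # the m-th preset codebook: K preset code words

# First attention weights: exp of the inner product between the segment
# and each preset code word.
e = np.exp(C_m @ v_m)              # shape (K,)

# Attention weights: each first weight divided by the sum of all first
# weights, i.e. a softmax over the K code words.
a = e / e.sum()

# Quantized vector segment: attention-weighted sum of the code words.
v_m_hat = a @ C_m                  # shape (d,)
```

Because the weighted sum is differentiable in both the segment and the code words, the codebooks can be trained end to end together with the network parameters, which is the point of the attention-based quantization.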
After the quantized vector segments corresponding to the vector segments are obtained, the quantized vector segments are connected according to the positions of their corresponding vector segments in the feature vector, so that the predictive quantization vector corresponding to the feature vector is obtained. For example, if the vector segments of the feature vector are $v_1, v_2, \ldots, v_N$ and the quantized vector segment corresponding to each vector segment $v_m$ is $\hat{v}_m$, then the predictive quantization vector corresponding to the feature vector $(v_1, v_2, \ldots, v_N)$ is $\hat{r} = (\hat{v}_1, \hat{v}_2, \ldots, \hat{v}_N)$.
S20, determining a text vector corresponding to the training image based on the text label of the training image.
Specifically, the text vector is a word vector corresponding to the text label, and it can be understood that the text vector corresponding to the text label can be determined by a word vector model, for example, inputting the text label into the word vector model, outputting a word vector corresponding to the text label by the word vector model, and taking the word vector as a text vector corresponding to the training image. In addition, the training image may correspond to one text label or correspond to a plurality of text labels, and when the training image corresponds to one text label, the word vector corresponding to the text label is the text vector corresponding to the training image; when the training image corresponds to a plurality of text labels, the text vector corresponding to the training image may be determined based on the word vector corresponding to each text label in the plurality of text labels.
In one implementation manner of this embodiment, the text labels include a plurality of text labels; the determining the text vector corresponding to the text label of the training image specifically includes:
inputting each text label in the plurality of text labels into a word embedding model, and determining the candidate text vector corresponding to each text label through the word embedding model;
and determining a text vector corresponding to the training image based on the candidate text vector corresponding to each text label.
Specifically, the word embedding model is trained in advance; when a text label is input into the word embedding model, the word embedding model may output the candidate text vector corresponding to that text label, so each of the text labels is input into the word embedding model, and the candidate text vector corresponding to each text label is determined by the word embedding model. In addition, among the obtained candidate text vectors corresponding to the text labels, the average value of the candidate text vectors may be used as the text vector corresponding to the training image, or the candidate text vectors may be weighted to obtain the text vector corresponding to the training image, or one candidate text vector may be randomly selected from the plurality of candidate text vectors as the text vector corresponding to the training image, and so on. In an implementation manner of this embodiment, the vector dimensions of the candidate text vectors corresponding to the text labels are the same; the determining, based on the candidate text vectors corresponding to the text labels, the text vector corresponding to the training image specifically includes: calculating the average text vector of the candidate text vectors corresponding to the text labels, and taking the average text vector as the text vector corresponding to the training image.
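The averaging variant can be sketched as follows; the embedding table here is a hypothetical stand-in for a trained word embedding model, and the labels and vector values are invented for the example.

```python
import numpy as np

# Hypothetical stand-in for a trained word embedding model: it maps each
# text label to a candidate text vector of a fixed dimension.
embedding = {
    "cat":    np.array([0.2, 0.8, 0.1]),
    "animal": np.array([0.4, 0.6, 0.3]),
    "pet":    np.array([0.3, 0.7, 0.2]),
}

labels = ["cat", "animal", "pet"]      # the text labels of one training image
candidates = np.stack([embedding[w] for w in labels])

# Text vector of the training image: the average of the candidate text
# vectors, which damps the influence of any single noisy label.
text_vector = candidates.mean(axis=0)
```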
S30, training the preset network model based on the text vector and the prediction quantization vector to obtain a quantization image retrieval model.
Specifically, the quantized image retrieval model is obtained by training the preset network model, and its model structure is the same as that of the preset network model. The difference between the two is that the model parameters of the preset network model are initial model parameters and the plurality of preset codebooks configured for the preset network model are preset, whereas the model parameters of the quantized image retrieval model are trained model parameters and the quantized image retrieval model is configured with a plurality of codebooks that are determined in the process of training the preset network model based on the preset sample set. It can be understood that, when the preset network model is trained based on the text vectors and the predictive quantization vectors, the model parameters of the preset network model and the plurality of preset codebooks are trained, and when the quantized image retrieval model is obtained by training, the trained codebooks are obtained as well.
Based on this, the training the preset network model based on the text vector and the predicted quantization vector to obtain a quantized image retrieval model specifically includes:
determining a loss function value corresponding to the training image according to the text vector and the prediction quantization vector;
and training the model parameters of the preset network model and a plurality of preset codebooks configured by the model parameters based on the loss function values to obtain a quantized image retrieval model and a plurality of codebooks.
Specifically, the loss function value is determined based on the text vectors and the predictive quantization vectors. In the training process, the preset sample set may be divided into a plurality of training batches; when one of the training batches is used to train the preset network model, the loss function value is determined based on the training images included in that batch, and the preset network model is trained accordingly. Of course, each training image may also be used as a training batch, and after the preset network model is trained based on that training image, the loss function value corresponding to the training image is determined.
In an implementation manner of this embodiment, the calculation formula of the loss function value may be:

$$ L = -\frac{1}{B} \sum_{k=1}^{B} \log \frac{\exp\left( \hat{r}_k^T t_k \right)}{\sum_{j=1}^{B} \exp\left( \hat{r}_k^T t_j \right)} $$

where $L$ is the loss function value, $B$ is the size of the training batch, $\hat{r}_k$ is the predictive quantization vector of the k-th training image, $\hat{r}_k^T$ is the transposed vector of the predictive quantization vector $\hat{r}_k$, $t_k$ is the text vector corresponding to the k-th training image, and $t_j$ is the text vector corresponding to the j-th training image.
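A batched computation of this contrastive loss can be sketched as follows; this is a minimal NumPy sketch with random vectors standing in for the model outputs, and the batch size and dimension are illustrative.

```python
import numpy as np

B, dim = 4, 8                          # batch size and vector dimension (illustrative)
rng = np.random.default_rng(1)
r_hat = rng.standard_normal((B, dim))  # predictive quantization vectors
t = rng.standard_normal((B, dim))      # text vectors of the same images

# Pairwise scores: entry (k, j) is r_hat_k^T t_j.
scores = r_hat @ t.T                   # shape (B, B)

# Contrastive loss: for each image k, its quantized representation is
# pulled toward its own text vector t_k relative to the other text
# vectors in the batch (log-softmax over each row, diagonal as target).
log_prob = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
loss = -np.mean(np.diag(log_prob))
```

Minimizing this loss matches each quantized picture representation with its corresponding text representation, as described in the summary below.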
In summary, the present embodiment provides a method for generating a quantized image retrieval model, where the method includes: determining a predictive quantization vector corresponding to a training image in a preset sample set by using a preset network model; determining a text vector corresponding to the training image based on the text label of the training image; and training the preset network model based on the text vector and the prediction quantization vector to obtain a quantization image retrieval model. According to the method and the device, the text labels corresponding to the training images are used as weak supervision labels, the preset network model is trained through the weak supervision labels and the prediction quantization vectors, so that the depth quantization can be learned by using weak label picture data, the problem that the existing depth quantization depends on data with high-quality labels is solved, the labor cost of the quantization image retrieval model can be reduced, and the training cost of the quantization image retrieval model is reduced. 
In addition, firstly, this embodiment uses a text vector obtained by averaging the word vectors corresponding to the training image, which automatically suppresses the interference of noisy labels and enhances the text semantic information, so the effect of weakly supervised learning based on the text vector can be effectively improved, and the training effect of the quantized image retrieval model is further improved; secondly, through the end-to-end product quantization based on the attention mechanism, the training process of deep quantization coding can be carried out end to end by the quantized image retrieval model, and the precision of the image retrieval technology can be improved; finally, the scheme directly matches the quantized picture representation vector with the corresponding text representation vector through a contrastive learning loss function, and can obtain quantized vectors with stronger semantic representation capability.
To further illustrate the effect of the quantized image retrieval model determined by the generation method provided by this embodiment, public tests were carried out on the MIR-FLICKR25K and NUS-WIDE data sets, comparing the MAP indicators at encoding lengths of 8 bits, 16 bits, 24 bits and 32 bits against mainstream methods in the industry; the results are shown in the following table.

(Table of MAP comparison results: rendered as an image in the source and not reproduced here.)
Based on the above method for generating a quantized image retrieval model, this embodiment further provides an image retrieval method, which applies the quantized image retrieval model determined by the above method, where the image retrieval method includes:
inputting a query image into the quantized image retrieval model, and determining a query vector corresponding to the query image through the quantized image retrieval model;
determining similarity between the query vector and each code word in each of a plurality of codebooks;
and retrieving a target image corresponding to the query image in a preset image database based on the determined similarity.
Specifically, the query vector is determined for the query image by the quantized image retrieval model; it can be understood that the query vector is the quantized vector of the query image determined based on the quantized image retrieval model. The process of determining the query vector corresponding to the query image through the quantized image retrieval model may be: inputting the query image into the quantized image retrieval model, and determining the feature vector corresponding to the query image through the quantized image retrieval model; dividing the feature vector into a plurality of feature vector segments based on the plurality of codebooks; selecting, for each feature vector segment, a candidate code word from the codebook corresponding to that segment; and finally connecting the candidate code words corresponding to the feature vector segments to obtain the query vector corresponding to the query image.
In an implementation manner of this embodiment, for each feature vector segment in the plurality of feature vector segments, the cosine similarity between the feature vector segment and each code word in the codebook corresponding to the segment is determined; the code word with the largest cosine similarity is then selected from the plurality of code words and used as the candidate code word corresponding to the feature vector segment. The calculation formula of the cosine similarity between the feature vector segment and each code word in the corresponding codebook may be:

$$ \cos\left( v_m, c_i^m \right) = \frac{\left( c_i^m \right)^T v_m}{\left\lVert v_m \right\rVert \left\lVert c_i^m \right\rVert} $$

where $v_m$ is the feature vector segment, $c_i^m$ is the i-th code word in the m-th codebook $C_m$, and $\left( c_i^m \right)^T$ is the transposed vector of $c_i^m$.
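The hard code word selection for one segment can be sketched as follows; this is a minimal NumPy sketch with illustrative sizes and random data standing in for a trained codebook.

```python
import numpy as np

d, K = 8, 16                        # segment dimension and codebook size (illustrative)
rng = np.random.default_rng(2)
v_m = rng.standard_normal(d)        # one feature vector segment of the query
C_m = rng.standard_normal((K, d))   # the codebook matched with that segment

# Cosine similarity between the segment and every code word in the codebook.
cos = (C_m @ v_m) / (np.linalg.norm(C_m, axis=1) * np.linalg.norm(v_m))

# Candidate code word: the one with the largest cosine similarity; its
# index k is the code word identifier that can be stored for this segment.
k = int(np.argmax(cos))
candidate = C_m[k]
```

Storing only the integer `k` per segment, rather than the segment itself, is what compresses each image down to a short code.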
In addition, in practical application, each preset code word in the preset codebook is configured with a code word identifier; after the candidate code words corresponding to the vector segments are obtained, the code word identifiers can be stored, and the feature vector of the training image is thereby converted into a quantization vector represented by a plurality of code word identifiers. For example, the preset codebooks are $C_1, C_2, \ldots, C_N$, where $N$ is the number of preset codebooks, and each preset codebook includes a plurality of preset code words $c_1^m, c_2^m, \ldots, c_K^m$, where $m$ denotes the m-th preset codebook and $K$ is the number of preset code words; if the candidate code word corresponding to vector segment m is $c_k^m$, then the code word identifier of vector segment m may be $k$.
Based on this, in an implementation manner of this embodiment, before the query image is input to the quantized image retrieval model, and a query vector corresponding to the query image is determined by the quantized image retrieval model, the method further includes:
and respectively inputting each image in a preset image database into the quantized image retrieval model, and determining a quantized vector corresponding to each image through the quantized image retrieval model.
Specifically, the process of determining the quantization vector may be the same as the process of determining the query vector, and is not described herein again. In addition, after the quantization vector corresponding to each image is determined, each code word in the quantization vector can be represented by its code word identifier, so that a quantization vector represented by a plurality of code word identifiers is obtained. Each image in the preset image database can thus be converted into a quantization vector represented by a plurality of code word identifiers, the image database can be converted into a plurality of quantization vectors and a plurality of codebooks, and the storage space required by the image database can be saved.
In one implementation manner of this embodiment, the similarity is the similarity between the query vector and each of the code words in each of the codebooks; that is, each code word in each codebook corresponds to a similarity, forming a similarity list, so that the process of searching for the query image in the image database can be converted into the process of summing look-ups in the query similarity lists, and the image retrieval speed can be improved. The similarity between the query vector and a code word may be:

$$ S_{q,(m,i)} = \left( r_q^m \right)^T c_i^m $$

where $S_{q,(m,i)}$ represents the similarity between the query vector and the code word, $r_q^m$ is the segment of the query vector $r_q$ corresponding to the m-th codebook, $\left( r_q^m \right)^T$ is its transposed vector, and $c_i^m$ is the i-th code word in the codebook $C_m$. Correspondingly, the similarity sequence between the query vector and the m-th codebook may be:

$$ S_{q,m} = \left( S_{q,(m,1)}, S_{q,(m,2)}, \ldots, S_{q,(m,K)} \right) $$

where $S_{q,m}$ is the similarity sequence between the query vector and the m-th codebook $C_m$.
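Precomputing one similarity list per codebook can be sketched as follows; this is a minimal NumPy sketch in which, as assumed in the formula above, the m-th segment of the query vector is scored against codebook m, and all sizes and data are illustrative.

```python
import numpy as np

N, K, d = 8, 16, 8                  # codebooks, code words per codebook, segment dim
rng = np.random.default_rng(3)
codebooks = rng.standard_normal((N, K, d))
r_q = rng.standard_normal(N * d)    # query vector, one segment per codebook

# One similarity list per codebook: S[m, i] is the inner product of the
# m-th query segment with the i-th code word of codebook m.
q_segments = r_q.reshape(N, d)
S = np.einsum('md,mkd->mk', q_segments, codebooks)   # shape (N, K)
```

`S` is computed once per query; every database image is then scored with table look-ups alone, with no further vector arithmetic.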
In an implementation manner of this embodiment, the retrieving, based on the determined similarity, the target image corresponding to the query image in a preset database specifically includes:
determining candidate similarity of quantization vectors corresponding to the query image and each image in a preset database based on the determined similarity;
searching a target image corresponding to the query image in a preset database based on the determined candidate similarity;
if the target image is found, judging that the preset database contains the query image;
and if the target image is not found, judging that the preset database does not contain the query image.
Specifically, the calculation formula of the candidate similarity may be:

$$ S_q = \sum_{m=1}^{N} S_{q,(m,\,b_m)} $$

where $b_m$ is the code word identifier, with respect to the m-th codebook $C_m$, of the quantization vector segment of an image in the preset image database, $S_{q,m}$ is the similarity sequence between the query vector $r_q$ and the m-th codebook, and $S_{q,(m,\,b_m)}$ is the entry of $S_{q,m}$ indexed by $b_m$.
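Scoring one database image then reduces to summing N table look-ups, which can be sketched as follows; the similarity lists and code word identifiers here are random illustrative data.

```python
import numpy as np

N, K = 8, 16                        # number of codebooks and code words (illustrative)
rng = np.random.default_rng(4)
S = rng.standard_normal((N, K))     # per-codebook similarity lists for one query

# A database image is stored as N code word identifiers, one per codebook.
b = rng.integers(0, K, size=N)

# Candidate similarity: the sum of N table look-ups replaces a full
# inner product between the query and the image representation.
candidate_similarity = S[np.arange(N), b].sum()
```

Because each image costs only N look-ups and additions, the whole database can be scored far faster than by decompressing the quantization vectors.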
Further, after the candidate similarity between each query image and each image is obtained, whether the candidate similarity larger than a preset threshold exists or not can be searched in the candidate similarity, if the candidate similarity larger than the preset threshold exists, the image corresponding to the candidate similarity larger than the preset threshold is used as a target image corresponding to the query image, and the fact that the query image is contained in the preset database is judged; if the candidate similarity larger than the preset threshold does not exist, judging that the target image is not found, and correspondingly judging that the preset database does not contain the query image.
Based on the above-described method for generating a quantized image retrieval model, the present embodiment provides a computer-readable storage medium storing one or more programs, which are executable by one or more processors, to implement the steps in the method for generating a quantized image retrieval model according to the above-described embodiment.
Based on the above generation method of the quantized image retrieval model, the present application further provides a terminal device, as shown in fig. 3, which includes at least one processor (processor) 20; a display screen 21; and a memory (memory)22, and may further include a communication Interface (Communications Interface)23 and a bus 24. The processor 20, the display 21, the memory 22 and the communication interface 23 can communicate with each other through the bus 24. The display screen 21 is configured to display a user guidance interface preset in the initial setting mode. The communication interface 23 may transmit information. The processor 20 may call logic instructions in the memory 22 to perform the methods in the embodiments described above.
Furthermore, the logic instructions in the memory 22 may be implemented in software functional units and stored in a computer readable storage medium when sold or used as a stand-alone product.
The memory 22, which is a computer-readable storage medium, may be configured to store a software program, a computer-executable program, such as program instructions or modules corresponding to the methods in the embodiments of the present disclosure. The processor 20 executes the functional application and data processing, i.e. implements the method in the above-described embodiments, by executing the software program, instructions or modules stored in the memory 22.
The memory 22 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal device, and the like. Further, the memory 22 may include a high speed random access memory and may also include a non-volatile memory. For example, a variety of media that can store program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk, may also be transient storage media.
In addition, the specific processes loaded and executed by the storage medium and the instruction processors in the mobile terminal are described in detail in the method, and are not stated herein.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (14)

1. A method for generating a quantized image search model, the method comprising:
determining a predictive quantization vector corresponding to a training image in a preset sample set by using a preset network model;
determining a text vector corresponding to the training image based on the text label of the training image;
and training the preset network model based on the text vector and the prediction quantization vector to obtain a quantization image retrieval model.
2. The method for generating a quantitative image retrieval model according to claim 1, wherein the preset sample set includes a plurality of training image groups, and each training image group in the plurality of training image groups includes a training image and a text label corresponding to the training image.
3. The method for generating a quantitative image retrieval model according to claim 1, wherein the preset network model comprises a feature extraction module and an attention module; the determining, by using the preset network model, the predictive quantization vector corresponding to the training image in the preset sample set specifically includes:
inputting the training images in the preset sample set into the feature extraction module, and determining feature vectors corresponding to the training images through the feature extraction module;
and inputting the feature vector into the attention module, and determining a predictive quantization vector corresponding to the training image through the attention module.
4. The method for generating a quantized image retrieval model according to claim 3, wherein the preset network model is configured with a plurality of preset codebooks; inputting the feature vector into the attention module, and determining the predictive quantization vector corresponding to the training image by the attention module specifically includes:
dividing the characteristic vector into a plurality of vector sections, wherein the vector sections correspond to a plurality of preset codebooks one by one;
determining quantization vector sections corresponding to the vector sections based on the preset codebooks corresponding to the vector sections;
and determining the predictive quantization vector corresponding to the training image based on the quantization vector segment corresponding to each vector segment.
5. The method for generating a quantized image retrieval model according to claim 4, wherein said determining the quantized vector segments corresponding to the vector segments based on the preset codebooks corresponding to the vector segments specifically comprises:
for each vector segment in the plurality of vector segments, respectively determining each preset code word in a preset codebook corresponding to the vector segment and the attention weight of the vector segment;
and determining the quantized vector segment corresponding to the vector segment based on each preset code word and the attention weight corresponding to each preset code word so as to obtain the quantized vector segment corresponding to each vector segment.
6. The method of claim 5, wherein the determining, for each of the plurality of vector segments, the attention weight of each predetermined codeword in the predetermined codebook and the vector segment corresponding to the vector segment specifically comprises:
for each vector segment in the plurality of vector segments, respectively calculating each preset code word in a preset codebook and a first attention weight of the vector segment, and calculating the sum of all the first attention weights;
and for each preset code word in the preset codebook, calculating the ratio of the first attention weight corresponding to the preset code word to the sum value, and taking the ratio as the attention weight corresponding to the preset code word.
7. The method for generating a quantitative image retrieval model according to claim 1, wherein the text label comprises a plurality of text labels; the determining the text vector corresponding to the text label of the training image specifically includes:
inputting each text label in the plurality of text labels into a word embedding model, and determining the candidate text vector corresponding to each text label through the word embedding model;
and determining a text vector corresponding to the training image based on the candidate text vector corresponding to each text label.
8. The method for generating a quantized image retrieval model according to claim 7, wherein the vector dimensions of the candidate text vectors corresponding to the text labels are the same; the determining, based on the candidate text vectors corresponding to the text labels, the text vector corresponding to the training image specifically includes:
and calculating the average text vector of the candidate text vectors corresponding to the text labels, and taking the average text vector as the text vector corresponding to the training image.
9. The method for generating a quantized image retrieval model according to any of claims 1 to 8, wherein the training the preset network model based on the text vector and the predicted quantized vector to obtain the quantized image retrieval model specifically comprises:
determining a loss function value corresponding to the training image according to the text vector and the prediction quantization vector;
and training the model parameters of the preset network model and a plurality of preset codebooks configured by the model parameters based on the loss function values to obtain a quantized image retrieval model and a plurality of codebooks.
10. An image retrieval method to which a quantized image retrieval model determined by the method for generating a quantized image retrieval model according to any one of claims 1 to 9 is applied, the image retrieval method comprising:
inputting a query image into the quantized image retrieval model, and determining a query vector corresponding to the query image through the quantized image retrieval model;
determining similarity between the query vector and each code word in each of a plurality of codebooks;
and retrieving a target image corresponding to the query image in a preset image database based on the determined similarity.
11. The image retrieval method of claim 10, wherein before inputting the query image into the quantized image retrieval model, determining a query vector corresponding to the query image by the quantized image retrieval model, the method further comprises:
and respectively inputting each image in a preset image database into the quantized image retrieval model, and determining a quantized vector corresponding to each image through the quantized image retrieval model.
12. The image retrieval method of claim 11, wherein retrieving, based on the determined similarity, the target image corresponding to the query image in the preset image database specifically comprises:
determining, based on the determined similarity, a candidate similarity between the query image and the quantized vector corresponding to each image in the preset image database;
searching for the target image corresponding to the query image in the preset image database based on the determined candidate similarities;
if the target image is found, determining that the preset image database contains the query image;
and if the target image is not found, determining that the preset image database does not contain the query image.
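The found/not-found branch in claim 12 can be read as a threshold test on the best candidate similarity. The threshold rule below is an assumption for illustration; the claim does not fix how "found" is decided.

```python
def retrieve(candidate_similarities, threshold):
    """Return the index of the target image, or None when the database is
    judged not to contain the query image (assumed threshold rule)."""
    best = max(range(len(candidate_similarities)),
               key=lambda i: candidate_similarities[i])
    if candidate_similarities[best] >= threshold:
        return best          # target image found in the database
    return None              # database judged not to contain the query image

# the best match (0.9) clears the threshold, so image 1 is the target
hit = retrieve([0.2, 0.9, 0.4], threshold=0.5)
# no candidate clears the threshold, so the query image is absent
miss = retrieve([0.2, 0.3], threshold=0.5)
```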
13. A computer-readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the steps in the method for generating a quantized image retrieval model according to any one of claims 1 to 9, and/or the steps in the image retrieval method according to any one of claims 10 to 12.
14. A terminal device, comprising: a processor, a memory, and a communication bus, wherein the memory stores a computer-readable program executable by the processor;
the communication bus enables connection and communication between the processor and the memory;
and the processor, when executing the computer-readable program, implements the steps in the method for generating a quantized image retrieval model according to any one of claims 1 to 9, and/or the steps in the image retrieval method according to any one of claims 10 to 12.
CN202110432335.0A 2021-04-21 2021-04-21 Quantized image retrieval model generation method, retrieval method, medium and terminal Active CN113127672B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110432335.0A CN113127672B (en) 2021-04-21 2021-04-21 Quantized image retrieval model generation method, retrieval method, medium and terminal


Publications (2)

Publication Number Publication Date
CN113127672A true CN113127672A (en) 2021-07-16
CN113127672B CN113127672B (en) 2024-06-25

Family

ID=76778824

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110432335.0A Active CN113127672B (en) 2021-04-21 2021-04-21 Quantized image retrieval model generation method, retrieval method, medium and terminal

Country Status (1)

Country Link
CN (1) CN113127672B (en)


Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180285386A1 (en) * 2017-03-31 2018-10-04 Alibaba Group Holding Limited Method, apparatus, and electronic devices for searching images
CN108647350A (en) * 2018-05-16 2018-10-12 中国人民解放军陆军工程大学 Image-text associated retrieval method based on two-channel network
US20190108411A1 (en) * 2017-10-11 2019-04-11 Alibaba Group Holding Limited Image processing method and processing device
CN110516677A (en) * 2019-08-23 2019-11-29 上海云绅智能科技有限公司 A kind of neural network recognization model, target identification method and system
CN110674328A (en) * 2019-09-27 2020-01-10 长城计算机软件与***有限公司 Trademark image retrieval method, system, medium and equipment
CN110851641A (en) * 2018-08-01 2020-02-28 杭州海康威视数字技术股份有限公司 Cross-modal retrieval method and device and readable storage medium
CN110866140A (en) * 2019-11-26 2020-03-06 腾讯科技(深圳)有限公司 Image feature extraction model training method, image searching method and computer equipment
CN111930984A (en) * 2019-04-24 2020-11-13 北京京东振世信息技术有限公司 Image retrieval method, device, server, client and medium
CN112232425A (en) * 2020-10-21 2021-01-15 腾讯科技(深圳)有限公司 Image processing method, image processing device, storage medium and electronic equipment
CN112418298A (en) * 2020-11-19 2021-02-26 北京云从科技有限公司 Data retrieval method, device and computer readable storage medium
US20210089571A1 (en) * 2017-04-10 2021-03-25 Hewlett-Packard Development Company, L.P. Machine learning image search


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023020214A1 (en) * 2021-08-17 2023-02-23 腾讯科技(深圳)有限公司 Retrieval model training method and apparatus, retrieval method and apparatus, device and medium
CN114329006A (en) * 2021-09-24 2022-04-12 腾讯科技(深圳)有限公司 Image retrieval method, device, equipment and computer readable storage medium
CN115082837A (en) * 2022-07-27 2022-09-20 新沂市新南环保产业技术研究院有限公司 Flow rate control system for filling PET bottle with purified water and control method thereof
CN115082837B (en) * 2022-07-27 2023-07-04 新沂市新南环保产业技术研究院有限公司 Flow rate control system for filling purified water into PET bottle and control method thereof

Also Published As

Publication number Publication date
CN113127672B (en) 2024-06-25

Similar Documents

Publication Publication Date Title
CN112329465B (en) Named entity recognition method, named entity recognition device and computer readable storage medium
CN113127672B (en) Quantized image retrieval model generation method, retrieval method, medium and terminal
US20190108242A1 (en) Search method and processing device
US11232147B2 (en) Generating contextual tags for digital content
CN112069319B (en) Text extraction method, text extraction device, computer equipment and readable storage medium
CN111159485B (en) Tail entity linking method, device, server and storage medium
CN113434636B (en) Semantic-based approximate text searching method, semantic-based approximate text searching device, computer equipment and medium
CN114861889B (en) Deep learning model training method, target object detection method and device
CN110263218B (en) Video description text generation method, device, equipment and medium
CN107330009B (en) Method and apparatus for creating topic word classification model, and storage medium
CN113657087B (en) Information matching method and device
US12026192B2 (en) Image retrieval method, image retrieval devices, image retrieval system and image display system
CN111950279A (en) Entity relationship processing method, device, equipment and computer readable storage medium
WO2021012691A1 (en) Method and device for image retrieval
CN114492669B (en) Keyword recommendation model training method, recommendation device, equipment and medium
CN115062134A (en) Knowledge question-answering model training and knowledge question-answering method, device and computer equipment
CN110019952B (en) Video description method, system and device
CN114428842A (en) Method and device for expanding question-answer library, electronic equipment and readable storage medium
CN116030375A (en) Video feature extraction and model training method, device, equipment and storage medium
CN113239215B (en) Classification method and device for multimedia resources, electronic equipment and storage medium
US20230306087A1 (en) Method and system of retrieving multimodal assets
CN117009534B (en) Text classification method, apparatus, computer device and storage medium
US20230169110A1 (en) Method and system of content retrieval for visual data
CN111797257B (en) Picture recommendation method and related equipment based on word vector
CN112732913B (en) Method, device, equipment and storage medium for classifying unbalanced samples

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant