CN113886615B - Hand-drawing image real-time retrieval method based on multi-granularity associative learning - Google Patents


Publication number
CN113886615B
Authority
CN
China
Prior art keywords
sketch
picture
image
grade
branch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111241283.5A
Other languages
Chinese (zh)
Other versions
CN113886615A (en)
Inventor
戴大伟
刘颖格
唐晓宇
夏书银
王国胤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications
Priority to CN202111241283.5A
Publication of CN113886615A
Application granted
Publication of CN113886615B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53Querying
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Processing Or Creating Images (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the field of image retrieval, and particularly relates to a real-time hand-drawn image retrieval method based on multi-granularity associative learning, which comprises the following steps: training an improved deep neural network model with a triplet loss function and a multi-granularity associative learning method; extracting the embedded vector of each picture in a sketch branch with the trained model; sending the embedded vector to a discriminator to judge the grade of the picture; sending the embedded vector to the dimension reduction layer corresponding to that grade; calculating the Euclidean distance between the picture and each image; and returning the top-k retrieved images according to the Euclidean distance. The invention designs a multi-stage model so that the diversity of incomplete sketches does not confuse the model, and provides a multi-granularity associative learning method for progressively completed sketches, so that the embedding space of each incomplete sketch approximates the embedding space of the subsequent sketch and of the corresponding target photo, and the target photo is retrieved with as few sketch strokes as possible.

Description

Hand-drawing image real-time retrieval method based on multi-granularity associative learning
Technical Field
The invention belongs to the field of dynamic sketch retrieval, and particularly relates to a hand-drawn image real-time retrieval method based on multi-granularity associative learning.
Background
Image retrieval is classified into example-based image retrieval (EBIR) and sketch-based image retrieval (SBIR) according to the type of query picture. SBIR takes as input a hand-drawn sketch, which lacks color and texture information, and the retrieval system returns images from the image library that are similar to the sketch. A hand-drawn sketch is an abstract human representation of an observed object; unlike text and labels, it can convey image information that is difficult to express in words in a more intuitive and vivid way, which helps prevent the information from being distorted during transmission. For example, when a user wants to search for a certain commodity but knows too little about it to provide a picture or a text description, the user can simply draw the shape of the commodity from memory and search for it with the hand-drawn sketch. Touch devices have developed rapidly, and the spread of smart mobile terminals with touch screens, such as phones and tablets, gives a large number of users the means for hand-drawn and handwritten input. As a result, hand-drawn sketches are used more and more often to convey information in daily life, work and entertainment, and sketch-based image retrieval has attracted particular attention because of its potential commercial value.
The main advantage of sketch-based image retrieval over text/label-based retrieval is its fine granularity, which gave rise to fine-grained sketch-based image retrieval (FG-SBIR). FG-SBIR matches images against the details of the hand-drawn sketch and aims to retrieve a specific photo in the gallery. Considerable progress has been made in FG-SBIR research, but two problems in sketching prevent its wide use in practice: (1) the insufficient drawing skill of the user; (2) the time required to draw a complete sketch. With a reference picture, sketches drawn by different people for the same object differ in their degree of abstraction, which leads to different sketch forms; without a reference picture, each person can only conceive and draw from subjective impression, which greatly increases the diversity of sketch forms. In addition, drawing level and drawing style differ from person to person, which further increases the stylistic differences between sketches, causes differences in the semantic associations of sketch data, and makes sketch semantic understanding harder. Although most advanced vision systems are good at recognizing poorly drawn sketches, the time required to draw a complete sketch depends on the drawing ability of the person drawing, and the waiting time is too long if results can be retrieved only after the complete sketch is drawn. In practical applications, retrieving the desired commodity as quickly as possible from the least stroke information is the key to real-time retrieval.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a real-time hand-drawn image retrieval method based on multi-granularity associative learning. The method uses an improved neural network model comprising three branches f1, f2 and f3, where f1 is a pre-training network, f2 is an attention layer and f3 is a dimension reduction layer. The training set of the improved neural network model is an image set formed by a plurality of images and the complete sketches corresponding to the images. The complete sketch of each image in the image set is rendered into a plurality of pictures according to the stroke order of the drawing, the rendered pictures form the sketch branch of that image, and one image in the image set is selected as the target image in each training iteration;
After the improved neural network model has been trained, a hand-drawn image is input to retrieve images in real time, wherein the training process of the improved neural network model comprises the following steps:
S0, training the three branches f1, f2 and f3 of the neural network model with a triplet loss function (Triplet loss) on the hand-drawn sketches corresponding to the images in the image set, and fixing the parameters after training is finished;
S1, dividing the pictures in the sketch branch of a target image into grades according to the number of strokes required to draw the target image, so that the diversity of incomplete sketches does not confuse the model;
S2, extracting the feature vector of the target image and the feature vector of each picture in the sketch branch through a pre-training network, and obtaining the embedded vector of the target image and the embedded vector of each picture in the sketch branch by adopting an attention mechanism of an attention layer;
S3, sending the embedded vector of each picture into the dimension reduction layer corresponding to the grade to which the picture belongs;
S4, after the embedded vector of a picture has been reduced in the dimension reduction layer corresponding to its grade, associating the picture with a picture in the next grade, calculating the mean square loss between the picture in the current grade and the picture in the next grade with a mean square loss function (MSE loss), and updating the dimension reduction layer with the calculated mean square loss as the loss function; this process is repeated until the mean square loss has been calculated for all grades;
S5, calculating the error between each picture in the sketch branch and the images in the image set with the Triplet loss function, adding this error to the errors of all grades, and performing back propagation; the parameters of the model are adjusted with the objective of moving closer to the target image and away from the images in the image set other than the target image, so that the embedded vector of the picture approaches that of the target image while the embedded vectors of adjacent grades approach each other;
S6, acquiring a sketch branch of the next target image, and repeating the steps S1-S5 until the model reaches the upper limit of training times.
Further, the complete sketch of an image is rendered into N pictures according to the stroke order of the drawing, and the N pictures form a sketch branch. The n-th picture in the sketch branch contains the first to the n-th strokes of the complete sketch, where $1 \le n \le N$, so every picture contains a different number of strokes. The pictures are arranged in ascending order of the number of strokes they contain, giving a sketch branch $S = \{s_1, s_2, \ldots, s_n, \ldots, s_N\}$, where $s_n$ denotes the picture containing the first to the n-th strokes.
Further, an attention mechanism is adopted to obtain an embedded vector of each picture in the sketch branch, and the expression is as follows:
$V_H = \mathrm{Global\_pooling}(B + B \cdot f_{att}(B))$
Wherein $B$ is the feature vector obtained after passing through the pre-training network, $f_{att}(\cdot)$ represents the attention mechanism, $\mathrm{Global\_pooling}(\cdot)$ represents global pooling of the embedded vector obtained through the attention layer, and $V_H$ represents the embedded vector of a picture in the sketch branch obtained after global pooling.
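The expression above can be illustrated with a minimal sketch. The text does not specify the form of the attention mechanism or the kind of global pooling, so the example assumes a spatial softmax over the channel-averaged activations for `f_att` and global average pooling; both choices, and all function names, are hypothetical.

```python
import math
from typing import List

FeatureMap = List[List[float]]  # B[c][p]: channel c, flattened spatial position p


def spatial_attention(B: FeatureMap) -> List[float]:
    """Assumed f_att: softmax over positions of the channel-mean activation."""
    channels, positions = len(B), len(B[0])
    scores = [sum(B[c][p] for c in range(channels)) / channels
              for p in range(positions)]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def embed(B: FeatureMap) -> List[float]:
    """V_H = Global_pooling(B + B * f_att(B)), with average pooling assumed."""
    att = spatial_attention(B)
    refined = [[b + b * a for b, a in zip(channel, att)] for channel in B]
    return [sum(channel) / len(channel) for channel in refined]


B = [[1.0, 1.0], [2.0, 2.0]]  # toy feature map: 2 channels, 2 positions
V_H = embed(B)
print(V_H)
```

The residual form `B + B * f_att(B)` keeps the original features and adds an attention-weighted copy, so a poorly trained attention layer cannot zero out the backbone features.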
Further, each picture of the sketch branch is classified according to the number of strokes, and each grade is designed with an independent dimension reduction layer, which is also called a linear mapping layer, and the expression of the dimension reduction layer is as follows:
$V_L = A \cdot V_H$
Wherein $A$ represents a linear mapping, and $V_L$ represents the embedded vector of a picture in the sketch branch after dimension reduction.
Furthermore, each level is provided with a corresponding dimension reduction layer, the dimension reduction layer maps 2048-dimension embedded vectors to 64 dimensions, and a multi-granularity associative learning method is adopted to realize the approximation of the feature vector space of the incomplete hand-drawn image to the feature vector space of the relative complete hand-drawn image so as to further optimize the feature vector space of the incomplete hand-drawn image.
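As a rough illustration of the per-grade dimension reduction, the sketch below keeps one independent linear map $A$ per grade and routes an embedded vector to the map of its grade. The 2048 and 64 dimensions come from the text; the random initialisation, the number of grades and the function names are assumptions made only for the example.

```python
import random
from typing import Dict, List

IN_DIM, OUT_DIM = 2048, 64  # dimensions stated in the text

Matrix = List[List[float]]


def make_linear_map(in_dim: int, out_dim: int, seed: int) -> Matrix:
    """Randomly initialised out_dim x in_dim matrix (initialisation is assumed)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(out_dim)]


def reduce_dim(A: Matrix, v_h: List[float]) -> List[float]:
    """V_L = A . V_H"""
    return [sum(a * x for a, x in zip(row, v_h)) for row in A]


NUM_GRADES = 4  # hypothetical number of grades
layers: Dict[int, Matrix] = {
    grade: make_linear_map(IN_DIM, OUT_DIM, seed=grade)
    for grade in range(1, NUM_GRADES + 1)
}


def embed_for_grade(v_h: List[float], grade: int) -> List[float]:
    """Send the embedded vector to the layer that belongs to its grade."""
    return reduce_dim(layers[grade], v_h)


v_h = [1.0] * IN_DIM
v_l = embed_for_grade(v_h, grade=2)
print(len(v_l))
```

Keeping a separate map per grade means a one-stroke picture and a nearly complete sketch are never forced through the same projection.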
Further, the step S1 includes:
if the strokes needed for drawing a complete sketch are N strokes, N pictures are contained in a sketch branch after the complete sketch is rendered;
When the grades are divided, the 1st to the m-th pictures in the sketch branch are assigned to the first grade, that is, the pictures covering the first m strokes form the first grade; the (m+1)-th to the 2m-th pictures are assigned to the second grade, that is, the second grade covers the 1st to the 2m-th strokes; each subsequent grade adds m more pictures, that is, m more strokes;
Let P = N/m. If P is an integer, the N pictures are divided into P grades in total; if P is not an integer, the N pictures are divided into ⌊P⌋ + 1 grades in total, where ⌊P⌋ is the integer part of P.
Further, the step S1 includes:
If N strokes are required to draw a complete sketch, the sketch branch obtained after rendering the complete sketch contains N pictures. Let $m_k$ be the number of pictures contained in the k-th grade. A completeness discriminator divides the pictures into grades according to a formula, the number of pictures contained in each successive grade decreases, and the number of pictures contained in the k-th grade is expressed as follows:
For images with fewer strokes, this reduces the number of grades to be divided, which reduces the computational load and improves retrieval efficiency.
Further, the step S4 includes:
In the process of making the i-th grade approach the (i+1)-th grade, the mean square loss between a picture $x_i$ in the i-th grade, taken in order of increasing stroke count, and a picture $x_{i+1}$ randomly selected each time from the (i+1)-th grade is calculated; the mean square losses between each picture in the i-th grade and its paired picture in the next grade are added in turn to obtain the mean square error of the i-th grade, which drives the i-th grade toward the (i+1)-th grade. The mean square loss between the picture $x_i$ in the i-th grade and the picture $x_{i+1}$ in the next grade is expressed as:
$\mathrm{MSE\ Loss} = \omega (x_{i+1} - x_i)^2$
Wherein $\omega > 0$.
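The association between adjacent grades can be sketched as follows. Each embedding in grade i is paired with a randomly chosen embedding from grade i+1 and the weighted squared differences are summed. Averaging over the vector dimensions, the default value of `omega` and the function names are assumptions for illustration only.

```python
import random
from typing import List, Optional, Sequence

Vector = Sequence[float]


def mse_loss(x_i: Vector, x_next: Vector, omega: float = 1.0) -> float:
    """omega * (x_{i+1} - x_i)^2, averaged over the vector dimensions (assumed)."""
    return omega * sum((b - a) ** 2 for a, b in zip(x_i, x_next)) / len(x_i)


def grade_loss(grade_i: List[Vector], grade_next: List[Vector],
               omega: float = 1.0,
               rng: Optional[random.Random] = None) -> float:
    """Sum the losses of every picture in grade i against a random picture of grade i+1."""
    if rng is None:
        rng = random.Random(0)
    return sum(mse_loss(x, rng.choice(grade_next), omega) for x in grade_i)


single = mse_loss([0.0, 0.0], [2.0, 2.0], omega=0.5)
total = grade_loss([[0.0, 0.0], [1.0, 1.0]], [[2.0, 2.0]])
print(single, total)
```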
Further, the expression of the triplet loss function is:
Wherein m represents the total number of pictures rendered from a complete sketch; $v_{[i,j]}$ represents the embedded vector of the i-th picture in the sketch branch, obtained after passing through the dimension reduction layer; $v_{[i+1,rnd]}$ represents the (i+1)-th picture of the sketch branch; $v_p$ denotes the positive sample obtained after passing through the pre-training network and the attention layer, i.e., the embedded vector of the target image; $v_n$ denotes the negative sample obtained after passing through the pre-training network and the attention layer, i.e., the embedded vector of an image other than the target image in the image set; $\alpha$ is a constant; and $d$ is the Euclidean distance.
The invention develops a multi-stage model for sketch branches of different degrees of completeness, so that the diversity of incomplete sketches does not confuse the model, and provides a multi-granularity associative learning method for progressively completed sketches, so that the embedding space of each incomplete sketch approximates the embedding space of the subsequent sketch and of the corresponding target photo. The target photo is thereby retrieved with as few sketch strokes as possible, which reduces the retrieval time for hand-drawn sketches and improves retrieval efficiency.
Drawings
FIG. 1 is a diagram of a deep neural network backbone model of the present invention;
FIG. 2 is a diagram of a deep neural network model of the present invention;
FIG. 3 is a diagram of a multi-granularity joint learning retrieval model of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
A hand-drawn sketch real-time retrieval method based on multi-granularity associative learning is shown in figures 1-3 and comprises the following steps:
obtaining a complete sketch of a target image from the QMUL-Shoe-V2 dataset and the QMUL-Chair-V2 dataset, and rendering the complete sketch into N pictures according to the stroke order of the drawing, wherein the N pictures form a sketch branch, the n-th picture in the sketch branch contains the first to the n-th strokes of the complete sketch, $1 \le n \le N$, and every picture contains a different number of strokes; the pictures are arranged in ascending order of the number of strokes they contain, giving a sketch branch $S = \{s_1, s_2, \ldots, s_n, \ldots, s_N\}$, where $s_n$ denotes the picture containing the first to the n-th strokes;
specifically, the providers of the QMUL-Shoe-V2 dataset and the QMUL-Chair-V2 dataset recruited volunteers with different levels of drawing experience and had each of them hand-draw a complete sketch of the target image.
Specifically, as shown in fig. 3, a complete sketch is rendered into N pictures according to the completeness of the sketch; the N pictures form a sketch branch, and the n-th picture in the sketch branch contains the first to the n-th strokes of the complete sketch. For example, the first picture in the sketch branch contains only the first stroke of the complete sketch, the second picture contains the first and second strokes, the third picture contains the first, second and third strokes, and so on.
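The construction of a sketch branch can be sketched in a few lines. The example represents a stroke as a list of pen positions and a picture as the list of strokes it contains; this representation and the function name are hypothetical, and rasterising each picture to pixels is left out.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Stroke = List[Point]  # one stroke = ordered pen positions (assumed representation)


def build_sketch_branch(strokes: Sequence[Stroke]) -> List[List[Stroke]]:
    """Return [s_1, ..., s_N], where s_n holds strokes 1..n of the complete sketch."""
    return [list(strokes[:n]) for n in range(1, len(strokes) + 1)]


complete_sketch = [
    [(0.0, 0.0), (1.0, 0.0)],  # stroke 1
    [(1.0, 0.0), (1.0, 1.0)],  # stroke 2
    [(1.0, 1.0), (0.0, 0.0)],  # stroke 3
]
branch = build_sketch_branch(complete_sketch)
print([len(picture) for picture in branch])  # [1, 2, 3]
```

The last element of the branch is the complete sketch itself, so the branch covers every degree of completeness a user passes through while drawing.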
The method comprises the steps of forming an image set by a plurality of images and the complete sketch corresponding to the images, obtaining the complete sketch of each image in the image set, rendering the complete sketch of each image into a plurality of pictures according to the stroke sequence of drawing, forming a sketch branch of one image by the plurality of pictures, finishing the rendering process of the complete sketch of all the images before training, and selecting one image in the image set as a target image in each training.
After the improved neural network model has been trained, a hand-drawn image is input to retrieve images in real time, wherein the training process of the improved neural network model comprises the following steps:
S0, training the three branches f1, f2 and f3 of the neural network model with a triplet loss function (Triplet loss) on the hand-drawn sketches corresponding to the images in the image set, and fixing the parameters after training is finished; as shown in FIG. 1, the positive sample is the target image, the sketch is the complete hand-drawn sketch corresponding to the target image, and the negative sample is an image in the image set other than the target image;
S1, classifying each picture in a sketch branch of a target image according to the number of strokes required for drawing the target image;
S2, extracting the feature vector of the target image and the feature vector of each picture in the sketch branch through a pre-training network, and obtaining the embedded vector of the target image and the embedded vector of each picture in the sketch branch by adopting an attention mechanism of an attention layer;
S3, sending the embedded vector of each picture into the dimension reduction layer corresponding to the grade to which the picture belongs;
each level is provided with a corresponding dimension reduction layer, the dimension reduction layer maps 2048-dimension embedded vectors to 64 dimensions, and a multi-granularity associative learning method is adopted to realize the approximation of the feature vector space of the incomplete hand-drawn image to the feature vector space of the relative complete hand-drawn image so as to further optimize the feature vector space of the incomplete hand-drawn image.
S4, after the dimension of the embedded vector of the picture is reduced in the dimension reduction layer corresponding to the grade, associating the picture with the picture in the next grade, calculating the mean square loss of the current grade and the picture in the next grade by adopting a mean square loss function MSE loss, and updating the dimension reduction layer by taking the calculated mean square loss as a loss function; repeating the process until the mean square loss calculation of all the levels is completed;
In the process of making the i-th grade approach the (i+1)-th grade, the mean square loss between a picture $x_i$ in the i-th grade, taken in order of increasing stroke count, and a picture $x_{i+1}$ randomly selected each time from the (i+1)-th grade is calculated; the mean square losses between each picture in the i-th grade and its paired picture in the next grade are added in turn to obtain the mean square error of the i-th grade, which drives the i-th grade toward the (i+1)-th grade. The mean square loss between the picture $x_i$ in the i-th grade and the picture $x_{i+1}$ in the next grade is expressed as:
$\mathrm{MSE\ Loss} = \omega (x_{i+1} - x_i)^2$
Wherein $\omega > 0$.
S5, calculating the error between each picture in the sketch branch and the target image with the Triplet loss function, adding this error to the errors of all grades, and performing back propagation; the parameters of the model are adjusted with the objective of moving closer to the target image and away from the images in the image set other than the target image, so that the embedded vector of the picture approaches that of the target image while the embedded vectors of adjacent grades approach each other;
S6, acquiring a sketch branch of the next target image, and repeating the steps S1-S5 until the model reaches the upper limit of training times.
In one embodiment, as shown in fig. 3, the grade division of step S1 gives every grade the same number of strokes; any two pictures share strokes, and any two grades also share strokes. The division comprises the following steps:
if the strokes needed for drawing a complete sketch are N strokes, N pictures are contained in a sketch branch after the complete sketch is rendered;
When the grades are divided, the 1st to the m-th pictures in the sketch branch are assigned to the first grade, that is, the pictures covering the first m strokes form the first grade; the (m+1)-th to the 2m-th pictures are assigned to the second grade, that is, the second grade covers the 1st to the 2m-th strokes; each subsequent grade adds m more pictures, that is, m more strokes;
Let P = N/m. If P is an integer, the N pictures are divided into P grades in total; if P is not an integer, the N pictures are divided into ⌊P⌋ + 1 grades in total, where ⌊P⌋ is the integer part of P.
Specifically, taking a complete sketch that requires 20 strokes as an example, the sketch branch obtained after rendering contains 20 pictures. The 1st to 5th pictures in the sketch branch are assigned to the first grade (that is, the first grade covers the first five strokes), the 6th to 10th pictures to the second grade (covering the 1st to 10th strokes), the 11th to 15th pictures to the third grade (covering the 1st to 15th strokes), and the 16th to 20th pictures to the fourth grade (covering the 1st to 20th strokes).
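The equal-size grade division above reduces to integer arithmetic, which the following sketch reproduces for the 20-stroke example with m = 5. The function names are hypothetical; picture indices and grades are 1-based as in the text.

```python
import math
from typing import Dict, List


def num_grades(n_strokes: int, m: int) -> int:
    """P grades if N/m is an integer, otherwise floor(N/m) + 1, i.e. ceil(N/m)."""
    return math.ceil(n_strokes / m)


def grade_of(picture_index: int, m: int) -> int:
    """Grade of the picture that contains strokes 1..picture_index."""
    return (picture_index - 1) // m + 1


def divide_grades(n_strokes: int, m: int) -> Dict[int, List[int]]:
    """Map each grade to the indices of the pictures it contains."""
    grades: Dict[int, List[int]] = {}
    for n in range(1, n_strokes + 1):
        grades.setdefault(grade_of(n, m), []).append(n)
    return grades


grades = divide_grades(20, 5)
print(num_grades(20, 5))         # 4
print(grades[1], grades[4])      # [1, 2, 3, 4, 5] [16, 17, 18, 19, 20]
print(num_grades(22, 5))         # 5, the last grade holds only pictures 21 and 22
```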
In another embodiment, step S1 includes:
If N strokes are required to draw a complete sketch, the sketch branch obtained after rendering contains N pictures. Let $m_k$ be the number of pictures contained in the k-th grade; a completeness discriminator divides the pictures into grades according to a formula, and the number of pictures contained in each successive grade decreases. For images with fewer strokes, this reduces the number of grades to be divided, which reduces the computational load and improves retrieval efficiency.
Preferably, an attention mechanism is adopted to obtain an embedded vector of each picture in the sketch branch, and the expression is:
$V_H = \mathrm{Global\_pooling}(B + B \cdot f_{att}(B))$
Wherein $B$ is the feature vector obtained after passing through the pre-training network, $f_{att}(\cdot)$ represents the attention mechanism, $\mathrm{Global\_pooling}(\cdot)$ represents global pooling of the embedded vector obtained through the attention layer, and $V_H$ represents the embedded vector of a picture in the sketch branch obtained after global pooling.
Preferably, each picture in the sketch branch is classified according to the stroke number, and each class is designed with a separate dimension reduction layer, which is also called a linear mapping layer, and the expression of the dimension reduction layer is as follows:
$V_L = A \cdot V_H$
Wherein $A$ represents a linear mapping, and $V_L$ represents the embedded vector of a picture in the sketch branch after dimension reduction.
Preferably, the expression of the triplet loss function is:
Wherein m represents the total number of pictures rendered from a complete sketch; $v_{[i,j]}$ represents the embedded vector of the i-th picture in the sketch branch, obtained after passing through the dimension reduction layer; $v_{[i+1,rnd]}$ represents the (i+1)-th picture in the sketch branch; $v_p$ denotes the positive sample obtained after passing through the pre-training network and the attention layer, i.e., the embedded vector of the target image; $v_n$ denotes the negative sample obtained after passing through the pre-training network and the attention layer, i.e., the embedded vector of an image other than the target image in the image set; $\alpha$ is a constant; and $d$ is the Euclidean distance.
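For orientation only, the sketch below shows the generic margin-based triplet loss built from the same ingredients named above: an anchor embedding, a positive sample $v_p$, a negative sample $v_n$, a constant margin $\alpha$ and the Euclidean distance $d$. It is the commonly used textbook form and is not claimed to be the exact expression of the invention, which also involves the terms $v_{[i,j]}$ and $v_{[i+1,rnd]}$; the function names and the default margin are assumptions.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def euclidean(u: Vector, v: Vector) -> float:
    """Euclidean distance d(u, v)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor: Vector, v_p: Vector, v_n: Vector,
                 alpha: float = 0.2) -> float:
    """max(0, d(anchor, v_p) - d(anchor, v_n) + alpha): generic margin form."""
    return max(0.0, euclidean(anchor, v_p) - euclidean(anchor, v_n) + alpha)


sketch = [0.0, 0.0]
violating = triplet_loss(sketch, v_p=[3.0, 4.0], v_n=[0.0, 1.0], alpha=0.5)
satisfied = triplet_loss(sketch, v_p=[0.0, 1.0], v_n=[3.0, 4.0], alpha=0.5)
print(violating, satisfied)  # 4.5 0.0
```

The loss is zero once the sketch is closer to the target image than to the other image by at least the margin, which is what "moving closer to the target image and away from the other images" amounts to.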
The process of real-time retrieval of hand-drawn images comprises the following steps:
S21, taking the sketch drawn by the user on a drawing board as the original sketch, and forming a new picture each time a stroke is added, following the stroke order of the drawing;
S22, sending the picture of the current stroke into the pre-training network and the attention layer to obtain the embedded vector of the picture of the current stroke;
S23, sending the picture of the current stroke to the completeness discriminator to judge the grade of the picture, and sending its embedded vector to the dimension reduction layer corresponding to that grade;
S24, after the embedded vector of the picture has been reduced by the dimension reduction layer corresponding to the grade, calculating the Euclidean distance between the picture and each image in the database;
S25, returning the k retrieved images according to the Euclidean distance between the picture and the images in the database;
S26, acquiring the next stroke drawn by the user, and repeating steps S22-S25 until the target picture is retrieved or all strokes have been processed.
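The retrieval loop of steps S21 to S26 can be sketched as below. The embedding stages (pre-training network, attention layer, completeness discriminator and per-grade dimension reduction) are replaced by precomputed low-dimensional vectors so the example stays self-contained; the gallery contents, the names and the two-dimensional embeddings are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(u: Vector, v: Vector) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def top_k(query: Vector, gallery: Dict[str, Vector], k: int) -> List[str]:
    """S24-S25: rank gallery images by Euclidean distance and keep the k nearest."""
    ranked = sorted(gallery, key=lambda name: euclidean(query, gallery[name]))
    return ranked[:k]


def retrieve_while_drawing(stroke_embeddings: Sequence[Vector],
                           gallery: Dict[str, Vector],
                           target: str, k: int) -> Tuple[int, List[str]]:
    """S22-S26: query after every stroke, stop once the target is in the top-k."""
    results: List[str] = []
    strokes_used = 0
    for strokes_used, query in enumerate(stroke_embeddings, start=1):
        results = top_k(query, gallery, k)
        if target in results:
            break
    return strokes_used, results


gallery = {"shoe_a": [0.0, 0.0], "shoe_b": [5.0, 5.0], "shoe_c": [10.0, 10.0]}
# One (already reduced) embedding per partial sketch, in drawing order.
stroke_embeddings = [[9.0, 9.0], [6.0, 6.0], [1.0, 1.0]]
strokes_used, results = retrieve_while_drawing(stroke_embeddings, gallery,
                                               target="shoe_a", k=1)
print(strokes_used, results)  # 3 ['shoe_a']
```

In a real deployment the user, not the program, decides when the target has appeared; the `target` argument here only stands in for that decision.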
As shown in fig. 2, the network structure of the deep neural network model is divided into two parts. One part trains on the image set containing the target image with the Triplet loss, and the images obtain the required embedded vectors through the three branches f1, f2 and f3 of the model. The other part trains the sketch branch with the Triplet loss and the MSE loss: after passing through f1 and f2 of the model, each picture of the sketch branch has its grade judged and is sent into the corresponding layer among $f_{31}, f_{32}, \ldots, f_{3k}$ to obtain the required embedded vector.
When no picture of a commodity is available and the commodity is difficult to describe in text, a user can hand-draw a sketch of the commodity on a touch-screen device from their impression of it. The sketch is rendered into a sketch branch and input into the trained neural network model, and the model returns the k images most similar to the sketch by retrieving with the sketch branch.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (9)

1. A hand-drawn image real-time retrieval method based on multi-granularity associative learning, characterized in that the method uses an improved neural network model comprising three branches f1, f2 and f3, where f1 is a pre-training network, f2 is an attention layer and f3 is a dimension reduction layer; the training set of the improved neural network model is an image set composed of a plurality of images and the complete sketches corresponding to the images; the complete sketch of each image in the image set is rendered into a plurality of pictures according to the stroke order of the drawing, the rendered pictures form the sketch branch of the image, and one image in the image set is selected as the target image in each training iteration;
After the improved neural network model has been trained, a hand-drawn image is input to retrieve images in real time, wherein the training process of the improved neural network model comprises the following steps:
S0, training the three branches f1, f2 and f3 of the neural network model with a triplet loss function on the hand-drawn sketches corresponding to the images in the image set, and fixing the parameters after training is finished;
S1, classifying each picture in a sketch branch of a target image according to the number of strokes required for drawing the target image;
S2, extracting the feature vector of the target image and the feature vector of each picture in the sketch branch through a pre-training network, and obtaining the embedded vector of the target image and the embedded vector of each picture in the sketch branch by adopting an attention mechanism of an attention layer;
S3, sending the embedded vector of the picture into a dimension reduction layer corresponding to the grade to which the picture belongs according to the grade of the picture division;
S4, after the dimension of the embedded vector of the picture is reduced in the dimension reduction layer corresponding to the grade, associating the picture with the picture in the next grade, calculating the mean square loss of the picture in the current grade and the picture in the next grade by adopting a mean square loss function, and updating the dimension reduction layer by taking the calculated mean square loss as a loss function; repeating the process until the mean square loss calculation of all the levels is completed;
S5, calculating the error between each picture in the sketch branch and the images in the image set with a triplet loss function, adding this error to the mean square losses of all grades, and performing back propagation; the parameters of the model are adjusted with the objective of moving each picture close to the target image and away from the images in the image set other than the target image, so that the embedded vectors of the picture and the target image approach each other while the embedded vectors of two adjacent grades also approach each other;
S6, acquiring a sketch branch of the next target image, and repeating the steps S1-S5 until the model reaches the upper limit of training times.
2. The method for searching the hand-drawn image in real time based on multi-granularity associative learning according to claim 1, wherein the complete sketch of one image is rendered into N pictures according to the stroke order of drawing, the N pictures form a sketch branch, the n-th picture in the sketch branch contains the first to n-th strokes of the complete sketch, where $1 \le n \le N$, so that each picture contains a different number of strokes; the pictures are arranged in ascending order of the number of strokes they contain, and a sketch branch is written $S = \{s_1, s_2, \ldots, s_n, \ldots, s_N\}$, where $s_n$ represents the picture containing the first to n-th strokes.
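The construction of a sketch branch in claim 2 amounts to taking every prefix of the stroke sequence. The following is a minimal sketch under that reading; the function name `render_sketch_branch` and the string stand-ins for strokes are hypothetical, and rasterising each prefix into an actual picture is omitted.

```python
def render_sketch_branch(strokes):
    """Return [s_1, ..., s_N], where s_n holds strokes 1..n in drawing order."""
    return [strokes[:n] for n in range(1, len(strokes) + 1)]

# Hypothetical complete sketch of a teapot drawn in three strokes.
strokes = ["outline", "handle", "spout"]
branch = render_sketch_branch(strokes)

for n, picture in enumerate(branch, start=1):
    print(f"s_{n}: {picture}")
```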
3. The method for searching the hand-drawn images in real time based on multi-granularity associative learning according to claim 1, wherein an attention mechanism is adopted to obtain an embedded vector of each picture in a sketch branch, and the expression is as follows:
$V_H = \mathrm{Global\_pooling}(B + B \cdot f_{att}(B))$
Wherein $B$ is the feature vector obtained after passing through the pre-training network, $f_{att}(\cdot)$ represents the attention mechanism, $\mathrm{Global\_pooling}(\cdot)$ represents global pooling of the embedded vector obtained through the attention layer, and $V_H$ represents the embedded vector of a picture in the sketch branch obtained after global pooling.
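The expression in claim 3 can be illustrated with a small pure-Python sketch. The claim does not specify the form of the attention mechanism or the type of pooling, so two assumptions are made here: `attention_weights` is a hypothetical softmax over spatial positions of a linear score, and global pooling is taken to be average pooling.

```python
import math

def attention_weights(feature_map, w):
    """Hypothetical f_att: softmax over spatial positions of a linear score."""
    scores = [sum(x * wi for x, wi in zip(pos, w)) for pos in feature_map]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def embed(feature_map, w):
    """V_H = Global_pooling(B + B * f_att(B)), with average pooling assumed."""
    att = attention_weights(feature_map, w)
    refined = [[x + x * a for x in pos] for pos, a in zip(feature_map, att)]
    n = len(refined)
    channels = len(refined[0])
    return [sum(pos[c] for pos in refined) / n for c in range(channels)]

B = [[1.0, 2.0], [3.0, 4.0]]  # 2 spatial positions, 2 channels
v_h = embed(B, w=[0.0, 0.0])  # zero scores give uniform attention of 0.5
print(v_h)
```

The residual term `B + B * f_att(B)` keeps the original features and adds an attention-weighted copy, so a poorly trained attention layer cannot erase the backbone features.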
4. The method for searching hand-drawn images in real time based on multi-granularity associative learning according to claim 1 or 3, wherein each picture in the sketch branch is assigned to a grade according to its number of strokes, and each grade is provided with a separate dimension reduction layer, also called a linear mapping layer, with the expression:
$V_L = A \cdot V_H$
Wherein $A$ represents a linear mapping, and $V_L$ represents the embedded vector of a picture in the sketch branch after dimension reduction.
5. The method for real-time searching of hand-drawn images based on multi-granularity associative learning according to claim 4, wherein each level is provided with a corresponding dimension reduction layer, the dimension reduction layer maps 2048-dimensional embedded vectors to 64-dimensional, and the multi-granularity associative learning method is adopted to achieve the approximation of the feature vector space of the incomplete hand-drawn image to the feature vector space of the relatively complete hand-drawn image so as to further optimize the feature vector space of the incomplete hand-drawn image.
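A per-grade linear mapping from 2048 to 64 dimensions, as described in claims 4 and 5, can be sketched as one weight matrix per grade. The random initialisation, the number of grades and the names `make_layer` and `project` are assumptions made for illustration only; the claims do not state them.

```python
import random

def make_layer(in_dim, out_dim, rng):
    """Create a hypothetical randomly initialised out_dim x in_dim matrix A."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]

def project(A, v_h):
    """Compute V_L = A . V_H as a matrix-vector product."""
    return [sum(a * x for a, x in zip(row, v_h)) for row in A]

rng = random.Random(0)
IN_DIM, OUT_DIM, NUM_GRADES = 2048, 64, 3  # NUM_GRADES is an assumed value

# One separate dimension reduction layer per grade.
layers = {grade: make_layer(IN_DIM, OUT_DIM, rng) for grade in range(1, NUM_GRADES + 1)}

v_h = [1.0] * IN_DIM            # stand-in for an attention-layer embedding
v_l = project(layers[2], v_h)   # a grade-2 picture uses the grade-2 layer
print(len(v_l))
```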
6. The method for real-time searching of hand-drawn images based on multi-granularity associative learning according to claim 1, wherein the step S1 comprises:
if the strokes needed for drawing a complete sketch are N strokes, N pictures are contained in a sketch branch after the complete sketch is rendered;
When the grades are divided, the 1st to m-th pictures in the sketch branch are classified into the first grade, i.e. the first m strokes, and the (m+1)-th to 2m-th pictures are classified into the second grade, i.e. the (m+1)-th to 2m-th strokes; each successive grade takes the next m pictures in turn, i.e. m more strokes;
Let P = N/m; if P is an integer, the N pictures are divided into P levels in total, and if P is not an integer, the N pictures are divided into ⌊P⌋+1 levels in total, where ⌊P⌋ is P rounded down.
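The fixed-size grade division of claim 6 can be written down directly. This is a minimal sketch; the function names `assign_levels` and `level_count` are hypothetical, and pictures are indexed from 1 as in the claim.

```python
import math

def assign_levels(num_pictures, m):
    """Map each 1-based picture index to its level, with m pictures per level."""
    return {n: (n - 1) // m + 1 for n in range(1, num_pictures + 1)}

def level_count(num_pictures, m):
    """N/m levels when m divides N, otherwise floor(N/m) + 1 levels."""
    return math.ceil(num_pictures / m)

levels = assign_levels(7, 3)
print(levels)
print(level_count(7, 3), level_count(6, 3))
```

With N = 7 and m = 3 the last level holds a single picture, which is the non-integer case in the claim.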
7. The method for real-time searching of hand-drawn images based on multi-granularity associative learning according to claim 1, wherein the step S1 comprises:
If drawing a complete sketch requires N strokes, the sketch branch obtained after the complete sketch is rendered contains N pictures; m_k denotes the number of pictures contained in the k-th level; the picture levels are divided by a completeness discriminator according to a formula, the number of pictures contained in each level decreases in sequence, and the number of pictures contained in the k-th level is expressed as follows:
8. The method for real-time searching of hand-drawn images based on multi-granularity associative learning according to claim 1, wherein the step S4 comprises:
in the process of making the i-th grade approach the (i+1)-th grade, calculating, in order of increasing stroke count, the mean square loss between each picture $x_i$ in the i-th grade and a picture $x_{i+1}$ randomly selected each time from the (i+1)-th grade, and summing these mean square losses over the pictures in the i-th grade to obtain the mean square error of the i-th grade, whereby the i-th grade approaches the (i+1)-th grade; the mean square loss between the picture $x_i$ in the i-th grade and the picture $x_{i+1}$ in the next grade is expressed as:
$\mathrm{MSE\ Loss} = \omega (x_{i+1} - x_i)^2$
Wherein $\omega > 0$.
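The grade-to-grade association loss of claim 8 can be sketched as follows. The claim gives only the weighted squared difference; averaging over the embedding dimensions, the default value of `omega`, and the names `mse_loss` and `level_loss` are assumptions made for this example.

```python
import random

def mse_loss(x_i, x_next, omega=1.0):
    """omega times the mean squared difference of two embeddings (omega > 0)."""
    return omega * sum((b - a) ** 2 for a, b in zip(x_i, x_next)) / len(x_i)

def level_loss(level_i, level_next, omega=1.0, rng=random):
    """Sum the loss of each level-i picture against a random level-(i+1) picture."""
    return sum(mse_loss(x, rng.choice(level_next), omega) for x in level_i)

# Hypothetical reduced embeddings: two pictures in level 1, one in level 2.
level_1 = [[0.0, 0.0], [1.0, 1.0]]
level_2 = [[1.0, 1.0]]
loss = level_loss(level_1, level_2, omega=0.5)
print(loss)
```

Minimising this value pulls the embeddings of less complete sketches toward those of more complete ones in the next grade.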
9. The method for searching hand-drawn images in real time based on multi-granularity associative learning according to claim 1 or 2, wherein the expression of the triplet loss function is:
wherein $m$ represents the number of pictures rendered in total from a complete sketch; $v_{[i,j]}$ represents the embedded vector of the i-th picture in the sketch branch, obtained after passing through the dimension reduction layer; $v_p$ denotes the positive sample obtained after passing through the pre-training network and the attention layer, i.e., the embedded vector of the target image; $v_n$ denotes the negative sample obtained after passing through the pre-training network and the attention layer, i.e., the embedded vector of an image other than the target image in the image set; $\alpha$ is a constant; and $d$ is the Euclidean distance.
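The exact expression of the triplet loss is not reproduced in the text above, so the following sketch uses the standard hinge form max(0, d(v, v_p) - d(v, v_n) + α) with Euclidean distance d, summed over the pictures of one sketch branch. The hinge form, the summation, the value of `alpha`, and the assumption that all vectors share one dimensionality are illustrative choices, not the claimed formula.

```python
import math

def euclidean(u, v):
    """Euclidean distance d between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Standard hinge triplet loss with margin alpha (assumed form)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + alpha)

def branch_triplet_loss(branch_embeddings, v_p, v_n, alpha=0.2):
    """Sum the triplet loss over every picture embedding in a sketch branch."""
    return sum(triplet_loss(v, v_p, v_n, alpha) for v in branch_embeddings)

# Hypothetical embeddings: two sketch pictures, the target image, another image.
branch = [[0.0, 0.0], [3.0, 4.0]]
v_p = [3.0, 4.0]
v_n = [0.0, 0.0]
total = branch_triplet_loss(branch, v_p, v_n)
print(total)
```

The first picture sits on the negative sample and contributes 5.2, while the second coincides with the target image and contributes nothing.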
CN202111241283.5A 2021-10-25 2021-10-25 Hand-drawing image real-time retrieval method based on multi-granularity associative learning Active CN113886615B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111241283.5A CN113886615B (en) 2021-10-25 2021-10-25 Hand-drawing image real-time retrieval method based on multi-granularity associative learning

Publications (2)

Publication Number Publication Date
CN113886615A CN113886615A (en) 2022-01-04
CN113886615B true CN113886615B (en) 2024-06-04

Family

ID=79013911

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111241283.5A Active CN113886615B (en) 2021-10-25 2021-10-25 Hand-drawing image real-time retrieval method based on multi-granularity associative learning

Country Status (1)

Country Link
CN (1) CN113886615B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114860980B (en) * 2022-05-26 2024-07-19 Chongqing University of Posts and Telecommunications Image retrieval method based on matching of sketch local features and global features
CN115878833B (en) * 2023-02-20 2023-06-13 Sun Yat-sen University Appearance patent image retrieval method and system based on hand-drawn sketch semantics

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107220277A (en) * 2017-04-14 2017-09-29 Northwest University Image retrieval algorithm based on cartographical sketching
CN110580302A (en) * 2019-08-13 2019-12-17 Tianjin University Sketch image retrieval method based on semi-heterogeneous joint embedded network
CN110598018A (en) * 2019-08-13 2019-12-20 Tianjin University Sketch image retrieval method based on cooperative attention
CN111488474A (en) * 2020-03-21 2020-08-04 Fudan University Fine-grained freehand sketch image retrieval method based on attention enhancement
CN111625667A (en) * 2020-05-18 2020-09-04 Beijing Technology and Business University Three-dimensional model cross-domain retrieval method and system based on complex background image
CN112085072A (en) * 2020-08-24 2020-12-15 North Minzu University Cross-modal retrieval method of sketch retrieval three-dimensional model based on space-time characteristic information

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10430455B2 (en) * 2017-06-09 2019-10-01 Adobe Inc. Sketch and style based image retrieval

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
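Multi-granularity Association Learning for On-the-fly Fine-grained Sketch-based Image Retrieval; Dawei Dai et al.; Knowledge-Based Systems; 20221011; 1-17 *
User modeling method in online hand-drawn sketch recognition; Yu Miao; Computer Knowledge and Technology; 20080825 (No. S1); 64-66+109 *
Sketch recognition method based on temporal features; Yu Meiyu; Wu Hao; Guo Xiaoyan; Jia Qi; Guo He; Computer Science; 20181115 (No. S2); 208-212 *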

Also Published As

Publication number Publication date
CN113886615A (en) 2022-01-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant