CN113886615B - Hand-drawing image real-time retrieval method based on multi-granularity associative learning - Google Patents
- Publication number
- CN113886615B (application number CN202111241283.5A)
- Authority
- CN
- China
- Prior art keywords
- sketch
- picture
- image
- grade
- branch
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/53—Querying
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Biophysics (AREA)
- Evolutionary Computation (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Processing Or Creating Images (AREA)
- Image Analysis (AREA)
Abstract
The invention belongs to the field of image retrieval and relates in particular to a real-time hand-drawn image retrieval method based on multi-granularity associative learning. The method comprises: training an improved deep neural network model with a triplet loss function and a multi-granularity associative learning method; extracting, with the trained model, the embedding vector of each picture in a sketch branch; sending that embedding vector to a discriminator that judges the picture's level; passing it to the dimension-reduction layer for that level; computing the Euclidean distance between the reduced embedding and each image; and returning the top-k pictures ranked by that distance. The invention designs a multi-level model that keeps the diversity of incomplete sketches from confusing the model, and proposes a multi-granularity associative learning method for progressively completed sketches, so that the embedding space of each incomplete sketch approaches those of the more complete sketches that follow it and of the corresponding target photo, allowing the target photo to be retrieved with as few strokes as possible.
Description
Technical Field
The invention belongs to the field of dynamic sketch retrieval and relates in particular to a real-time hand-drawn image retrieval method based on multi-granularity associative learning.
Background
Image retrieval is divided into example-based image retrieval (EBIR) and sketch-based image retrieval (SBIR) according to the type of query. SBIR takes as input a hand-drawn sketch, which lacks color and texture information, and the retrieval system returns images from the image library that are similar to the sketch. A hand-drawn sketch is an abstract human rendering of an observed object; unlike text and labels, it conveys visual information that is hard to put into words in a more direct and visual way, which effectively reduces distortion of the information as it is communicated. For example, a user who wants to look up a product but knows too little about it to supply a picture or a text description can simply sketch the product's shape from memory and search for it with the sketch. With the rapid development of touch devices, and in particular the spread of touch-screen phones and tablets, a large number of users can now input drawings and handwriting, so hand-drawn sketches are used ever more often to convey information in daily life, work and entertainment, and sketch-based image retrieval has attracted particular attention because of its potential commercial value.
The main advantage of sketch-based retrieval over text- or label-based retrieval is its fine granularity, which has given rise to fine-grained sketch-based image retrieval (FG-SBIR): matching images against the details of a hand-drawn sketch in order to retrieve a specific photo from a gallery. Considerable progress has been made in FG-SBIR research, but two problems with sketching still hinder its wide practical use: (1) users' limited drawing skills, and (2) the time needed to draw a complete sketch. When a reference picture is available, sketches of the same object drawn by different people differ in their degree of abstraction, which leads to different sketch forms; without a reference picture, each person must conceive and draw the object from their own subjective impression, which greatly increases the diversity of sketch forms. In addition, each person's drawing level and style differ, which further increases stylistic variation among sketches, weakens the semantic correspondence across sketch data and makes sketch semantics harder to understand. Although state-of-the-art vision systems can recognize poorly drawn sketches, the time needed to draw a complete sketch depends on the drawer's ability, and waiting until the sketch is finished before returning results takes too long. In practice, the key to real-time retrieval is to find the desired product as quickly as possible using the fewest strokes.
Disclosure of Invention
To solve these problems in the prior art, the invention provides a real-time hand-drawn image retrieval method based on multi-granularity associative learning, built on an improved neural network model with three branches f1, f2 and f3, where f1 is a pre-trained network, f2 is an attention layer and f3 is a dimension-reduction layer. The training set of the improved model is an image set made up of a plurality of images and the complete sketch corresponding to each image. The complete sketch of each image is rendered, following the order in which its strokes were drawn, into a plurality of pictures, and these pictures form the sketch branch of that image. In each training iteration, one image in the image set is selected as the target image;
Once the improved neural network model has been trained, hand-drawn input can be used to retrieve images in real time. The model is trained as follows:
S0: using the hand-drawn sketches corresponding to the images in the image set, train the three branches f1, f2 and f3 of the neural network model with the triplet loss function, and fix their parameters once training is finished;
S1: divide the pictures in the target image's sketch branch into levels according to the number of strokes needed to draw the target image, so that the diversity of incomplete sketches does not confuse the model;
S2: extract the feature vector of the target image and of each picture in the sketch branch with the pre-trained network, then apply the attention mechanism of the attention layer to obtain their embedding vectors;
S3: according to the level assigned to each picture, send the picture's embedding vector to the dimension-reduction layer for that level;
S4: after the picture's embedding vector has been reduced by its level's dimension-reduction layer, associate the picture with a picture in the next level, compute the mean square loss between them with the mean square error (MSE) loss function, and use this loss to update the dimension-reduction layer; repeat until the mean square loss has been computed for all levels;
S5: compute the error between each picture in the sketch branch and the images in the image set with the triplet loss function, add it to the errors of all levels, and back-propagate, adjusting the model parameters so that each picture's embedding vector moves closer to that of the target image and away from those of the other images in the image set, while the embedding vectors of adjacent levels are also pulled closer together;
S6: take the sketch branch of the next target image and repeat steps S1 to S5 until the model reaches the maximum number of training iterations.
Further, the complete sketch of an image is rendered into N pictures following the order in which its strokes were drawn, and the N pictures form a sketch branch. The n-th picture contains the first through the n-th strokes of the complete sketch ($1 \le n \le N$), so every picture contains a different number of strokes. Arranging the pictures in ascending order of stroke count gives the sketch branch $S = \{s_1, s_2, \ldots, s_n, \ldots, s_N\}$, where $s_n$ is the picture containing strokes 1 through n.
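The rendering step can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: it assumes a sketch is stored as an ordered list of strokes (each a list of (x, y) points) and skips rasterizing each prefix into an image; the name `build_sketch_branch` is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]


def build_sketch_branch(strokes: List[Stroke]) -> List[List[Stroke]]:
    """Return S = [s_1, ..., s_N], where s_n holds strokes 1..n of the complete sketch.

    The list comes out in ascending order of stroke count; rasterising each
    s_n into an image is omitted here.
    """
    return [strokes[:n] for n in range(1, len(strokes) + 1)]


sketch = [[(0, 0), (1, 1)], [(1, 1), (2, 0)], [(2, 0), (0, 0)]]  # three strokes
branch = build_sketch_branch(sketch)
print([len(s) for s in branch])  # [1, 2, 3]
```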
Further, an attention mechanism is used to obtain the embedding vector of each picture in the sketch branch:
$$V_H = \mathrm{Global\_pooling}\big(B + B \cdot f_{att}(B)\big)$$
where $B$ is the feature map produced by the pre-trained network, $f_{att}(\cdot)$ is the attention mechanism, $\mathrm{Global\_pooling}(\cdot)$ applies global pooling to the output of the attention layer, and $V_H$ is the resulting embedding vector of the sketch-branch picture.
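A minimal pure-Python sketch of the embedding formula follows. The patent does not specify $f_{att}$ or the pooling type, so this assumes a hypothetical spatial attention (one sigmoid weight per location from a learned linear projection `w`, `bias`), reads $B \cdot f_{att}(B)$ as an element-wise product broadcast over channels, and uses global average pooling. `B` is given as a list of spatial locations, each holding a C-dimensional channel vector.

```python
import math
from typing import List

FeatureMap = List[List[float]]  # L spatial locations, each a C-dimensional channel vector


def _sigmoid(s: float) -> float:
    # Numerically stable logistic function.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def f_att(B: FeatureMap, w: List[float], bias: float = 0.0) -> List[float]:
    """Hypothetical attention: one weight in (0, 1) per spatial location."""
    return [_sigmoid(sum(wc * x for wc, x in zip(w, loc)) + bias) for loc in B]


def embed(B: FeatureMap, w: List[float], bias: float = 0.0) -> List[float]:
    """V_H = Global_pooling(B + B * f_att(B)) with global average pooling."""
    att = f_att(B, w, bias)
    num_locs, channels = len(B), len(B[0])
    # Residual attention: location l is scaled by (1 + a_l), then averaged over all locations.
    return [sum(loc[c] * (1.0 + a) for loc, a in zip(B, att)) / num_locs
            for c in range(channels)]


# Zero weights give a_l = sigmoid(0) = 0.5 everywhere, so V_H = 1.5 * channel means.
print(embed([[1.0, 2.0], [3.0, 4.0]], w=[0.0, 0.0]))  # [3.0, 4.5]
```

The residual form $B + B \cdot f_{att}(B)$ means attention can only amplify features here, never zero them out, which keeps the pre-trained signal intact.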
Further, the pictures of the sketch branch are divided into levels by stroke count, and each level has its own dimension-reduction layer, also called a linear mapping layer:
$$V_L = A \cdot V_H$$
where $A$ is the linear mapping and $V_L$ is the embedding vector of the sketch-branch picture after dimension reduction.
Furthermore, each level has its own dimension-reduction layer, which maps the 2048-dimensional embedding vector to 64 dimensions, and the multi-granularity associative learning method pulls the feature space of incomplete hand-drawn sketches toward that of relatively more complete sketches, further optimizing the feature space of the incomplete sketches.
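The per-level linear mapping could look like the following sketch, assuming each level k owns an independent 64 × 2048 matrix $A_k$. The random Gaussian initialization and the names `make_projection`, `reduce_dim` and `LevelHeads` are illustrative assumptions; in the described method the matrices would be learned during steps S4 and S5.

```python
import random
from typing import List

Matrix = List[List[float]]


def make_projection(in_dim: int = 2048, out_dim: int = 64, seed: int = 0) -> Matrix:
    """Hypothetical initialisation: out_dim x in_dim Gaussian matrix scaled by 1/sqrt(in_dim)."""
    rng = random.Random(seed)
    std = in_dim ** -0.5
    return [[rng.gauss(0.0, std) for _ in range(in_dim)] for _ in range(out_dim)]


def reduce_dim(A: Matrix, v_h: List[float]) -> List[float]:
    """V_L = A . V_H (matrix-vector product)."""
    return [sum(a * x for a, x in zip(row, v_h)) for row in A]


class LevelHeads:
    """One independent dimension-reduction layer per level (f3_1, ..., f3_P)."""

    def __init__(self, num_levels: int, in_dim: int = 2048, out_dim: int = 64):
        self.heads = [make_projection(in_dim, out_dim, seed=k) for k in range(num_levels)]

    def __call__(self, level: int, v_h: List[float]) -> List[float]:
        # Levels are numbered from 1, as in the description.
        if not 1 <= level <= len(self.heads):
            raise ValueError(f"level must be in 1..{len(self.heads)}")
        return reduce_dim(self.heads[level - 1], v_h)
```

Keeping one head per level lets sparse early-stroke embeddings and near-complete ones be projected differently, which is the point of grading the pictures.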
Further, the step S1 includes:
if a complete sketch requires N strokes, its rendered sketch branch contains N pictures;
When dividing levels, the 1st to m-th pictures of the sketch branch form the first level (i.e. the first m strokes), and the (m+1)-th to 2m-th pictures form the second level (i.e. strokes 1 to 2m); each successive level adds m more pictures, i.e. m more strokes;
Let P = N/m. If P is an integer, the N pictures are divided into P levels in total; otherwise they are divided into ⌊P⌋ + 1 levels.
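Under the equal-increment scheme, a picture's level and the total number of levels follow from integer arithmetic. The sketch below (hypothetical function names) uses the ceiling of N/m, which matches the rule above: P levels when m divides N, otherwise the integer part of P plus one. The usage lines reproduce the 20-stroke, m = 5 example given later in the description.

```python
from typing import List


def num_levels(N: int, m: int) -> int:
    """P levels when m divides N, otherwise floor(N/m) + 1, i.e. ceil(N/m)."""
    return -(-N // m)


def level_of(n: int, m: int) -> int:
    """1-based level of picture s_n (strokes 1..n) when every level adds m pictures."""
    return (n - 1) // m + 1


def split_into_levels(N: int, m: int) -> List[List[int]]:
    """Group picture indices 1..N into levels of m consecutive pictures (the last may be shorter)."""
    return [list(range(start, min(start + m, N + 1))) for start in range(1, N + 1, m)]


print(num_levels(20, 5))            # 4
print(split_into_levels(20, 5)[1])  # [6, 7, 8, 9, 10]
print(num_levels(22, 5))            # 5 (P = 4.4 is not an integer)
```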
Further, the step S1 includes:
If a complete sketch requires N strokes, its rendered sketch branch contains N pictures. Let $m_k$ be the number of pictures in the k-th level. A completeness discriminator divides the pictures into levels according to a formula under which each successive level contains fewer pictures; the number of pictures in the k-th level is expressed as follows:
For sketches with fewer strokes, this reduces the number of levels needed, lowering the computational load and improving retrieval efficiency.
Further, the step S4 includes:
when pulling the i-th level toward the (i+1)-th level, the pictures $x_i$ of the i-th level are taken in order of increasing stroke count, and for each one a picture $x_{i+1}$ is selected at random from the (i+1)-th level and their mean square loss is computed. Summing these losses over all pictures of the i-th level gives the mean square error of the i-th level, which pulls it toward the (i+1)-th level. The mean square loss between $x_i$ and $x_{i+1}$ is
$$\mathrm{MSE\ Loss} = \omega\,(x_{i+1} - x_i)^2$$
where $\omega > 0$.
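The level-to-level association term can be sketched as follows. Two points are assumptions, since the text does not fix them: $(x_{i+1} - x_i)^2$ is read as the mean of squared element-wise differences between reduced embeddings, and a fresh random partner from level i+1 is drawn for every picture of level i. The function names are hypothetical.

```python
import random
from typing import List, Optional, Sequence

Vec = List[float]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean of squared element-wise differences between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def association_loss(levels: List[List[Vec]], omega: float = 1.0,
                     rng: Optional[random.Random] = None) -> float:
    """Sum over levels i < P of sum over x_i in level i of omega * MSE(x_{i+1}, x_i),
    with x_{i+1} drawn at random from level i + 1 for every x_i."""
    if omega <= 0:
        raise ValueError("omega must be positive")
    rng = rng or random.Random()
    total = 0.0
    for current, nxt in zip(levels, levels[1:]):
        for x_i in current:  # pictures already in ascending stroke order
            x_next = rng.choice(nxt)
            total += omega * mse(x_next, x_i)
    return total
```

A single-level branch contributes no association term, since there is no next level to pull toward.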
Further, the expression of the triplet loss function is:
where m is the number of pictures rendered from one complete sketch; $v_{[i,j]}$ is the embedding vector of the i-th picture in the sketch branch, obtained after the dimension-reduction layer; $v_{[i+1,\mathrm{rnd}]}$ is the (i+1)-th picture of the sketch branch; $v_p$ is the positive sample, i.e. the embedding vector of the target image obtained from the pre-trained network and the attention layer; $v_n$ is the negative sample, i.e. the embedding vector of an image in the image set other than the target image, obtained from the pre-trained network and the attention layer; $\alpha$ is a constant; and $d$ is the Euclidean distance.
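The triplet formula itself is not reproduced in this text. As an assumption, the sketch below uses the standard triplet margin loss, $\max(0, d(v, v_p) - d(v, v_n) + \alpha)$ with Euclidean $d$, applied with each of the m picture embeddings as the anchor and summed over the branch. The default `alpha = 0.2` is an arbitrary placeholder, and the sketch assumes $v_p$ and $v_n$ have the same dimension as the reduced picture embeddings.

```python
import math
from typing import List, Sequence


def triplet(anchor: Sequence[float], positive: Sequence[float],
            negative: Sequence[float], alpha: float) -> float:
    """Standard triplet margin loss with Euclidean distance d."""
    return max(0.0, math.dist(anchor, positive) - math.dist(anchor, negative) + alpha)


def branch_triplet_loss(branch: List[Sequence[float]], v_p: Sequence[float],
                        v_n: Sequence[float], alpha: float = 0.2) -> float:
    """Sum the triplet loss over the m pictures of one sketch branch,
    each picture's embedding acting as the anchor."""
    return sum(triplet(v, v_p, v_n, alpha) for v in branch)
```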
The invention develops a multi-level model for sketch branches of different completeness, which prevents the diversity of incomplete sketches from confusing the model, and proposes a multi-granularity associative learning method for progressively completed sketches, so that the embedding space of each incomplete sketch approaches those of the subsequent sketches and of the corresponding target photo. The target photo can thus be retrieved with as few strokes as possible, shortening retrieval time for hand-drawn sketches and improving retrieval efficiency.
Drawings
FIG. 1 is a diagram of a deep neural network backbone model of the present invention;
FIG. 2 is a diagram of a deep neural network model of the present invention;
FIG. 3 is a diagram of a multi-granularity joint learning retrieval model of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art from these embodiments without inventive effort fall within the scope of the invention.
As shown in FIGS. 1 to 3, a real-time hand-drawn sketch retrieval method based on multi-granularity associative learning comprises the following steps:
obtain the complete sketch of each target image from the QMUL-Shoe-V2 and QMUL-Chair-V2 datasets and render it into N pictures following the order in which its strokes were drawn; the N pictures form a sketch branch. The n-th picture contains strokes 1 through n of the complete sketch ($1 \le n \le N$), so every picture contains a different number of strokes. Arranging the pictures in ascending order of stroke count gives the sketch branch $S = \{s_1, s_2, \ldots, s_n, \ldots, s_N\}$, where $s_n$ is the picture containing strokes 1 through n;
Specifically, the providers of the QMUL-Shoe-V2 and QMUL-Chair-V2 datasets recruited volunteers with different drawing backgrounds to draw a complete sketch of each target image by hand.
Specifically, as shown in FIG. 3, a complete sketch is rendered into N pictures of increasing completeness, which together form a sketch branch; the n-th picture contains the first to the n-th strokes of the complete sketch. For example, the first picture contains only the first stroke, the second picture contains the first and second strokes, the third picture contains the first three strokes, and so on.
An image set is formed from a plurality of images and their corresponding complete sketches. The complete sketch of each image is rendered into a plurality of pictures following its stroke order, and these pictures form that image's sketch branch. All complete sketches are rendered before training begins, and in each training iteration one image in the image set is selected as the target image.
Once the improved neural network model has been trained, hand-drawn input can be used to retrieve images in real time. The model is trained as follows:
S0: using the hand-drawn sketches corresponding to the images in the image set, train the three branches f1, f2 and f3 of the neural network model with the triplet loss, as shown in FIG. 1, and fix their parameters once training is finished. Here the anchor is the hand-drawn complete sketch of the target image, the positive sample is the target image, and the negative sample is an image in the image set other than the target image;
S1: divide the pictures in the target image's sketch branch into levels according to the number of strokes needed to draw the target image;
S2: extract the feature vector of the target image and of each picture in the sketch branch with the pre-trained network, then apply the attention mechanism of the attention layer to obtain their embedding vectors;
S3: according to the level assigned to each picture, send the picture's embedding vector to the dimension-reduction layer for that level;
each level has its own dimension-reduction layer, which maps the 2048-dimensional embedding vector to 64 dimensions, and the multi-granularity associative learning method pulls the feature space of incomplete hand-drawn sketches toward that of relatively more complete sketches, further optimizing the feature space of the incomplete sketches.
S4: after the picture's embedding vector has been reduced by its level's dimension-reduction layer, associate the picture with a picture in the next level, compute the mean square loss between them with the MSE loss function, and use this loss to update the dimension-reduction layer; repeat until the mean square loss has been computed for all levels;
when pulling the i-th level toward the (i+1)-th level, the pictures $x_i$ of the i-th level are taken in order of increasing stroke count, and for each one a picture $x_{i+1}$ is selected at random from the (i+1)-th level and their mean square loss is computed. Summing these losses over all pictures of the i-th level gives the mean square error of the i-th level, which pulls it toward the (i+1)-th level. The mean square loss between $x_i$ and $x_{i+1}$ is
$$\mathrm{MSE\ Loss} = \omega\,(x_{i+1} - x_i)^2$$
where $\omega > 0$.
S5: compute the error between each picture in the sketch branch and the target image with the triplet loss function, add it to the errors of all levels, and back-propagate, adjusting the model parameters so that each picture's embedding vector moves closer to that of the target image and away from those of the other images in the image set, while the embedding vectors of adjacent levels are also pulled closer together;
S6: take the sketch branch of the next target image and repeat steps S1 to S5 until the model reaches the maximum number of training iterations.
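Steps S4 and S5 combine into one scalar objective per sketch branch. The sketch below only evaluates that scalar; it assumes the standard triplet margin form for the triplet term and the mean-of-squares reading of the MSE term (both assumptions, since the exact formulas are not given here), and leaves gradient computation to an autodiff framework. As we read S0, f1 and f2 are frozen, so back-propagating this value would update only the per-level dimension-reduction layers. The name `branch_loss` is hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence

Vec = List[float]


def triplet(a: Sequence[float], p: Sequence[float], n: Sequence[float], alpha: float) -> float:
    # Assumed standard triplet margin loss with Euclidean distance.
    return max(0.0, math.dist(a, p) - math.dist(a, n) + alpha)


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumed reading of (x_{i+1} - x_i)^2: mean of squared element-wise differences.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def branch_loss(levels: List[List[Vec]], v_p: Vec, v_n: Vec, alpha: float = 0.2,
                omega: float = 1.0, rng: Optional[random.Random] = None) -> float:
    """Total S4 + S5 objective for one sketch branch whose reduced embeddings are grouped by level."""
    rng = rng or random.Random(0)
    # S5: triplet error of every picture against the target (v_p) and a non-target image (v_n).
    trip = sum(triplet(v, v_p, v_n, alpha) for level in levels for v in level)
    # S4: association error, each picture paired with a random picture of the next level.
    assoc = sum(omega * mse(rng.choice(nxt), x)
                for cur, nxt in zip(levels, levels[1:]) for x in cur)
    return trip + assoc
```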
In one embodiment, as shown in FIG. 3, the level division in step S1 gives every level the same number of new strokes; because each picture contains all the strokes of the pictures before it, adjacent pictures share strokes and adjacent levels likewise share strokes. The division proceeds as follows:
if a complete sketch requires N strokes, its rendered sketch branch contains N pictures;
When dividing levels, the 1st to m-th pictures of the sketch branch form the first level (i.e. the first m strokes), and the (m+1)-th to 2m-th pictures form the second level (i.e. strokes 1 to 2m); each successive level adds m more pictures, i.e. m more strokes;
Let P = N/m. If P is an integer, the N pictures are divided into P levels in total; otherwise they are divided into ⌊P⌋ + 1 levels.
Specifically, take a complete sketch that requires 20 strokes, so that its rendered sketch branch contains 20 pictures, with m = 5: the 1st to 5th pictures form the first level (the first five strokes), the 6th to 10th pictures form the second level (strokes 1 to 10), the 11th to 15th pictures form the third level (strokes 1 to 15), and the 16th to 20th pictures form the fourth level (strokes 1 to 20).
In another embodiment, step S1 includes:
If a complete sketch requires N strokes, its rendered sketch branch contains N pictures. Let $m_k$ be the number of pictures in the k-th level; a completeness discriminator divides the pictures into levels according to a formula under which each successive level contains fewer pictures. For sketches with fewer strokes, this reduces the number of levels needed, lowering the computational load and improving retrieval efficiency.
Preferably, an attention mechanism is used to obtain the embedding vector of each picture in the sketch branch:
$$V_H = \mathrm{Global\_pooling}\big(B + B \cdot f_{att}(B)\big)$$
where $B$ is the feature map produced by the pre-trained network, $f_{att}(\cdot)$ is the attention mechanism, $\mathrm{Global\_pooling}(\cdot)$ applies global pooling to the output of the attention layer, and $V_H$ is the resulting embedding vector of the sketch-branch picture.
Preferably, the pictures in the sketch branch are divided into levels by stroke count, and each level has its own dimension-reduction layer, also called a linear mapping layer:
$$V_L = A \cdot V_H$$
where $A$ is the linear mapping and $V_L$ is the embedding vector of the sketch-branch picture after dimension reduction.
Preferably, the expression of the triplet loss function is:
where m is the number of pictures rendered from one complete sketch; $v_{[i,j]}$ is the embedding vector of the i-th picture in the sketch branch, obtained after the dimension-reduction layer; $v_{[i+1,\mathrm{rnd}]}$ is the (i+1)-th picture in the sketch branch; $v_p$ is the positive sample, i.e. the embedding vector of the target image obtained from the pre-trained network and the attention layer; $v_n$ is the negative sample, i.e. the embedding vector of an image in the image set other than the target image, obtained from the pre-trained network and the attention layer; $\alpha$ is a constant; and $d$ is the Euclidean distance.
Real-time retrieval of hand-drawn images comprises the following steps:
S21: take the sketch the user draws on the drawing board as the original sketch, and form a new picture each time a stroke is added, following the drawing order;
S22: send the picture for the current stroke through the pre-trained network and the attention layer to obtain its embedding vector;
S23: send the picture for the current stroke to the completeness discriminator to judge its level, and pass it to the dimension-reduction layer for that level;
S24: after the picture's embedding vector has been reduced by that level's dimension-reduction layer, compute the Euclidean distance between it and each image in the database;
S25: return the top-k retrieved pictures ranked by Euclidean distance to the images in the database;
S26: get the next stroke drawn by the user and repeat steps S22 to S25 until the target picture is retrieved or all strokes have been used.
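The retrieval loop S21 to S26 might be structured as below. Every component is a stand-in: `embed` represents f1 plus f2, `level_of` represents the completeness discriminator, `heads` maps each level to its dimension-reduction layer, and `gallery` holds image embeddings assumed to be precomputed through f1, f2 and f3. These names and the early-stop-on-target rule are illustrative assumptions rather than the patent's interface.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence

Vec = List[float]


def top_k(query: Vec, gallery: Dict[str, Vec], k: int) -> List[str]:
    """S24/S25: rank gallery ids by Euclidean distance to the query, keep the k nearest."""
    return sorted(gallery, key=lambda gid: math.dist(query, gallery[gid]))[:k]


def realtime_retrieve(strokes: Sequence, embed: Callable, level_of: Callable,
                      heads: Dict[int, Callable], gallery: Dict[str, Vec],
                      k: int = 10, target: Optional[str] = None) -> List[List[str]]:
    """Run S21-S26 over a stroke sequence and return the top-k list after each stroke."""
    history = []
    for n in range(1, len(strokes) + 1):
        picture = strokes[:n]            # S21: picture holding strokes 1..n
        v_h = embed(picture)             # S22: pre-trained network + attention layer
        level = level_of(picture)        # S23: completeness discriminator
        v_l = heads[level](v_h)          # S23/S24: that level's dimension-reduction layer
        ranked = top_k(v_l, gallery, k)  # S24/S25: nearest gallery images
        history.append(ranked)
        if target is not None and target in ranked:
            break                        # S26: stop once the target is retrieved
    return history
```

Stopping as soon as the target enters the top-k mirrors the stated goal of retrieving it with as few strokes as possible; in a live system the target is unknown and the user would decide when to stop.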
As shown in FIG. 2, the deep neural network model has two parts. One part is trained with the triplet loss on the image set containing the target image, and the images obtain their embedding vectors through the three branches f1, f2 and f3 of the model. The other part trains the sketch branch with the triplet loss and the MSE loss: after a picture passes through f1 and f2, its level is judged and it is sent to the corresponding layer f3_1, f3_2, ..., f3_k to obtain its embedding vector.
When no picture of a product is available and the product is hard to describe in words, a user can draw a sketch of it from memory on a touch-screen device. The sketch is rendered into a sketch branch and fed into the trained neural network model, which returns the k images most similar to the sketch.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (9)
1. A hand-drawn image real-time retrieval method based on multi-granularity associative learning is characterized in that an improved neural network model is provided by the hand-drawn image real-time retrieval method based on multi-granularity associative learning, the improved neural network model comprises three branches f 1、f2 and f 3, f 1 is a pre-training network, f 2 is an attention layer, f 3 is a dimension reduction layer, a training set of the improved neural network model is an image set composed of a plurality of images and complete sketches corresponding to the images, the complete sketches of each image in the image set are rendered into a plurality of sketches according to the stroke sequence of the drawing, the sketch branch set of the image set is constructed after the complete sketches are rendered, and one image in the image set is selected as a target image in each training;
after the improved neural network model has been trained, a hand-drawn image is input to retrieve images in real time, wherein the training process of the improved neural network model comprises the following steps:
S0, training the three branches f1, f2 and f3 of the neural network model with a triplet loss function on the hand-drawn sketches corresponding to the images in the image set, and fixing the parameters after training is finished;
S1, dividing the pictures in the sketch branch of the target image into grades according to the number of strokes required to draw the target image;
S2, extracting the feature vector of the target image and the feature vector of each picture in the sketch branch with the pre-trained network, and obtaining the embedded vectors of the target image and of each picture in the sketch branch through the attention mechanism of the attention layer;
S3, sending the embedded vector of each picture into the dimension-reduction layer corresponding to the grade to which the picture is assigned;
S4, after the embedded vector of a picture has been reduced in dimension by the dimension-reduction layer of its grade, associating the picture with a picture in the next grade, computing the mean square loss between the picture in the current grade and the picture in the next grade with a mean square loss function, and updating the dimension-reduction layer using the computed mean square loss as the loss function; and repeating this process until the mean square losses of all grades have been computed;
S5, computing with a triplet loss function the error between each picture in the sketch branch and the images in the image set, adding it to the errors of all grades, and back-propagating; with the aim of bringing each picture close to the target image and away from the other images in the image set, the model parameters are adjusted so that the embedded vectors of the picture and the target image approach each other, while the embedded vectors of two adjacent grades also approach each other;
S6, obtaining the sketch branch of the next target image, and repeating steps S1 to S5 until the model reaches the upper limit on the number of training iterations.
2. The hand-drawn image real-time retrieval method based on multi-granularity associative learning according to claim 1, wherein the complete sketch of an image is rendered into N pictures according to the stroke order in which it was drawn, and the N pictures form a sketch branch; the n-th picture in the sketch branch contains the first to n-th strokes of the complete sketch, where 1 ≤ n ≤ N, so each picture contains a different number of strokes; the pictures are arranged in ascending order of the number of strokes they contain, and a sketch branch is S = {s_1, s_2, ..., s_n, ..., s_N}, where s_n denotes the picture containing the first to n-th strokes.
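Since s_n contains strokes 1 to n, a sketch branch is simply the sequence of stroke prefixes of the complete sketch. The following is a minimal sketch, assuming each stroke is stored as a list of (x, y) points and leaving out the rasterisation step that turns each prefix into an image; the type aliases and function name are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]        # one pen-down to pen-up trace
Picture = List[Stroke]      # a partial sketch, before rasterisation


def build_sketch_branch(strokes: List[Stroke]) -> List[Picture]:
    """Return S = [s_1, ..., s_N], where s_n holds the first n strokes in drawing order."""
    return [strokes[:n] for n in range(1, len(strokes) + 1)]


triangle = [[(0, 0), (1, 1)], [(1, 1), (2, 0)], [(2, 0), (0, 0)]]
print([len(s) for s in build_sketch_branch(triangle)])  # [1, 2, 3]
```

The list is already in ascending order of stroke count, matching the ordering the claim requires.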
3. The hand-drawn image real-time retrieval method based on multi-granularity associative learning according to claim 1, wherein an attention mechanism is used to obtain the embedded vector of each picture in the sketch branch, expressed as:
$$V_H = \mathrm{Global\_pooling}\big(B + B \cdot f_{att}(B)\big)$$
where B is the feature vector obtained from the pre-trained network, f_att(·) denotes the attention mechanism, Global_pooling(·) denotes global pooling of the embedded vector obtained through the attention layer, and V_H denotes the embedded vector of a sketch-branch picture obtained after global pooling.
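The residual form B + B·f_att(B) re-weights the backbone features without discarding them. The sketch below illustrates the formula under explicit assumptions that the claim does not state: B is a C-channel map with each channel flattened over its spatial locations, f_att is a hypothetical spatial attention (the sigmoid of the channel-mean at each location), "·" is element-wise multiplication, and Global_pooling is global average pooling.

```python
import math
from typing import List

# B: C channels, each a flattened H*W spatial grid of activations.
FeatureMap = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def f_att(B: FeatureMap) -> List[float]:
    """Hypothetical spatial attention: one weight in (0, 1) per location,
    the sigmoid of the mean activation across channels."""
    num_channels = len(B)
    num_locations = len(B[0])
    return [sigmoid(sum(ch[p] for ch in B) / num_channels) for p in range(num_locations)]


def attention_embedding(B: FeatureMap) -> List[float]:
    """V_H = Global_pooling(B + B * f_att(B)), with global average pooling per channel."""
    weights = f_att(B)
    return [sum(v + v * w for v, w in zip(ch, weights)) / len(ch) for ch in B]
```

With a sigmoid mask each activation is scaled by a factor between 1 and 2, so attention can emphasise informative locations but never suppresses a feature below its original value.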
4. The hand-drawn image real-time retrieval method based on multi-granularity associative learning according to claim 1 or 3, wherein the pictures in the sketch branch are divided into grades according to their number of strokes, and each grade is given a separate dimension-reduction layer, also called a linear mapping layer, expressed as:
$$V_L = A \cdot V_H$$
where A denotes the linear mapping and V_L denotes the embedded vector of a sketch-branch picture after dimension reduction.
5. The hand-drawn image real-time retrieval method based on multi-granularity associative learning according to claim 4, wherein each grade has its own dimension-reduction layer, which maps the 2048-dimensional embedded vector to 64 dimensions, and multi-granularity associative learning is used to move the feature-vector space of incomplete hand-drawn sketches toward that of relatively complete hand-drawn sketches, thereby further optimizing the feature-vector space of the incomplete sketches.
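Claims 4 and 5 together describe one 64 × 2048 matrix A_k per grade, with each picture routed to the matrix of its own grade. A minimal sketch follows, assuming 0-based grade indices and a hypothetical Gaussian initialisation; training of the matrices is not shown, and the function names are illustrative only.

```python
import random
from typing import List

Matrix = List[List[float]]  # out_dim rows, each of length in_dim


def make_reduction_layers(num_grades: int, in_dim: int = 2048, out_dim: int = 64,
                          seed: int = 0) -> List[Matrix]:
    """Create one linear map A_k per grade (hypothetical Gaussian initialisation)."""
    rng = random.Random(seed)
    std = in_dim ** -0.5
    return [[[rng.gauss(0.0, std) for _ in range(in_dim)] for _ in range(out_dim)]
            for _ in range(num_grades)]


def reduce_dim(v_h: List[float], grade: int, layers: List[Matrix]) -> List[float]:
    """V_L = A_k . V_H, using the layer of the picture's grade (0-based index k)."""
    A = layers[grade]
    if len(A[0]) != len(v_h):
        raise ValueError("V_H length does not match the layer's input dimension")
    return [sum(a * x for a, x in zip(row, v_h)) for row in A]
```

Keeping a separate A_k per grade lets sparse early sketches and nearly complete sketches use different projections into the same 64-d space.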
6. The hand-drawn image real-time retrieval method based on multi-granularity associative learning according to claim 1, wherein step S1 comprises:
if N strokes are needed to draw a complete sketch, the sketch branch rendered from the complete sketch contains N pictures;
when dividing the grades, the 1st to m-th pictures in the sketch branch are assigned to the first grade, i.e., the pictures built from the first m strokes form the first grade, and the (m+1)-th to 2m-th pictures are assigned to the second grade, i.e., the pictures built from up to 2m strokes; each subsequent grade takes the next m pictures, i.e., m more strokes;
let P = N/m; if P is an integer, the N pictures are divided into P grades in total, and if P is not an integer, they are divided into ⌊P⌋ + 1 grades in total.
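The fixed-size grading of claim 6 is a ceiling division of the N pictures into blocks of m. The sketch below assumes 1-based picture indices (matching s_1 to s_N) and 0-based grade indices; both function names are hypothetical.

```python
from typing import List


def grade_of(n: int, m: int) -> int:
    """0-based grade of picture s_n (1 <= n <= N) when each grade holds m pictures."""
    return (n - 1) // m


def assign_grades(N: int, m: int) -> List[List[int]]:
    """Split picture indices 1..N into consecutive grades of m pictures.

    Yields P = N/m grades when m divides N, otherwise floor(N/m) + 1 grades,
    with the last grade holding the remaining pictures.
    """
    num_grades = (N + m - 1) // m  # ceiling division without floats
    return [list(range(k * m + 1, min((k + 1) * m, N) + 1)) for k in range(num_grades)]


print(assign_grades(10, 4))  # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10]]
```

Here N = 10 and m = 4 give P = 2.5, so the pictures fall into ⌊P⌋ + 1 = 3 grades, the last holding only two pictures.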
7. The hand-drawn image real-time retrieval method based on multi-granularity associative learning according to claim 1, wherein step S1 comprises:
if N strokes are needed to draw a complete sketch, the sketch branch rendered from the complete sketch contains N pictures; m_k denotes the number of pictures contained in the k-th grade; the pictures are divided into grades by a completeness discriminator according to a formula, such that the number of pictures in each successive grade decreases, and the number of pictures contained in the k-th grade is expressed as follows:
8. The hand-drawn image real-time retrieval method based on multi-granularity associative learning according to claim 1, wherein step S4 comprises:
in the process of moving the i-th grade toward the (i+1)-th grade, taking the pictures x_i of the i-th grade in order from fewest to most strokes, computing the mean square loss between each x_i and a picture x_{i+1} selected at random from the (i+1)-th grade each time, and summing these losses over all pictures of the i-th grade to obtain the mean square error of the i-th grade, which draws the i-th grade toward the (i+1)-th grade; the mean square loss between a picture x_i in the i-th grade and a picture x_{i+1} in the next grade is expressed as:
$$\text{MSE Loss} = \omega\,(x_{i+1} - x_i)^2$$
where ω > 0.
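The grade-association loss of claim 8 can be sketched as follows. Because x_i and x_{i+1} are embedded vectors, this sketch assumes that (x_{i+1} − x_i)^2 means the mean of the element-wise squared differences; the function names and the optional seeded generator are hypothetical, and the gradient update of the dimension-reduction layer is not shown.

```python
import random
from typing import List, Optional, Sequence


def pair_mse(x_next: Sequence[float], x_cur: Sequence[float], omega: float = 1.0) -> float:
    """omega * (x_{i+1} - x_i)^2, read here as omega times the mean squared difference."""
    if omega <= 0:
        raise ValueError("omega must be > 0")
    return omega * sum((a - b) ** 2 for a, b in zip(x_next, x_cur)) / len(x_cur)


def grade_association_loss(grade_i: List[Sequence[float]],
                           grade_next: List[Sequence[float]],
                           omega: float = 1.0,
                           rng: Optional[random.Random] = None) -> float:
    """Mean square error of grade i: each picture of grade i (ordered from fewest to
    most strokes) is paired with a picture drawn at random from grade i+1, and the
    pair losses are summed."""
    if rng is None:
        rng = random.Random()
    return sum(pair_mse(rng.choice(grade_next), x_i, omega) for x_i in grade_i)
```

Summing this quantity over i = 1, ..., K−1 gives the "errors of all grades" that step S5 adds to the triplet loss.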
9. The hand-drawn image real-time retrieval method based on multi-granularity associative learning according to claim 1 or 2, wherein the triplet loss function is expressed as:
where m denotes the number of pictures rendered from one complete sketch; v[i,j] denotes the embedded vector of the i-th picture in the sketch branch, obtained after the dimension-reduction layer; v_p denotes the positive sample obtained after the pre-trained network and the attention layer, i.e., the embedded vector of the target image; v_n denotes the negative sample obtained after the pre-trained network and the attention layer, i.e., the embedded vector of an image other than the target image in the image set; α is a constant; and d is the Euclidean distance.
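The expression of the triplet loss did not survive in this text. The sketch below therefore assumes the common hinge form max(0, d(v, v_p) − d(v, v_n) + α), averaged over the m pictures of one sketch branch, using the symbols defined above; it is an illustration of a standard triplet loss, not a reconstruction of the patent's exact formula, and the function name and default α are hypothetical.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """d: Euclidean distance between two embedded vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def branch_triplet_loss(branch: List[Sequence[float]], v_p: Sequence[float],
                        v_n: Sequence[float], alpha: float = 0.2) -> float:
    """Assumed form: mean over the m pictures of max(0, d(v, v_p) - d(v, v_n) + alpha)."""
    m = len(branch)
    return sum(max(0.0, euclidean(v, v_p) - euclidean(v, v_n) + alpha) for v in branch) / m
```

A picture already closer to the target image than to the negative by at least α contributes zero, so training effort concentrates on the (typically early, sparse) pictures that are still ambiguous.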
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111241283.5A CN113886615B (en) | 2021-10-25 | 2021-10-25 | Hand-drawing image real-time retrieval method based on multi-granularity associative learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113886615A CN113886615A (en) | 2022-01-04 |
CN113886615B true CN113886615B (en) | 2024-06-04 |
Family ID: 79013911
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111241283.5A Active CN113886615B (en) | 2021-10-25 | 2021-10-25 | Hand-drawing image real-time retrieval method based on multi-granularity associative learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113886615B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114860980B (en) * | 2022-05-26 | 2024-07-19 | 重庆邮电大学 | Image retrieval method based on matching of sketch local features and global features |
CN115878833B (en) * | 2023-02-20 | 2023-06-13 | 中山大学 | Appearance patent image retrieval method and system based on hand-drawn sketch semantics |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107220277A (en) * | 2017-04-14 | 2017-09-29 | 西北大学 | Image retrieval algorithm based on cartographical sketching |
CN110580302A (en) * | 2019-08-13 | 2019-12-17 | 天津大学 | Sketch image retrieval method based on semi-heterogeneous joint embedded network |
CN110598018A (en) * | 2019-08-13 | 2019-12-20 | 天津大学 | Sketch image retrieval method based on cooperative attention |
CN111488474A (en) * | 2020-03-21 | 2020-08-04 | 复旦大学 | Fine-grained freehand sketch image retrieval method based on attention enhancement |
CN111625667A (en) * | 2020-05-18 | 2020-09-04 | 北京工商大学 | Three-dimensional model cross-domain retrieval method and system based on complex background image |
CN112085072A (en) * | 2020-08-24 | 2020-12-15 | 北方民族大学 | Cross-modal retrieval method of sketch retrieval three-dimensional model based on space-time characteristic information |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10430455B2 (en) * | 2017-06-09 | 2019-10-01 | Adobe Inc. | Sketch and style based image retrieval |
Non-Patent Citations (3)
Title |
---|
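Multi-granularity Association Learning for On-the-fly Fine-grained Sketch-based Image Retrieval; Dawei Dai et al.; Knowledge-Based Systems; 2022-10-11; pp. 1-17 * |
User modeling method in online hand-drawn sketch recognition; 余淼; Computer Knowledge and Technology; 2008-08-25 (Issue S1); pp. 64-66+109 * |
Sketch recognition method based on temporal features; 于美玉, 吴昊, 郭晓燕, 贾棋, 郭禾; Computer Science; 2018-11-15 (Issue S2); pp. 208-212 * |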
Similar Documents
Publication | Title |
---|---|
CN111488474B (en) | Fine-grained freehand sketch image retrieval method based on attention enhancement | |
CN108427738B (en) | Rapid image retrieval method based on deep learning | |
CN113886615B (en) | Hand-drawing image real-time retrieval method based on multi-granularity associative learning | |
WO2022257578A1 (en) | Method for recognizing text, and apparatus | |
CN112800292B (en) | Cross-modal retrieval method based on modal specific and shared feature learning | |
CN111046275B (en) | User label determining method and device based on artificial intelligence and storage medium | |
CN111930894B (en) | Long text matching method and device, storage medium and electronic equipment | |
CN112925962B (en) | Hash coding-based cross-modal data retrieval method, system, device and medium | |
CN110929080B (en) | Optical remote sensing image retrieval method based on attention and generation countermeasure network | |
Watanabe et al. | A new pattern representation scheme using data compression | |
WO2021227091A1 (en) | Multi-modal classification method based on graph convolutional neural network | |
CN113297370B (en) | End-to-end multi-modal question-answering method and system based on multi-interaction attention | |
CN113177141B (en) | Multi-label video hash retrieval method and device based on semantic embedded soft similarity | |
CN109829065B (en) | Image retrieval method, device, equipment and computer readable storage medium | |
CN108446404B (en) | Search method and system for unconstrained visual question-answer pointing problem | |
CN114298122B (en) | Data classification method, apparatus, device, storage medium and computer program product | |
CN105493078A (en) | Color sketch image searching | |
CN113435461B (en) | Point cloud local feature extraction method, device, equipment and storage medium | |
CN113761153A (en) | Question and answer processing method and device based on picture, readable medium and electronic equipment | |
CN115131698A (en) | Video attribute determination method, device, equipment and storage medium | |
CN109918162B (en) | High-dimensional graph interactive display method for learnable mass information | |
CN109472282A (en) | A kind of depth image hash method based on few training sample | |
CN109255377A (en) | Instrument recognition methods, device, electronic equipment and storage medium | |
CN116521913A (en) | Sketch three-dimensional model retrieval method based on prototype comparison learning | |
CN113516118B (en) | Multi-mode cultural resource processing method for joint embedding of images and texts |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |