CN110532409A - Image retrieval method based on heterogeneous bilinear attention network - Google Patents
- Publication number
- CN110532409A
- Authority
- CN
- China
- Prior art keywords
- bilinear
- network
- image
- branch
- representation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/53—Querying
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Library & Information Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to an image retrieval method based on a heterogeneous bilinear attention network. The network has two dedicated branches: one provides location information for key regions, and the other provides attribute-level descriptions. The outputs of the two branches are integrated into an image-level representation by an attention-based bilinear module. The two branches are pre-trained on two complementary tasks so that they are able to localize key regions and describe attributes, respectively. The first branch uses an hourglass network for key-region detection; the second branch uses an Inception-ResNet-v2 network for attribute prediction. The invention uses a channel-wise attention mechanism, driven jointly by both branches, to weight the channels of each branch's output representation, and then fuses the weighted representations into the final image-level representation by compact bilinear pooling. The final retrieval result is obtained by computing the Euclidean distances between the representations of different images and ranking them.
Description
Technical field
The invention belongs to the field of content-based image retrieval, and specifically relates to an image retrieval method and system that optimizes heterogeneous features with a channel attention mechanism and models the interaction between the two optimized heterogeneous branch features with compact bilinear pooling.
Background art
Content-based image retrieval can effectively help users browse a large image database and find the images they need. Because of its great commercial value, it has attracted considerable research interest in recent years. However, the images in a database are often collected under different lighting conditions, from different shooting angles, and against cluttered backgrounds. Moreover, the differences between many images lie in fine details; for example, the neckline of a dress alone comes in many styles: crew neck, V-neck, boat neck, and so on. These phenomena make image retrieval highly challenging. The challenges can be summarized as two problems: "where to look" and "how to describe". "Where to look" concerns how to find the key parts of an object. An image usually contains several key components of the object being retrieved, and people can distinguish two images by comparing the visual appearance of these key components. "How to describe" concerns describing the visual content of the image so that the retrieval system is free from the influence of factors such as lighting, background, pose, and viewing angle, and can focus on the attributes of the target.
Summary of the invention
Technical problem to be solved
To overcome the shortcomings of the prior art, the present invention proposes an image retrieval method based on a heterogeneous bilinear attention network framework.
Technical solution
An image retrieval method based on a heterogeneous bilinear attention network, characterized by the following steps:
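Step 1: pass an image through the hourglass network to obtain a feature map $V_l \in \mathbb{R}^{C_l \times W_l \times H_l}$; at the same time, pass it through the Inception-ResNet-v2 network to obtain another feature map $V_a \in \mathbb{R}^{C_a \times W_a \times H_a}$.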
Step 2: apply global average pooling to each of the two feature maps to obtain two vectors $v_a \in \mathbb{R}^{C_a}$ and $v_l \in \mathbb{R}^{C_l}$:

$$v_a = \mathrm{GlobalAveragePooling}(V_a), \tag{1}$$
$$v_l = \mathrm{GlobalAveragePooling}(V_l). \tag{2}$$
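Step 3: concatenate $v_a$ and $v_l$ into one vector $v = v_a \,\|\, v_l \in \mathbb{R}^{C}$, then pass it through two parallel multilayer perceptrons to compute the channel-wise attention weights of each branch's feature map:

$$\alpha_a = \mathrm{Sigmoid}\big(W_a^{(2)}\,\mathrm{ReLU}(W_a^{(1)} v)\big), \tag{3}$$
$$\alpha_l = \mathrm{Sigmoid}\big(W_l^{(2)}\,\mathrm{ReLU}(W_l^{(1)} v)\big). \tag{4}$$

Here $W_a^{(1)} \in \mathbb{R}^{k_a \times C}$, $W_a^{(2)} \in \mathbb{R}^{C_a \times k_a}$, $W_l^{(1)} \in \mathbb{R}^{k_l \times C}$ and $W_l^{(2)} \in \mathbb{R}^{C_l \times k_l}$ are linear transformation matrices; $k_a$ and $k_l$ are projection dimensions; $\|$ denotes concatenation, so $C = C_a + C_l$; $\alpha_a \in \mathbb{R}^{C_a}$ is the channel-wise attention weight assigned to the attribute classification branch, and $\alpha_l \in \mathbb{R}^{C_l}$ is the channel-wise attention weight assigned to the key-region localization branch. Sigmoid and ReLU are the usual activation functions.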
Step 4: multiply each channel of $V_a$ and $V_l$ by the corresponding entry of $\alpha_a$ and $\alpha_l$ to obtain two weighted feature maps $\hat{V}_a$ and $\hat{V}_l$, then resample them to the same spatial size $W \times H$.
Step 5: given the feature vector $x_{ij} \in \mathbb{R}^{c}$ at position $(i, j)$ of a feature map obtained in Step 4, the count sketch function $\Psi$ projects $x_{ij}$ to a target vector $y_{ij} \in \mathbb{R}^{d}$. This uses a sign vector $s \in \{+1, -1\}^{c}$ and a mapping vector $p \in \{1, \dots, d\}^{c}$. Each value of $s$ is drawn from $\{+1, -1\}$ with equal probability, and each value of $p$ is drawn uniformly from $\{1, \dots, d\}$. The count sketch function $\Psi$ is defined as

$$y_{ij} = \Psi(x_{ij}, s, p) = [v_1, \dots, v_d], \tag{5}$$

where $v_t = \sum_{l:\,p[l] = t} s[l] \cdot x_{ij}[l]$. With separate sign and mapping vectors $(s_a, p_a)$ and $(s_l, p_l)$ for the two branches, if the outer product of two vectors $x^a_{ij}$ and $x^l_{ij}$ is the input of the count sketch function, the count sketch can be written as the convolution of two count sketch functions that each take one of the vectors as input:

$$\Psi(x^a_{ij} \odot x^l_{ij}, s, p) = \Psi(x^a_{ij}, s_a, p_a) * \Psi(x^l_{ij}, s_l, p_l), \tag{6}$$

where $\odot$ denotes the outer product and $*$ denotes (circular) convolution. Finally, the bilinear feature is obtained by transforming between the time and frequency domains:

$$F_{ij} = \mathrm{FFT}^{-1}\big(\mathrm{FFT}(\Psi(x^a_{ij}, s_a, p_a)) \circ \mathrm{FFT}(\Psi(x^l_{ij}, s_l, p_l))\big), \tag{7}$$

where $\circ$ denotes element-wise multiplication.
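The convolution property of the count sketch used in Step 5 can be checked numerically. The following pure-Python sketch is illustrative only: it assumes 0-based indices (the text counts from 1 to $d$), toy vector sizes, and the usual combined hash for the outer product (index $(p_a[l] + p_l[m]) \bmod d$, sign $s_a[l]\,s_l[m]$), which the text does not spell out. All function names are hypothetical.

```python
import random


def count_sketch(x, s, p, d):
    """Psi(x, s, p): y[t] = sum of s[l] * x[l] over all l with p[l] == t (0-based indices)."""
    y = [0.0] * d
    for l, value in enumerate(x):
        y[p[l]] += s[l] * value
    return y


def circular_convolve(a, b):
    """Circular convolution: (a * b)[t] = sum_k a[k] * b[(t - k) mod d]."""
    d = len(a)
    return [sum(a[k] * b[(t - k) % d] for k in range(d)) for t in range(d)]


def sketch_of_outer_product(x, y, s_x, p_x, s_y, p_y, d):
    """Count sketch of the flattened outer product of x and y, using the combined
    index (p_x[l] + p_y[m]) mod d and the combined sign s_x[l] * s_y[m]."""
    out = [0.0] * d
    for l, x_l in enumerate(x):
        for m, y_m in enumerate(y):
            out[(p_x[l] + p_y[m]) % d] += s_x[l] * s_y[m] * x_l * y_m
    return out


rng = random.Random(0)
c_a, c_l, d = 6, 5, 8   # toy sizes; the text uses the branch channel counts and d = 16k
x_a = [rng.uniform(-1, 1) for _ in range(c_a)]
x_l = [rng.uniform(-1, 1) for _ in range(c_l)]
s_a = [rng.choice((1, -1)) for _ in range(c_a)]
s_l = [rng.choice((1, -1)) for _ in range(c_l)]
p_a = [rng.randrange(d) for _ in range(c_a)]
p_l = [rng.randrange(d) for _ in range(c_l)]

lhs = sketch_of_outer_product(x_a, x_l, s_a, p_a, s_l, p_l, d)   # sketch the full c_a * c_l outer product
rhs = circular_convolve(count_sketch(x_a, s_a, p_a, d),          # sketch each vector, then convolve
                        count_sketch(x_l, s_l, p_l, d))
print(max(abs(u - v) for u, v in zip(lhs, rhs)))  # close to 0.0 (floating-point rounding only)
```

Because the right-hand side never materializes the $C_a \times C_l$ outer product, the bilinear interaction costs only two $d$-dimensional sketches and one convolution per position.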
Step 6: during training, the final bilinear feature is used for ID classification training. During testing, compute the normalized bilinear features of the query image and of the database images, then compute the Euclidean distances between them and rank them to obtain the final top-k results.
Beneficial effects
The image retrieval method based on a heterogeneous bilinear attention network proposed by the present invention obtains a robust bilinear feature of an image for content-based image retrieval. This feature solves not only the "where to look" problem, i.e., finding the key regions in an image, but also the "how to describe" problem for those key regions, by providing the attribute features of each key region.
Specific embodiment
The technical solution of the invention is as follows. The network has two dedicated branches: one provides key-region location information, and the other provides attribute-level descriptions. The outputs of the two branches are integrated into an image-level representation by an attention-based bilinear module. The two branches are pre-trained on two complementary tasks so that they are able to perform key-region localization and attribute description. The first branch uses an hourglass network for key-region detection; the second branch uses an Inception-ResNet-v2 network for attribute prediction. The invention uses a channel-wise attention mechanism, driven jointly by both branches, to weight the channels of the two branches' output representations, and then integrates the weighted representations into the final image-level representation by compact bilinear pooling. The final retrieval result is then obtained by computing and ranking the Euclidean distances between the representations of different images.
Detailed process is as follows:
1. Pre-training of the attribute classification branch
Given an image, resize it to 299 × 299 by bilinear interpolation. The resized image is input to the Inception-ResNet-v2 network with its last two layers (the average pooling layer and the fully connected layer) removed, which yields the network's output feature map of size 1536 × 8 × 8. A new average pooling layer and a new fully connected layer are then appended to the network; unlike in the original network, the output dimension of the new fully connected layer equals the number of attributes to classify. To address data imbalance, the invention selects attributes with comparable numbers of samples for prediction, uses the binary cross-entropy loss to assess the multi-label attribute prediction task, and optimizes and updates the parameters by stochastic gradient descent.
2. Pre-training of the key-region localization branch
Given an image, resize it to 256 × 256 by bilinear interpolation. The resized image is input to the hourglass network with the number of landmarks set to 8, i.e., the network outputs the coordinates of 8 key points. A 64 × 64 heatmap is then generated from the coordinates of these 8 key points, and the normalized mean error is computed against the corresponding 64 × 64 ground-truth heatmap. During training, the parameters are updated with the Adam optimizer.
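A rough sketch of the heatmap targets and the error measure follows. The text fixes only the sizes (256 × 256 input, 8 landmarks, 64 × 64 heatmaps); the Gaussian peak shape, `sigma`, argmax decoding, and the `norm` divisor of the normalized mean error are assumptions, and the helper names are hypothetical.

```python
import math

HEATMAP_SIZE = 64   # heatmap resolution from the text
NUM_LANDMARKS = 8   # number of landmarks from the text
INPUT_SIZE = 256    # input resolution of the localization branch


def to_heatmap_coords(x, y, input_size=INPUT_SIZE, size=HEATMAP_SIZE):
    """Map a key point from the 256 x 256 input image onto the 64 x 64 heatmap grid."""
    scale = size / input_size
    return x * scale, y * scale


def gaussian_heatmap(cx, cy, size=HEATMAP_SIZE, sigma=1.5):
    """Render one landmark as a 2-D Gaussian peak (assumed shape; sigma is a guess)."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2)) for x in range(size)]
            for y in range(size)]


def argmax_2d(heatmap):
    """Decode a landmark as the (x, y) position of the hottest heatmap pixel."""
    _, x, y = max((v, x, y) for y, row in enumerate(heatmap) for x, v in enumerate(row))
    return x, y


def normalized_mean_error(pred, gt, norm):
    """Mean point-to-point distance divided by `norm` (hypothetical normalizer, not given in the text)."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / (len(gt) * norm)


gt_points = [(32 + 16 * i, 64 + 8 * i) for i in range(NUM_LANDMARKS)]   # toy key points in the 256 x 256 input
gt_hm = [to_heatmap_coords(x, y) for x, y in gt_points]
gt_maps = [gaussian_heatmap(cx, cy) for cx, cy in gt_hm]               # one 64 x 64 target map per landmark
decoded = [argmax_2d(h) for h in gt_maps]
print(normalized_mean_error(decoded, gt_hm, norm=HEATMAP_SIZE))         # 0.0: the decoded peaks match exactly
```

In training, the hourglass output maps would be compared against `gt_maps`; the toy run above only confirms that the targets decode back to the original key points.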
3. Data augmentation
The same image is randomly flipped horizontally and, at the same time, rotated by a random angle θ ∈ [−30°, +30°]; it is then resized by bilinear interpolation to 299 × 299 and to 256 × 256, and finally normalized, which gives two tensors (299 × 299 × 3 and 256 × 256 × 3). Because the 256 × 256 × 3 tensor is the input of the key-region localization branch, the key-point coordinates of the original image must change accordingly. When the image is flipped horizontally, the coordinates of each left-side point are exchanged with those of the corresponding right-side point. The key-point coordinates must likewise be adjusted when the image is randomly rotated and resized.
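The key-point bookkeeping for the augmentation can be sketched as follows. The left/right pairing in `FLIP_PAIRS`, the pixel-index flip convention, the rotation sign, and the rotation centre are assumptions that must match whatever routine transforms the image itself; only the flip, the ±30° range, and the 256 × 256 output size come from the text.

```python
import math
import random

# Hypothetical left/right pairing of the 8 landmarks (index pairs); the text does not list it.
FLIP_PAIRS = [(0, 1), (2, 3), (4, 5), (6, 7)]


def hflip_keypoints(points, width):
    """Mirror x about the image centre and swap each left landmark with its right partner."""
    mirrored = [(width - 1 - x, y) for x, y in points]   # pixel-index convention: x in [0, width - 1]
    for left, right in FLIP_PAIRS:
        mirrored[left], mirrored[right] = mirrored[right], mirrored[left]
    return mirrored


def rotate_keypoints(points, theta_deg, center):
    """Rotate points by theta_deg about center with the standard 2-D rotation matrix."""
    cx, cy = center
    c, s = math.cos(math.radians(theta_deg)), math.sin(math.radians(theta_deg))
    return [(cx + (x - cx) * c - (y - cy) * s, cy + (x - cx) * s + (y - cy) * c) for x, y in points]


def resize_keypoints(points, src_size, dst_size):
    """Scale coordinates when the image is resized from src_size to dst_size, both (width, height)."""
    (sw, sh), (dw, dh) = src_size, dst_size
    return [(x * dw / sw, y * dh / sh) for x, y in points]


def augment_keypoints(points, src_w, src_h, rng, out_size=256):
    """Random flip, random rotation in [-30, 30] degrees, then resize to out_size x out_size."""
    flip = rng.random() < 0.5
    theta = rng.uniform(-30.0, 30.0)
    if flip:
        points = hflip_keypoints(points, src_w)
    points = rotate_keypoints(points, theta, ((src_w - 1) / 2, (src_h - 1) / 2))
    return resize_keypoints(points, (src_w, src_h), (out_size, out_size)), flip, theta


pts = [(100.0 + 20 * i, 80.0 + 10 * i) for i in range(8)]   # toy key points in a 400 x 300 image
new_pts, flipped, angle = augment_keypoints(pts, 400, 300, random.Random(0))
```

Drawing `flip` and `theta` once and applying the same values to the image and to its key points is what keeps the heatmap targets aligned with the augmented input.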
4. Obtaining the branch features of an image
The 299 × 299 × 3 tensor obtained after preprocessing is input to the Inception-ResNet-v2 network to obtain the feature map $V_a \in \mathbb{R}^{1536 \times 8 \times 8}$; the other tensor of the same image (256 × 256 × 3) is input to the hourglass network to obtain another feature map $V_l$.
5. Feature optimization based on channel-wise attention
The feature maps output by the two branches are each globally average-pooled to obtain two global vectors $v_a \in \mathbb{R}^{1536}$ and $v_l \in \mathbb{R}^{C_l}$, which are concatenated into $v = v_a \,\|\, v_l \in \mathbb{R}^{C}$. The concatenated vector passes through a fully connected layer and a ReLU layer to give a 512-dimensional hidden layer, and then through another fully connected layer and a Sigmoid layer to give the channel weight vectors $\alpha_a \in \mathbb{R}^{1536}$ and $\alpha_l \in \mathbb{R}^{C_l}$ of the feature maps. To assign weights to the channels of the two feature maps, each value in a weight vector is multiplied by the corresponding channel, which gives the optimized features $\hat{V}_a$ and $\hat{V}_l$.
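The attention step can be sketched in plain Python. This follows the embodiment's wording (one shared fully connected + ReLU hidden layer, then a fully connected + Sigmoid layer whose output is split into $\alpha_a$ and $\alpha_l$); the split, the toy sizes, and the random weights are assumptions for illustration only, and the helper names are hypothetical.

```python
import math
import random


def global_average_pool(fmap):
    """C x H x W nested lists -> length-C vector of channel means, as in Eqs. (1)-(2)."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fmap]


def dense(weights, bias, x):
    """Fully connected layer; weights is a list of rows (out_dim x in_dim)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def sigmoid(z):
    # Split on the sign so that math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def scale_channels(fmap, weights):
    """Multiply every value of channel c by weights[c]."""
    return [[[w * t for t in row] for row in ch] for ch, w in zip(fmap, weights)]


def channel_attention(V_a, V_l, params):
    v = global_average_pool(V_a) + global_average_pool(V_l)               # concatenation, C = C_a + C_l
    hidden = [max(0.0, h) for h in dense(params["W1"], params["b1"], v)]   # FC + ReLU (512-d in the text)
    alpha = [sigmoid(z) for z in dense(params["W2"], params["b2"], hidden)]  # FC + Sigmoid
    alpha_a, alpha_l = alpha[:len(V_a)], alpha[len(V_a):]
    return scale_channels(V_a, alpha_a), scale_channels(V_l, alpha_l), alpha_a, alpha_l


def random_params(c_total, hidden, rng):
    """Hypothetical random initialisation, only for the demo."""
    return {
        "W1": [[rng.gauss(0, 0.1) for _ in range(c_total)] for _ in range(hidden)],
        "b1": [0.0] * hidden,
        "W2": [[rng.gauss(0, 0.1) for _ in range(hidden)] for _ in range(c_total)],
        "b2": [0.0] * c_total,
    }


rng = random.Random(0)
C_a, C_l, H, W, HIDDEN = 4, 3, 2, 2, 5   # toy sizes (the text: C_a = 1536, hidden = 512)
V_a = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(C_a)]
V_l = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(C_l)]
V_a_hat, V_l_hat, alpha_a, alpha_l = channel_attention(V_a, V_l, random_params(C_a + C_l, HIDDEN, rng))
```

Because both weight vectors are computed from the concatenated descriptor, each branch's channel weights depend on the other branch's content, which is what "driven jointly by both branches" means here.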
6. Compact bilinear pooling of the two features
First, the optimized features $\hat{V}_a$ and $\hat{V}_l$ of the two branches are adjusted to the same spatial size (8 × 8) by average pooling. The vectors $x^a_{ij}$ and $x^l_{ij}$ at position $(i, j)$ of the two branch feature maps are then each projected onto a d-dimensional vector by the count sketch function; experiments show that d = 16k works well. Specifically:
For $x^a_{ij}$, two vectors $s_a \in \{+1, -1\}^{1536}$ and $p_a \in \{1, \dots, 16k\}^{1536}$ are created. Each value of $s_a$ is initialized by drawing from $\{+1, -1\}$ with equal probability, and each value of $p_a$ is initialized by drawing from $\{1, \dots, 16k\}$ with equal probability. $x^a_{ij}$, $s_a$ and $p_a$ are the inputs of the count sketch function. Inside the count sketch function, a vector $y_1 = [0, \dots, 0] \in \mathbb{R}^{16k}$ is first initialized; the value of its $t$-th dimension is then obtained as $y_1[t] = \sum_{l:\,p_a[l] = t} s_a[l] \cdot x^a_{ij}[l]$, which gives the projection result $y_1$. Similarly, for the inputs $x^l_{ij}$, $s_l$ and $p_l$, the count sketch function outputs a projection result $y_2$. Finally, $y_1$ and $y_2$ are transformed between the time and frequency domains to obtain the vector $F_{ij} = \mathrm{FFT}^{-1}(\mathrm{FFT}(y_1) \circ \mathrm{FFT}(y_2))$. The 8 × 8 vectors $F_{ij}$ are assembled into a tensor $F \in \mathbb{R}^{16k \times 8 \times 8}$, and each channel of $F$ is sum-pooled to obtain a vector $f \in \mathbb{R}^{16k}$. This vector $f$ is then processed by a signed square root and L2 normalization, which gives the bilinear feature of the image. Finally, the dimensionality of this bilinear feature is reduced by passing it through a fully connected layer and a batch normalization layer in turn, which gives the final compact bilinear feature.
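The pooling pipeline of this section can be sketched end to end. This is a minimal illustration under stated assumptions: a naive $O(d^2)$ DFT stands in for the FFT, indices are 0-based, sizes are toy values, and the final dimensionality-reduction layers (fully connected + batch normalization) are omitted. The function names are hypothetical.

```python
import cmath
import math
import random


def count_sketch(x, s, p, d):
    """Psi(x, s, p): y[t] = sum of s[l] * x[l] over all l with p[l] == t (0-based indices)."""
    y = [0.0] * d
    for l, value in enumerate(x):
        y[p[l]] += s[l] * value
    return y


def dft(v, inverse=False):
    """Naive O(d^2) discrete Fourier transform, standing in for FFT / FFT^-1."""
    n = len(v)
    sign = 1 if inverse else -1
    out = [sum(v[k] * cmath.exp(sign * 2j * math.pi * k * t / n) for k in range(n)) for t in range(n)]
    return [z / n for z in out] if inverse else out


def compact_bilinear_pool(X_a, X_l, s_a, p_a, s_l, p_l, d):
    """X_a, X_l: per-position vectors of the two branches (8 x 8 = 64 positions in the text)."""
    f = [0.0] * d
    for x_a, x_l in zip(X_a, X_l):
        y1 = dft(count_sketch(x_a, s_a, p_a, d))
        y2 = dft(count_sketch(x_l, s_l, p_l, d))
        F_ij = dft([u * v for u, v in zip(y1, y2)], inverse=True)   # FFT^-1(FFT(y1) o FFT(y2))
        f = [acc + z.real for acc, z in zip(f, F_ij)]                 # sum pooling over positions
    f = [math.copysign(math.sqrt(abs(t)), t) for t in f]              # signed square root
    norm = math.sqrt(sum(t * t for t in f)) or 1.0
    return [t / norm for t in f]                                      # L2 normalization


rng = random.Random(0)
C_a, C_l, d, positions = 6, 5, 16, 4   # toy sizes; the text uses C_a = 1536, d = 16k, 64 positions
s_a = [rng.choice((1, -1)) for _ in range(C_a)]
p_a = [rng.randrange(d) for _ in range(C_a)]
s_l = [rng.choice((1, -1)) for _ in range(C_l)]
p_l = [rng.randrange(d) for _ in range(C_l)]
X_a = [[rng.uniform(-1, 1) for _ in range(C_a)] for _ in range(positions)]
X_l = [[rng.uniform(-1, 1) for _ in range(C_l)] for _ in range(positions)]
feature = compact_bilinear_pool(X_a, X_l, s_a, p_a, s_l, p_l, d)
```

The sign and mapping vectors are drawn once and then reused for every position and every image; redrawing them per image would make the resulting features incomparable.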
7. Model training
With the compact bilinear feature as input, a fully connected layer is added to the framework to perform the ID classification task. An ID comprises all positive-sample images (i.e., images containing the same object); for a given ID, images of all other IDs are negative samples. In other words, each instance is treated as an individual class. The output dimension of the fully connected layer equals the total number of IDs. The main loss function is the cross-entropy loss:

$$\mathrm{loss}(x, gt) = -\log\frac{\exp(x[gt])}{\sum_{j}\exp(x[j])} = -x[gt] + \log\sum_{j}\exp(x[j])$$

Here $x$ is the predicted vector and $gt$ is the index of the ground-truth label. There are also two complementary loss functions: the binary cross-entropy loss for multi-label attribute prediction on the attribute classification branch, and the normalized mean error loss for key-point detection on the key-region localization branch. The three losses are assigned different weights and summed into the total loss. The Adam optimizer is used to compute the gradients and perform backpropagation. When the parameters are updated, the initial learning rate is set to 0.0001 and is halved every 5 epochs. Each iteration uses 20 images. The loss stabilizes after 35 epochs. To avoid overfitting, an L2 regularization term is added to the loss.
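The schedule and loss bookkeeping can be sketched as follows. The learning-rate values and the cross-entropy definition follow the text; the loss weights and the L2 coefficient are hypothetical placeholders because the text does not give them, and the function names are illustrative.

```python
import math


def learning_rate(epoch, base_lr=1e-4, step=5, gamma=0.5):
    """Step decay from the text: 0.0001 initially, halved every 5 epochs (epochs counted from 0)."""
    return base_lr * gamma ** (epoch // step)


def cross_entropy(logits, gt):
    """ID loss -log(softmax(logits)[gt]) = -x[gt] + log(sum_j exp(x[j])), via log-sum-exp."""
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits)) - logits[gt]


def total_loss(id_loss, attr_loss, nme_loss, weights, loss_weights=(1.0, 1.0, 1.0), l2=0.0):
    """Weighted sum of the three losses plus an L2 penalty on the (flattened) model weights.

    loss_weights and l2 are hypothetical placeholders: the text says the losses get
    different weights and that an L2 term is added, but gives no values.
    """
    w_id, w_attr, w_nme = loss_weights
    penalty = l2 * sum(w * w for w in weights)
    return w_id * id_loss + w_attr * attr_loss + w_nme * nme_loss + penalty


EPOCHS = 35   # the text reports that the loss plateaus after about 35 epochs
schedule = [learning_rate(e) for e in range(EPOCHS)]
print(schedule[::5])   # one value per 5-epoch block: 1e-4, 5e-5, 2.5e-5, ...
```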
8. Model application
At this stage the image data need no augmentation: each image only needs to be resized to 299 × 299 and 256 × 256 and normalized to serve as the inputs of the attribute classification branch and the key-region localization branch. All parameters of the network model are fixed, so it suffices to input the image data and run forward inference. The compact bilinear feature finally produced by the model is used as the image's feature, which gives feature vectors for all images. After L2 normalization these feature vectors are mapped onto a hypersphere and can serve as the basis for measurement. Given a query image, its feature vector $F_q$ is obtained by model inference, and the database images yield their feature vectors $\{F_1, \dots, F_m\}$. The Euclidean distances between $F_q$ and all feature vectors $\{F_1, \dots, F_m\}$ are computed as $d_i = \|F_q - F_i\|_2$, $i = 1, \dots, m$, giving $D = [d_1, \dots, d_m]$. The values in $D$ are sorted in ascending order (smallest distance first), and top-k takes the first k results; the corresponding database images are regarded as the retrieval results. If the database image that truly corresponds to the query appears among these k images predicted by the model, the retrieval is considered successful. For example, if the top-5 result is $[d_{10}, d_{35}, d_{60}, d_{61}, d_{26}]$ and the database image to be retrieved for the query is image No. 61 in the database, the retrieval is successful.
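The ranking and success criterion can be sketched as below. The features here are toy 2-D vectors standing in for the compact bilinear features; indices are 0-based, and the function names are hypothetical.

```python
import math


def l2_normalize(v):
    """Project a feature vector onto the unit sphere."""
    norm = math.sqrt(sum(t * t for t in v)) or 1.0
    return [t / norm for t in v]


def retrieve_top_k(query_feature, database_features, k):
    """Return the indices of the k database images closest to the query, plus all distances d_i."""
    q = l2_normalize(query_feature)
    dists = [math.dist(q, l2_normalize(f)) for f in database_features]   # d_i = ||F_q - F_i||_2
    ranking = sorted(range(len(dists)), key=dists.__getitem__)          # ascending: most similar first
    return ranking[:k], dists


def is_success(top_k, true_index):
    """A query counts as retrieved if its true database image is among the top-k results."""
    return true_index in top_k


database = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]   # toy 2-D features
top2, dists = retrieve_top_k([1.0, 0.05], database, k=2)
print(top2, is_success(top2, true_index=2))   # [0, 2] True
```

On unit-normalized features, ranking by ascending Euclidean distance is equivalent to ranking by descending cosine similarity, since $\|a - b\|_2^2 = 2 - 2\,a \cdot b$.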
Claims (1)
1. An image retrieval method based on a heterogeneous bilinear attention network, characterized by the following steps:
Step 1: pass an image through the hourglass network to obtain a feature map $V_l \in \mathbb{R}^{C_l \times W_l \times H_l}$; at the same time, pass it through the Inception-ResNet-v2 network to obtain another feature map $V_a \in \mathbb{R}^{C_a \times W_a \times H_a}$.
Step 2: apply global average pooling to each of the two feature maps to obtain two vectors $v_a \in \mathbb{R}^{C_a}$ and $v_l \in \mathbb{R}^{C_l}$:

$$v_a = \mathrm{GlobalAveragePooling}(V_a), \tag{1}$$
$$v_l = \mathrm{GlobalAveragePooling}(V_l). \tag{2}$$
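Step 3: concatenate $v_a$ and $v_l$ into one vector $v = v_a \,\|\, v_l \in \mathbb{R}^{C}$, then pass it through two parallel multilayer perceptrons to compute the channel-wise attention weights of each branch's feature map:

$$\alpha_a = \mathrm{Sigmoid}\big(W_a^{(2)}\,\mathrm{ReLU}(W_a^{(1)} v)\big), \tag{3}$$
$$\alpha_l = \mathrm{Sigmoid}\big(W_l^{(2)}\,\mathrm{ReLU}(W_l^{(1)} v)\big). \tag{4}$$

Here $W_a^{(1)} \in \mathbb{R}^{k_a \times C}$, $W_a^{(2)} \in \mathbb{R}^{C_a \times k_a}$, $W_l^{(1)} \in \mathbb{R}^{k_l \times C}$ and $W_l^{(2)} \in \mathbb{R}^{C_l \times k_l}$ are linear transformation matrices; $k_a$ and $k_l$ are projection dimensions; $\|$ denotes concatenation, so $C = C_a + C_l$; $\alpha_a \in \mathbb{R}^{C_a}$ is the channel-wise attention weight assigned to the attribute classification branch, and $\alpha_l \in \mathbb{R}^{C_l}$ is the channel-wise attention weight assigned to the key-region localization branch. Sigmoid and ReLU are the usual activation functions.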
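Step 4: multiply each channel of $V_a$ and $V_l$ by the corresponding entry of $\alpha_a$ and $\alpha_l$ to obtain two weighted feature maps $\hat{V}_a$ and $\hat{V}_l$, then resample them to the same spatial size $W \times H$.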
Step 5: given the feature vector $x_{ij} \in \mathbb{R}^{c}$ at position $(i, j)$ of a feature map obtained in Step 4, the count sketch function $\Psi$ projects $x_{ij}$ to a target vector $y_{ij} \in \mathbb{R}^{d}$. This uses a sign vector $s \in \{+1, -1\}^{c}$ and a mapping vector $p \in \{1, \dots, d\}^{c}$. Each value of $s$ is drawn from $\{+1, -1\}$ with equal probability, and each value of $p$ is drawn uniformly from $\{1, \dots, d\}$. The count sketch function $\Psi$ is defined as

$$y_{ij} = \Psi(x_{ij}, s, p) = [v_1, \dots, v_d], \tag{5}$$

where $v_t = \sum_{l:\,p[l] = t} s[l] \cdot x_{ij}[l]$. With separate sign and mapping vectors $(s_a, p_a)$ and $(s_l, p_l)$ for the two branches, if the outer product of two vectors $x^a_{ij}$ and $x^l_{ij}$ is the input of the count sketch function, the count sketch can be written as the convolution of two count sketch functions that each take one of the vectors as input:

$$\Psi(x^a_{ij} \odot x^l_{ij}, s, p) = \Psi(x^a_{ij}, s_a, p_a) * \Psi(x^l_{ij}, s_l, p_l), \tag{6}$$

where $\odot$ denotes the outer product and $*$ denotes (circular) convolution. Finally, the bilinear feature is obtained by transforming between the time and frequency domains:

$$F_{ij} = \mathrm{FFT}^{-1}\big(\mathrm{FFT}(\Psi(x^a_{ij}, s_a, p_a)) \circ \mathrm{FFT}(\Psi(x^l_{ij}, s_l, p_l))\big), \tag{7}$$

where $\circ$ denotes element-wise multiplication.
Step 6: during training, the final bilinear feature is used for ID classification training. During testing, compute the normalized bilinear features of the query image and of the database images, then compute the Euclidean distances between them and rank them to obtain the final top-k results.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910692241.XA CN110532409B (en) | 2019-07-30 | 2019-07-30 | Image retrieval method based on heterogeneous bilinear attention network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110532409A (en) | 2019-12-03 |
CN110532409B CN110532409B (en) | 2022-09-27 |
Family
ID=68661312
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910692241.XA Active CN110532409B (en) | 2019-07-30 | 2019-07-30 | Image retrieval method based on heterogeneous bilinear attention network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110532409B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111640103A (en) * | 2020-05-29 | 2020-09-08 | 北京百度网讯科技有限公司 | Image detection method, device, equipment and storage medium |
CN113011362A (en) * | 2021-03-29 | 2021-06-22 | 吉林大学 | Fine-grained fundus image grading algorithm based on bilinear pooling and attention mechanism |
CN115754108A (en) * | 2022-11-23 | 2023-03-07 | 福建省杭氟电子材料有限公司 | Acidity measuring system and method for electronic-grade hexafluorobutadiene |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107291945A (en) * | 2017-07-12 | 2017-10-24 | 上海交通大学 | High-precision clothing image retrieval method and system based on a visual attention model |
US20170308770A1 (en) * | 2016-04-26 | 2017-10-26 | Xerox Corporation | End-to-end saliency mapping via probability distribution prediction |
CN109117437A (en) * | 2017-06-23 | 2019-01-01 | 李峰 | Image feature extraction method for clothing image retrieval |
Non-Patent Citations (2)
Title |
---|
Jiankang Deng et al.: "Cascade Multi-View Hourglass Model for Robust 3D Face Alignment", 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018) * |
Zhou Yikai et al.: "CNN-based human pose recognition", Computer and Modernization * |
Also Published As
Publication number | Publication date |
---|---|
CN110532409B (en) | 2022-09-27 |
Similar Documents
Publication | Title |
---|---|
Shang et al. | SAR targets classification based on deep memory convolution neural networks and transfer parameters |
JP6600009B2 (en) | Fine-grained image classification by investigation of bipartite graph labels |
CN107480261B (en) | Fine-grained face image fast retrieval method based on deep learning |
Li et al. | A comparison of 3D shape retrieval methods based on a large-scale benchmark supporting multimodal queries |
US8903198B2 (en) | Image ranking based on attribute correlation |
Gao et al. | Convolution Neural Network Based on Two‐Dimensional Spectrum for Hyperspectral Image Classification |
CN110532409A (en) | Image retrieval method based on heterogeneous bilinear attention network |
Xu | Multiple-instance learning based decision neural networks for image retrieval and classification |
Wright et al. | Artfid: Quantitative evaluation of neural style transfer |
Wu et al. | A novel ship classification approach for high resolution SAR images based on the BDA-KELM classification model |
Huang et al. | Extreme learning machine with multi-scale local receptive fields for texture classification |
Cheng et al. | Hierarchical attributes learning for pedestrian re-identification via parallel stochastic gradient descent combined with momentum correction and adaptive learning rate |
CN111126249A (en) | Pedestrian re-identification method and device combining big data and Bayes |
CN115222896B (en) | Three-dimensional reconstruction method, three-dimensional reconstruction device, electronic equipment and computer readable storage medium |
CN116108267A (en) | Recommendation method and related equipment |
Huang et al. | View-based weight network for 3D object recognition |
Sikdar et al. | Scale-invariant batch-adaptive residual learning for person re-identification |
Xiong et al. | Person re-identification with multiple similarity probabilities using deep metric learning for efficient smart security applications |
Shi et al. | Anchor-based self-ensembling for semi-supervised deep pairwise hashing |
Sima et al. | Composite kernel of mutual learning on mid-level features for hyperspectral image classification |
CN114693923A (en) | Three-dimensional point cloud semantic segmentation method based on context and attention |
Xia et al. | Clothing classification using transfer learning with squeeze and excitation block |
Du et al. | POLAR++: active one-shot personalized article recommendation |
Zhou et al. | Clothing image classification with DenseNet201 network and optimized regularized random vector functional link |
Krasilenko et al. | Experimental research of methods for clustering and selecting image fragments using spatial invariant equivalent models |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |
GR01 | Patent grant |