CN110532409A - Image retrieval method based on heterogeneous bilinear attention network - Google Patents

Info

Publication number
CN110532409A
Authority
CN
China
Prior art keywords
bilinearity
network
image
branch
characterization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910692241.XA
Other languages
Chinese (zh)
Other versions
CN110532409B (en)
Inventor
王鹏
苏海波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwest University of Technology
Original Assignee
Northwest University of Technology
Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53 Querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to an image retrieval method based on a heterogeneous bilinear attention network. The network has two specialised branches: one provides key-region localization information, and the other provides attribute-level descriptions. The outputs of the two branches are integrated into an image-level representation by an attention-based bilinear module. The two branches are pre-trained on two complementary tasks to ensure that they are capable of key-region localization and attribute description respectively. The first branch uses an hourglass network for key-region detection; the second branch uses an Inception-ResNet-v2 network for attribute prediction. A channel-wise attention mechanism driven jointly by the two branches weights the channels of the two branches' output representations, and the weighted representations are then integrated into the final image-level representation by compact bilinear pooling. Finally, the Euclidean distances between the representations of different images are computed and sorted to obtain the retrieval result.

Description

Image retrieval method based on heterogeneous bilinear attention network
Technical field
The invention belongs to the field of content-based image retrieval. Specifically, it relates to an image retrieval method and system that uses a channel attention mechanism to optimise heterogeneous features and compact bilinear pooling to model the interaction between the two optimised heterogeneous branch features.
Background art
Content-based image retrieval helps users browse large image databases and find the images they need. It has great commercial value and has therefore attracted considerable research interest in recent years. However, the images in a database are often captured under different lighting conditions, from different shooting angles, and against cluttered backgrounds. In addition, the differences between many images lie in the details; for example, the neckline of a dress comes in many styles: round neck, V-neck, boat neck, and so on. These phenomena make image retrieval highly challenging. The challenges can be summarised as two problems: "where to look" and "how to describe". "Where to look" concerns how to find the key parts of an object. An image usually contains several key parts of the retrieved object, and people can distinguish two images by comparing the visual appearance of these parts. "How to describe" concerns describing the visual content of the image so that the retrieval system is freed from the influence of factors such as lighting, background, pose and viewing angle, and can focus on the attributes of the retrieval target.
Summary of the invention
Technical problem to be solved
To overcome the shortcomings of the prior art, the invention proposes an image retrieval method based on a heterogeneous bilinear attention network framework.
Technical solution
An image retrieval method based on a heterogeneous bilinear attention network, characterised by the following steps:
Step 1: Pass an image through the hourglass network to obtain a feature map $V_l$ with $C_l$ channels, and at the same time pass it through the Inception-ResNet-v2 network to obtain another feature map $V_a$ with $C_a$ channels.
Step 2: Apply global average pooling to each of the two feature maps to obtain two vectors $v_a \in \mathbb{R}^{C_a}$ and $v_l \in \mathbb{R}^{C_l}$:
$v_a = \mathrm{GlobalAveragePooling}(V_a)$, (1)
$v_l = \mathrm{GlobalAveragePooling}(V_l)$. (2)
Step 3: Concatenate $v_a$ and $v_l$ into one vector and pass it through two parallel multilayer perceptrons to compute the channel-wise attention weights for the feature map of each branch. The formulas are:
$\alpha_a = \mathrm{Sigmoid}\left(W_a^{(2)}\,\mathrm{ReLU}\left(W_a^{(1)}[v_a; v_l]\right)\right)$, (3)
$\alpha_l = \mathrm{Sigmoid}\left(W_l^{(2)}\,\mathrm{ReLU}\left(W_l^{(1)}[v_a; v_l]\right)\right)$. (4)
Here $W_a^{(1)} \in \mathbb{R}^{k_a \times C}$, $W_a^{(2)} \in \mathbb{R}^{C_a \times k_a}$, $W_l^{(1)} \in \mathbb{R}^{k_l \times C}$ and $W_l^{(2)} \in \mathbb{R}^{C_l \times k_l}$ are all linear transformation matrices; $k_a$ and $k_l$ are projection dimensions; $[\cdot\,;\cdot]$ denotes concatenation, so $C = C_a + C_l$; $\alpha_a$ is the channel-wise attention weight assigned to the attribute classification branch, and $\alpha_l$ is the channel-wise attention weight assigned to the key-region localization branch; Sigmoid and ReLU are common activation functions.
Step 4: Obtain the two weighted feature maps by multiplying each channel of $V_a$ and $V_l$ by the corresponding weight in $\alpha_a$ and $\alpha_l$, then resample them to the same spatial size W × H.
Step 5: Given the feature vector $x_{ij}$ at position (i, j) of a feature map obtained in step 4, the count sketch function $\Psi$ projects $x_{ij}$ to a target vector $y_{ij} \in \mathbb{R}^d$. A sign vector $s$ and a mapping vector $p$, each with one entry per element of $x_{ij}$, are also used. Each value in $s$ is drawn from {+1, −1} with equal probability; each value in $p$ is drawn from {1, …, d} with uniform probability. The count sketch function $\Psi$ is defined as:
$y_{ij} = \Psi(x_{ij}, s, p) = [v_1, \dots, v_d]$, (5)
where $v_t = \sum_{l:\,p[l]=t} s[l] \cdot x_{ij}[l]$. If the outer product of the two branches' feature vectors at the same position, $x^a_{ij}$ and $x^l_{ij}$, is the input to the count sketch function, the result can be written as the convolution of two count sketch functions that each take one of the vectors as input:
$\Psi(x^a_{ij} \odot x^l_{ij}, s, p) = \Psi(x^a_{ij}, s_a, p_a) * \Psi(x^l_{ij}, s_l, p_l)$, (6)
where $\odot$ denotes the outer product and $*$ denotes convolution. The bilinear feature is finally obtained by converting between the time domain and the frequency domain:
$F_{ij} = \mathrm{FFT}^{-1}\left(\mathrm{FFT}(\Psi(x^a_{ij}, s_a, p_a)) \circ \mathrm{FFT}(\Psi(x^l_{ij}, s_l, p_l))\right)$, (7)
where $\circ$ denotes element-wise multiplication.
Step 6: During training, the resulting bilinear feature is used for ID classification training. During testing, the normalised bilinear features of the query image and of the images in the database are computed, and the Euclidean distances between them are then calculated to obtain the final top-k result.
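The identity in step 5, that the count sketch of an outer product equals the convolution of the two individual count sketches, can be checked numerically. The following is a minimal NumPy sketch under stated assumptions: indices are 0-based, the convolution is circular with length d, and the sign and mapping vectors of the outer product are derived from the per-branch ones as s = s_a ⊙ s_l and p = (p_a + p_l) mod d. The toy sizes and all variable names are illustrative and do not come from the patent.

```python
import numpy as np

def count_sketch(x, s, p, d):
    """Project x to d dims: y[t] = sum of s[l] * x[l] over all l with p[l] == t."""
    y = np.zeros(d)
    np.add.at(y, p, s * x)  # unbuffered accumulation handles repeated indices
    return y

def circular_convolution(u, v):
    """Circular convolution of two equal-length vectors, computed directly."""
    d = len(u)
    out = np.zeros(d)
    for i in range(d):
        for j in range(d):
            out[(i + j) % d] += u[i] * v[j]
    return out

rng = np.random.default_rng(0)
ca, cl, d = 5, 4, 8  # toy sizes, chosen only for illustration
xa, xl = rng.normal(size=ca), rng.normal(size=cl)
sa, sl = rng.choice([-1.0, 1.0], size=ca), rng.choice([-1.0, 1.0], size=cl)
pa, pl = rng.integers(0, d, size=ca), rng.integers(0, d, size=cl)

# Left-hand side: sketch of the flattened outer product with the induced s and p.
outer = np.outer(xa, xl).ravel()
s = np.outer(sa, sl).ravel()
p = ((pa[:, None] + pl[None, :]) % d).ravel()
lhs = count_sketch(outer, s, p, d)

# Right-hand side: convolution of the two independent sketches.
rhs = circular_convolution(count_sketch(xa, sa, pa, d), count_sketch(xl, sl, pl, d))
```

With these choices `lhs` and `rhs` agree up to floating-point error, which is why the outer product never has to be formed explicitly.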
Beneficial effects
The image retrieval method based on a heterogeneous bilinear attention network proposed by the invention obtains a robust bilinear feature of an image for content-based image retrieval. The method not only addresses the "where to look" problem by finding the key regions in the image, but also addresses the "how to describe" problem by providing attribute features for each key region.
Detailed description of the embodiments
The technical solution of the invention is as follows. The network has two specialised branches: one provides key-region localization information, and the other provides attribute-level descriptions. The outputs of the two branches are integrated into an image-level representation by an attention-based bilinear module. The two branches are pre-trained on two complementary tasks to ensure that they are capable of key-region localization and attribute description respectively. The first branch uses an hourglass network for key-region detection; the second branch uses an Inception-ResNet-v2 network for attribute prediction. A channel-wise attention mechanism driven jointly by the two branches weights the channels of the two branches' output representations, and the weighted representations are then integrated into the final image-level representation by compact bilinear pooling. Finally, the Euclidean distances between the representations of different images are computed and sorted to obtain the retrieval result.
The detailed procedure is as follows:
1. Attribute classification branch pre-training
Given an image, resize it to 299 × 299 using bilinear interpolation. The resized image is fed into the Inception-ResNet-v2 network. The last two layers of Inception-ResNet-v2, namely the average pooling layer and the fully connected layer, are removed to obtain the network's output feature map, whose size is 1536 × 8 × 8. A new average pooling layer and a new fully connected layer are then appended to the network; unlike the original network, the output dimension of the new fully connected layer equals the number of attributes to be classified. To address data imbalance, the invention selects attributes with comparable sample counts for prediction. A binary cross-entropy loss function is used to assess the multi-label attribute prediction task, and stochastic gradient descent is used to optimise and update the parameters.
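The binary cross-entropy loss for multi-label attribute prediction can be illustrated as follows. This is a minimal NumPy sketch that assumes the fully connected layer outputs one raw logit per attribute and that the loss is averaged over all attributes; the numerically stable formulation and the example values are illustrative choices, not details given in the patent.

```python
import numpy as np

def binary_cross_entropy(logits, targets):
    """Mean binary cross-entropy over all attributes, computed from raw logits."""
    z = np.asarray(logits, dtype=float)
    y = np.asarray(targets, dtype=float)
    # Numerically stable form: max(z, 0) - z * y + log(1 + exp(-|z|)).
    return float(np.mean(np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z)))))

# One image with three hypothetical attributes; 1 means the attribute is present.
logits = np.array([[2.0, -1.0, 0.0]])
targets = np.array([[1.0, 0.0, 1.0]])
loss = binary_cross_entropy(logits, targets)
```

Each attribute is treated as an independent yes/no decision, which is what distinguishes this loss from the single-label cross-entropy used later for ID classification.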
2. Key-region localization branch pre-training
Given an image, resize it to 256 × 256 using bilinear interpolation. The resized image is fed into the hourglass network with the number of landmarks set to 8, so the network outputs the coordinates of 8 keypoints. 64 × 64 heatmaps are then generated from the coordinates of these 8 keypoints, and the normalised mean error is computed against the corresponding ground-truth heatmaps (64 × 64). The Adam optimiser is used to update the parameters during training.
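The text does not say how a heatmap is rendered from a keypoint coordinate. A common choice, assumed here, is a 2D Gaussian centred on each keypoint. The sketch below uses NumPy; the function name, the value of `sigma` and the example coordinates are all hypothetical.

```python
import numpy as np

def keypoints_to_heatmaps(keypoints, size=64, sigma=1.5):
    """Render one size x size Gaussian heatmap per (x, y) keypoint (assumed form)."""
    ys, xs = np.mgrid[0:size, 0:size]  # row (y) and column (x) index grids
    maps = [np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
            for x, y in keypoints]
    return np.stack(maps)

# Eight hypothetical keypoints in 256 x 256 input coordinates, rescaled to 64 x 64.
kps_256 = np.array([[40, 60], [216, 60], [128, 30], [128, 128],
                    [60, 200], [196, 200], [20, 120], [236, 120]], dtype=float)
heatmaps = keypoints_to_heatmaps(kps_256 * 64 / 256)
```

The result has shape (8, 64, 64), one map per landmark, with the peak of each map at the rescaled keypoint position.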
3. Data augmentation
The same image is randomly flipped horizontally and randomly rotated by an angle θ ∈ [−30°, +30°], with the same transformation applied to both copies. It is then resized by bilinear interpolation to 299 × 299 and 256 × 256 respectively and normalised, yielding two tensors (299 × 299 × 3 and 256 × 256 × 3). Because the 256 × 256 × 3 tensor is the input to the key-region localization branch, the keypoint coordinates of the original image must be transformed as well. When the image is flipped horizontally, the coordinates of each left-side point are replaced by those of the corresponding right-side point, and vice versa. The keypoint coordinates must also be adjusted accordingly under random rotation and image resizing.
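The keypoint bookkeeping for a horizontal flip can be sketched as follows. This is an illustrative NumPy example that assumes keypoints are stored as (x, y) pixel coordinates and that `swap_pairs` lists the index pairs of corresponding left and right landmarks; both conventions and all names are assumptions, not details from the patent.

```python
import numpy as np

def flip_horizontal(image, keypoints, swap_pairs):
    """Mirror an H x W x C image and its (x, y) keypoints left to right."""
    width = image.shape[1]
    flipped_image = image[:, ::-1, :]
    flipped_kps = keypoints.astype(float).copy()
    flipped_kps[:, 0] = (width - 1) - flipped_kps[:, 0]  # mirror the x coordinate
    for left, right in swap_pairs:  # a mirrored "left" point now plays the "right" role
        flipped_kps[[left, right]] = flipped_kps[[right, left]]
    return flipped_image, flipped_kps

image = np.zeros((4, 6, 3))
image[1, 0] = 1.0                  # mark the pixel under keypoint 0
kps = np.array([[0, 1], [5, 2]])   # keypoint 0 = "left", keypoint 1 = "right"
flipped_image, flipped_kps = flip_horizontal(image, kps, swap_pairs=[(0, 1)])
```

After the flip, the marked pixel sits at column 5 and is reported under the "right" index, so the semantic meaning of each landmark slot is preserved.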
4. Obtaining the branch features of the image
The tensor (299 × 299 × 3) obtained after data preprocessing is fed into the Inception-ResNet-v2 network to obtain the feature map $V_a$. The other tensor (256 × 256 × 3) of the same image is fed into the hourglass network to obtain another feature map $V_l$.
5. Feature optimisation based on the channel-wise attention mechanism
After global average pooling, the feature maps output by the two branches yield two global vectors $v_a$ and $v_l$. The two vectors are concatenated, and the concatenated vector is passed through a fully connected layer and a ReLU layer to obtain a 512-dimensional hidden layer; a further fully connected layer and a Sigmoid layer then produce the channel weight vectors $\alpha_a$ and $\alpha_l$ of the feature maps. To assign the weights to the channels of the two feature maps, each value in a weight vector is multiplied by the corresponding channel, giving the optimised features of the two branches.
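The channel-wise attention step can be sketched in NumPy as follows. The sketch assumes feature maps in channels-first layout, two parallel two-layer perceptrons without bias terms (one per branch, as in the technical solution), and random weights standing in for trained ones. The toy sizes are illustrative; the text uses a 512-dimensional hidden layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(Va, Vl, params):
    """Re-weight the channels of both feature maps using jointly computed weights."""
    va = Va.mean(axis=(1, 2))                 # global average pooling, shape (Ca,)
    vl = Vl.mean(axis=(1, 2))                 # shape (Cl,)
    v = np.concatenate([va, vl])              # shape (Ca + Cl,)
    weights = []
    for W1, W2 in params:                     # one two-layer perceptron per branch
        hidden = np.maximum(W1 @ v, 0.0)      # fully connected layer + ReLU
        weights.append(sigmoid(W2 @ hidden))  # fully connected layer + Sigmoid
    alpha_a, alpha_l = weights
    return Va * alpha_a[:, None, None], Vl * alpha_l[:, None, None], alpha_a, alpha_l

rng = np.random.default_rng(0)
Ca, Cl, k = 6, 4, 8                           # toy sizes for illustration only
Va = rng.normal(size=(Ca, 8, 8))
Vl = rng.normal(size=(Cl, 16, 16))
params = [(rng.normal(size=(k, Ca + Cl)), rng.normal(size=(Ca, k))),
          (rng.normal(size=(k, Ca + Cl)), rng.normal(size=(Cl, k)))]
Va_w, Vl_w, alpha_a, alpha_l = channel_attention(Va, Vl, params)
```

Because both perceptrons see the concatenated vector, the weights for one branch depend on what the other branch has observed, which is the sense in which the attention is driven jointly.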
6. Compact bilinear pooling of the two features
First, the optimised features of the two branches are brought to the same spatial size (8 × 8) by average pooling. Take the vectors $x^a_{ij}$ and $x^l_{ij}$ at position (i, j) of the two branches' feature maps. Each of the two vectors is projected by a count sketch function onto a d-dimensional vector; experiments show that d = 16k works well. In detail: for $x^a_{ij}$, create two vectors $s_a$ and $p_a$, each with one entry per element of $x^a_{ij}$. Each value of $s_a$ is initialised by drawing from {+1, −1} with equal probability, and each value of $p_a$ is initialised by drawing from {1, …, 16k} with equal probability. $x^a_{ij}$, $s_a$ and $p_a$ are the inputs to the count sketch function. Inside the count sketch function, a vector $y_1 = [0, \dots, 0]$ of length 16k is initialised first; the value of its t-th dimension is then given by $y_1[t] = \sum_{l:\,p_a[l]=t} s_a[l] \cdot x^a_{ij}[l]$, which yields the projection result $y_1$. Likewise, for the inputs $x^l_{ij}$, $s_l$ and $p_l$, the count sketch function outputs a projection result $y_2$. Finally, $y_1$ and $y_2$ are converted between the time domain and the frequency domain to obtain the vector $F_{ij} = \mathrm{FFT}^{-1}(\mathrm{FFT}(y_1) \circ \mathrm{FFT}(y_2))$. The 8 × 8 vectors $F_{ij}$ are assembled into a tensor F of size 8 × 8 × 16k. Sum pooling is then applied to F: each channel of F is summed to obtain a 16k-dimensional vector f. A signed square root and L2 normalisation are applied to f, giving the bilinear feature of the image. Finally, the dimensionality of this bilinear feature is reduced: it is passed through a fully connected layer and a batch normalisation layer to obtain the final compact bilinear feature.
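The pooling pipeline described above (count sketch, FFT product, sum pooling, signed square root, L2 normalisation) can be sketched in NumPy as follows. The sketch makes several assumptions: the count sketch is written as a dense c × d matrix with one random ±1 entry per row, which is equivalent to the (s, p) definition with 0-based indices; d = 32 is used instead of 16k to keep the example small; and the final fully connected and batch normalisation layers are omitted. All names are illustrative.

```python
import numpy as np

def sketch_matrix(c, d, rng):
    """Dense c x d form of a count sketch: one random +1/-1 entry per row."""
    M = np.zeros((c, d))
    M[np.arange(c), rng.integers(0, d, size=c)] = rng.choice([-1.0, 1.0], size=c)
    return M

def compact_bilinear_pool(Xa, Xl, Ma, Ml):
    """Xa: (n, Ca) and Xl: (n, Cl) hold the feature vectors at the same n positions."""
    Ya, Yl = Xa @ Ma, Xl @ Ml                       # count sketches, each (n, d)
    # F_ij = FFT^-1(FFT(y1) * FFT(y2)), evaluated for all positions at once.
    F = np.fft.ifft(np.fft.fft(Ya, axis=1) * np.fft.fft(Yl, axis=1), axis=1).real
    f = F.sum(axis=0)                               # sum pooling over positions
    f = np.sign(f) * np.sqrt(np.abs(f))             # signed square root
    return f / (np.linalg.norm(f) + 1e-12)          # L2 normalisation

rng = np.random.default_rng(0)
n, Ca, Cl, d = 64, 6, 4, 32                         # 8 x 8 positions, toy channel counts
Xa, Xl = rng.normal(size=(n, Ca)), rng.normal(size=(n, Cl))
Ma, Ml = sketch_matrix(Ca, d, rng), sketch_matrix(Cl, d, rng)
feature = compact_bilinear_pool(Xa, Xl, Ma, Ml)
```

The sign and mapping vectors are drawn once and kept fixed, so every image is projected with the same sketch and the resulting features remain comparable.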
7. Model training
With the compact bilinear feature as input, a fully connected layer serves as the framework for the ID classification task. One ID covers all of its positive sample images (that is, images containing the same object); for a given ID, all other IDs are negative samples. In other words, each instance is treated as a separate class. The output dimension of the fully connected layer equals the total number of IDs. The main loss function is the cross-entropy loss:
$\mathrm{loss}(x, gt) = -\log\left(\dfrac{\exp(x[gt])}{\sum_j \exp(x[j])}\right)$
Here x is the prediction vector and gt is the index of the true label. There are also two auxiliary loss functions: the binary cross-entropy loss for multi-label attribute prediction on the attribute classification branch, and the normalised mean error for keypoint detection on the key-region localization branch. The three losses are assigned different weights and combined into the total loss. The Adam optimiser is used throughout to compute gradients and perform backpropagation. The initial learning rate is set to 0.0001, and the learning rate is halved every 5 epochs. Each iteration uses a batch of 20 images. The loss levels off after 35 epochs. To avoid over-fitting during training, an L2 regularisation constraint term is added to the loss.
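The training recipe above can be illustrated with a short sketch using only the Python standard library. It assumes the usual softmax form of the cross-entropy loss and a step schedule that halves the learning rate every 5 epochs. The loss weights are placeholders, since the text states only that the three losses receive different weights.

```python
import math

def cross_entropy(x, gt):
    """-log softmax(x)[gt], computed with the log-sum-exp trick for stability."""
    m = max(x)
    return m + math.log(sum(math.exp(v - m) for v in x)) - x[gt]

def learning_rate(epoch, base_lr=1e-4, step=5, factor=0.5):
    """Halve the learning rate every `step` epochs, starting from base_lr."""
    return base_lr * factor ** (epoch // step)

def total_loss(id_loss, attr_loss, landmark_loss, weights=(1.0, 0.5, 0.5)):
    """Weighted sum of the three losses; these weights are hypothetical."""
    return weights[0] * id_loss + weights[1] * attr_loss + weights[2] * landmark_loss

id_loss = cross_entropy([2.0, 1.0, 0.1], gt=0)
schedule = [learning_rate(epoch) for epoch in (0, 4, 5, 12)]
```

With this schedule the learning rate is 0.0001 for epochs 0 to 4, 0.00005 for epochs 5 to 9, and so on.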
8. Model application
No data augmentation is needed at this stage; the image only has to be resized to 299 × 299 and 256 × 256 and normalised to serve as the input to the attribute classification branch and the key-region localization branch. All parameters of the network model are fixed, so only a forward pass over the input image data is required. The compact bilinear feature finally produced by the model is taken as the feature of the image, which gives the feature vectors of all images. After L2 normalisation these feature vectors are mapped onto a sphere and can serve as the basis for measuring similarity. Given a query image, model inference yields its feature vector $F_q$; model inference over the database images yields all their feature vectors $\{F_1, \dots, F_m\}$. The Euclidean distances between $F_q$ and all feature vectors $\{F_1, \dots, F_m\}$ are computed: $d_i = \|F_q - F_i\|_2$, $i = 1, \dots, m$, giving $D = [d_1, \dots, d_m]$. The values in D are ranked from most to least similar, that is, in order of increasing distance, and top-k takes the first k results; the corresponding database images are regarded as the retrieved results. If the k images predicted by the model include the database image that truly corresponds to the query image, the retrieval is considered successful. For example, if the top-5 result is $[d_{10}, d_{35}, d_{60}, d_{61}, d_{26}]$ and the database image that the query image should retrieve is image No. 61 in the database, the retrieval succeeds.
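The retrieval step can be sketched in NumPy as follows. The example assumes that the features have already been extracted and uses tiny two-dimensional vectors in place of real compact bilinear features; the function names and data are illustrative only.

```python
import numpy as np

def l2_normalize(F):
    """Scale each feature vector to unit length so it lies on the unit sphere."""
    return F / np.linalg.norm(F, axis=-1, keepdims=True)

def top_k(query, database, k):
    """Indices of the k database features closest to the query (Euclidean distance)."""
    distances = np.linalg.norm(database - query, axis=1)
    return np.argsort(distances)[:k], distances   # ascending: nearest first

database = l2_normalize(np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]))
query = l2_normalize(np.array([0.9, 0.1]))
ranked, distances = top_k(query, database, k=2)
hit = 0 in ranked   # success if the true match (index 0 in this toy case) is in the top-k
```

For unit-length vectors, ranking by increasing Euclidean distance gives the same order as ranking by decreasing cosine similarity, so either measure could be used after L2 normalisation.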

Claims (1)

1. An image retrieval method based on a heterogeneous bilinear attention network, characterised by the following steps:
Step 1: Pass an image through the hourglass network to obtain a feature map $V_l$ with $C_l$ channels, and at the same time pass it through the Inception-ResNet-v2 network to obtain another feature map $V_a$ with $C_a$ channels.
Step 2: Apply global average pooling to each of the two feature maps to obtain two vectors $v_a \in \mathbb{R}^{C_a}$ and $v_l \in \mathbb{R}^{C_l}$:
$v_a = \mathrm{GlobalAveragePooling}(V_a)$, (1)
$v_l = \mathrm{GlobalAveragePooling}(V_l)$. (2)
Step 3: Concatenate $v_a$ and $v_l$ into one vector and pass it through two parallel multilayer perceptrons to compute the channel-wise attention weights for the feature map of each branch. The formulas are:
$\alpha_a = \mathrm{Sigmoid}\left(W_a^{(2)}\,\mathrm{ReLU}\left(W_a^{(1)}[v_a; v_l]\right)\right)$, (3)
$\alpha_l = \mathrm{Sigmoid}\left(W_l^{(2)}\,\mathrm{ReLU}\left(W_l^{(1)}[v_a; v_l]\right)\right)$. (4)
Here $W_a^{(1)} \in \mathbb{R}^{k_a \times C}$, $W_a^{(2)} \in \mathbb{R}^{C_a \times k_a}$, $W_l^{(1)} \in \mathbb{R}^{k_l \times C}$ and $W_l^{(2)} \in \mathbb{R}^{C_l \times k_l}$ are all linear transformation matrices; $k_a$ and $k_l$ are projection dimensions; $[\cdot\,;\cdot]$ denotes concatenation, so $C = C_a + C_l$; $\alpha_a$ is the channel-wise attention weight assigned to the attribute classification branch, and $\alpha_l$ is the channel-wise attention weight assigned to the key-region localization branch; Sigmoid and ReLU are common activation functions.
Step 4: Obtain the two weighted feature maps by multiplying each channel of $V_a$ and $V_l$ by the corresponding weight in $\alpha_a$ and $\alpha_l$, then resample them to the same spatial size W × H.
Step 5: Given the feature vector $x_{ij}$ at position (i, j) of a feature map obtained in step 4, the count sketch function $\Psi$ projects $x_{ij}$ to a target vector $y_{ij} \in \mathbb{R}^d$. A sign vector $s$ and a mapping vector $p$, each with one entry per element of $x_{ij}$, are also used. Each value in $s$ is drawn from {+1, −1} with equal probability; each value in $p$ is drawn from {1, …, d} with uniform probability. The count sketch function $\Psi$ is defined as:
$y_{ij} = \Psi(x_{ij}, s, p) = [v_1, \dots, v_d]$, (5)
where $v_t = \sum_{l:\,p[l]=t} s[l] \cdot x_{ij}[l]$. If the outer product of the two branches' feature vectors at the same position, $x^a_{ij}$ and $x^l_{ij}$, is the input to the count sketch function, the result can be written as the convolution of two count sketch functions that each take one of the vectors as input:
$\Psi(x^a_{ij} \odot x^l_{ij}, s, p) = \Psi(x^a_{ij}, s_a, p_a) * \Psi(x^l_{ij}, s_l, p_l)$, (6)
where $\odot$ denotes the outer product and $*$ denotes convolution. The bilinear feature is finally obtained by converting between the time domain and the frequency domain:
$F_{ij} = \mathrm{FFT}^{-1}\left(\mathrm{FFT}(\Psi(x^a_{ij}, s_a, p_a)) \circ \mathrm{FFT}(\Psi(x^l_{ij}, s_l, p_l))\right)$, (7)
where $\circ$ denotes element-wise multiplication.
Step 6: During training, the resulting bilinear feature is used for ID classification training. During testing, the normalised bilinear features of the query image and of the images in the database are computed, and the Euclidean distances between them are then calculated to obtain the final top-k result.
CN201910692241.XA 2019-07-30 2019-07-30 Image retrieval method based on heterogeneous bilinear attention network Active CN110532409B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910692241.XA CN110532409B (en) 2019-07-30 2019-07-30 Image retrieval method based on heterogeneous bilinear attention network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910692241.XA CN110532409B (en) 2019-07-30 2019-07-30 Image retrieval method based on heterogeneous bilinear attention network

Publications (2)

Publication Number Publication Date
CN110532409A (en) 2019-12-03
CN110532409B (en) 2022-09-27

Family

ID=68661312

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910692241.XA Active CN110532409B (en) 2019-07-30 2019-07-30 Image retrieval method based on heterogeneous bilinear attention network

Country Status (1)

Country Link
CN (1) CN110532409B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111640103A (en) * 2020-05-29 2020-09-08 北京百度网讯科技有限公司 Image detection method, device, equipment and storage medium
CN113011362A (en) * 2021-03-29 2021-06-22 吉林大学 Fine-grained fundus image grading algorithm based on bilinear pooling and attention mechanism
CN115754108A (en) * 2022-11-23 2023-03-07 福建省杭氟电子材料有限公司 Acidity measuring system and method for electronic-grade hexafluorobutadiene

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107291945A (en) * 2017-07-12 2017-10-24 上海交通大学 High-precision clothing image retrieval method and system based on a visual attention model
US20170308770A1 (en) * 2016-04-26 2017-10-26 Xerox Corporation End-to-end saliency mapping via probability distribution prediction
CN109117437A (en) * 2017-06-23 2019-01-01 李峰 An image feature extraction method for clothing image retrieval

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIANKANG DENG et al.: "Cascade Multi-View Hourglass Model for Robust 3D Face Alignment", 《2018 13TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE & GESTURE RECOGNITION (FG 2018)》 *
周义凯 et al.: "Human pose recognition based on CNN" (基于CNN的人体姿态识别), 《计算机与现代化》 *

Also Published As

Publication number Publication date
CN110532409B (en) 2022-09-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant