CN107885854B - semi-supervised cross-media retrieval method based on feature selection and virtual data generation - Google Patents

semi-supervised cross-media retrieval method based on feature selection and virtual data generation

Info

Publication number
CN107885854B
CN107885854B (application CN201711124618.9A)
Authority
CN
China
Prior art keywords
data
text
projection
class
virtual data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711124618.9A
Other languages
Chinese (zh)
Other versions
CN107885854A (en)
Inventor
孙建德
于恩
李静
张化祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Normal University
Original Assignee
Shandong Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Normal University filed Critical Shandong Normal University
Priority to CN201711124618.9A priority Critical patent/CN107885854B/en
Publication of CN107885854A publication Critical patent/CN107885854A/en
Application granted
Publication of CN107885854B publication Critical patent/CN107885854B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43Querying

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a method based on feature selection and virtual data generation. The method generates virtual data points according to the characteristics of the training data to expand the training set, and applies an l2,1-norm constraint in the process of learning two pairs of projection matrices. Specifically, a class center is found for each class of images and texts, new data points are randomly generated around the class centers to form new training data, and the new data are used to learn the two pairs of projection matrices under the l2,1-norm constraint. The method not only generates random data points to improve the diversity of the training data, but also selects more discriminative and information-rich features when learning the two pairs of projection matrices.

Description

semi-supervised cross-media retrieval method based on feature selection and virtual data generation
Technical Field
The invention relates to cross-media retrieval methods, and in particular to a semi-supervised cross-media retrieval method based on feature selection and virtual data generation.
Background
Cross-media retrieval refers to using data of one modality as the query to retrieve data of other modalities that carry the same semantic information. Taking pictures and text as an example, a picture can be used to retrieve texts with corresponding semantic information (abbreviated I2T), or a text can be used to retrieve pictures with corresponding semantic information (abbreviated T2I).
In cross-media retrieval, the central difficulty is that data of different modalities have different feature representations lying in spaces of different dimensionality, so the similarity between heterogeneous data cannot be compared directly. The main concern of the cross-media retrieval field is therefore how to cross this semantic gap. A popular solution is subspace learning, which aims to learn a latent semantic space in which the similarity of heterogeneous data can be measured directly. The common approach is to learn a pair of projection matrices that map data of the different modalities into the latent semantic space, after which the similarity of heterogeneous data can be measured there. A representative method is Canonical Correlation Analysis (CCA), which learns a pair of projection matrices maximizing the correlation between heterogeneous data when the different modalities are mapped into the semantic space; later methods combine such correlation analysis with linear regression toward semantic labels, of which Modality-Dependent Cross-media Retrieval (MDCR) is an example.
However, common cross-media retrieval tasks have directionality, namely image-retrieves-text (I2T) and text-retrieves-image (T2I). The above methods learn only a single pair of projection matrices and do not emphasize the importance of the query data: specifically, in the I2T task the picture is more decisive for learning the projection matrices, and likewise, in the T2I task the importance of the text should be emphasized.
Secondly, current methods only aim to learn a more effective projection matrix from the perspective of measuring the similarity between heterogeneous data, so that more accurate comparison can be made in the semantic space; they ignore the selection of more information-rich and more discriminative features when learning the projection matrix. Therefore, based on MDCR, a semi-supervised method that randomly generates virtual data points is invented, and the l2,1 norm is adopted for feature selection.
Disclosure of Invention
The invention provides a semi-supervised cross-media retrieval technique based on feature selection and pseudo-random (virtual) data generation. Traditional cross-media retrieval methods are either supervised methods trained only on labeled data, or semi-supervised methods that select part of the unlabeled data and add it to training. In contrast, the present method expands the training set with virtual data points generated from the class structure of the training data, and adopts the l2,1 norm for feature selection. In general, our approach considers both the diversity of the training data and the selection of effective features.
The specific technical scheme of the invention is as follows:
A semi-supervised cross-media retrieval technique based on feature selection and virtual data generation, comprising the following steps:
step 1: given data set
Figure BDA0001468131080000021
n represents the total number of data pairs, xiRepresenting a picture feature, tiWhich represents a feature of the text that,then, the picture and text feature matrices can be represented as: xG=[x1,x2,...,xn-1,xn]And TG=[t1,t2,t3,...,tn-1,tn];
Step 2: generating pseudo-random virtual data points and expanding the original data set, the specific method being as follows: for each class of data in X_G and T_G, calculate the mean value of each dimension, so that for each class a new vector formed by the mean values of all dimensions is obtained as the class center of that class; then, taking the mean value of each dimension as the center, randomly generate n' numerical values above and below it, and combine the random values over all dimensions to generate n' pseudo-random virtual data points

G' = {(x'_i, t'_i)}_{i=1}^{n'}.

Adding the pseudo-random virtual data points to the original data set gives the expanded data set G_all, and the expanded picture and text feature matrices are expressed as X = [x_1, ..., x_n, x'_1, x'_2, ..., x'_{n'}] and T = [t_1, ..., t_n, t'_1, t'_2, ..., t'_{n'}];
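As an illustration, the virtual-data generation of step 2 can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions: features are rows of an array, virtual points are drawn uniformly around each per-class, per-dimension mean, and the noise half-width `spread` is an assumed parameter that the patent does not specify.

```python
import numpy as np

def generate_virtual_points(feats, labels, n_new, spread=0.1, seed=0):
    """For each class, compute the class center (per-dimension mean) and
    draw n_new pseudo-random points above and below that center.
    The text matrix T_G would be expanded in exactly the same way."""
    rng = np.random.default_rng(seed)
    virt_feats, virt_labels = [], []
    for c in np.unique(labels):
        center = feats[labels == c].mean(axis=0)              # class center
        noise = rng.uniform(-spread, spread, (n_new, feats.shape[1]))
        virt_feats.append(center + noise)                     # points around the center
        virt_labels.append(np.full(n_new, c))
    return np.vstack(virt_feats), np.concatenate(virt_labels)

# expand a toy picture feature matrix X_G (rows = samples) with virtual points
X_G = np.random.default_rng(1).random((10, 5))   # 10 samples, 5-dim features
y = np.array([0] * 5 + [1] * 5)                  # two classes
X_virt, y_virt = generate_virtual_points(X_G, y, n_new=3)
X = np.vstack([X_G, X_virt])                     # expanded feature matrix
```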
Step 3: constructing the objective function:

min_{U,V} C(U,V) + L(U,V) + N(U,V)   (1)

wherein U and V represent the pair of projection matrices to be learned by the method; C(U,V) is a correlation-analysis term, enabling the multi-modal data to keep their pairwise neighbor relations in the latent semantic space; L(U,V) is a linear-regression term from the image or text modality feature space to the semantic space, used to keep the neighbor relations of different-modality data with the same semantics; and N(U,V) is a regularization term used for feature selection;
according to formula (1), the objective functions of the two retrieval tasks, image retrieving text (I2T) and text retrieving image (T2I), are obtained as follows:
(1) the objective function of I2T is:

min_{U1,V1} β||X^T U1 − Y||_F^2 + (1 − β)||X^T U1 − T^T V1||_F^2 + N(U1, V1)   (2)

wherein U1 and V1 are the projection matrices to be learned in the I2T task, corresponding respectively to U and V in formula (1); β is a balance coefficient with 0 ≤ β ≤ 1; and Y is the semantic matrix;
(2) the objective function of T2I is:

min_{U2,V2} β||T^T V2 − Y||_F^2 + (1 − β)||X^T U2 − T^T V2||_F^2 + N(U2, V2)   (3)

wherein U2 and V2 are the projection matrices to be learned in the T2I task, corresponding to U and V in formula (1);
and 4, step 4: obtaining an optimal projection matrix through an iterative solution method:
since the formulas (2) and (3) are non-convex, the solution is carried out by adopting a control variable method, namely, the partial derivatives of U and V are respectively solved and are made to be equal to zero, and the values of the projection matrixes U and V can be obtained; and then, continuously iterating until convergence, and solving the optimal values of the projection matrixes U and V.
Specifically, in step 3, N(U,V) = λ1||U||_{2,1} + λ2||V||_{2,1}, where λ1 and λ2, both positive numbers, balance the two regularization terms; this constraint term enables the selection of more discriminative and information-rich features when learning the projection matrices.
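Because the l2,1 norm sums the Euclidean norms of the rows, minimizing it drives whole rows of a projection matrix to zero, which is what discards uninformative features. A minimal sketch of the norm itself:

```python
import numpy as np

def l21_norm(U):
    """l2,1 norm: sum of the Euclidean norms of the rows of U."""
    return np.sqrt((U ** 2).sum(axis=1)).sum()

U = np.array([[3.0, 4.0],    # row norm 5
              [0.0, 0.0],    # zeroed row: this feature is discarded
              [0.0, 2.0]])   # row norm 2
print(l21_norm(U))           # 7.0
```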
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
1. Data set processing:
wikipedia, contains a total of 10 classes, 2866 picture-text pairs. We selected 2173 picture-text pairs as the initial training data, with the remainder being test data. The picture characteristic is a CNN characteristic of 4096 dimensions, and the text characteristic is a 100-dimensional LDA characteristic.
Pascal sequence, 20 total classes, 50 picture-text pairs per class. We selected 30 image-text pairs in each class as initial training data, with the remainder being test data. The picture characteristic is a CNN characteristic of 4096 dimensions, and the text characteristic is a 100-dimensional LDA characteristic.
INRIA-Websearch, 353 classes total, 71478 image-text pairs. We randomly selected 70% of them as initial training data, and the rest as test data. The picture characteristic is a CNN characteristic of 4096 dimensions, and the text characteristic is a 1000-dimensional LDA characteristic.
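As a small illustration of the data-set processing, a random 70/30 split of aligned picture-text pairs might look like the following (hypothetical helper; the patent fixes only the split ratios, not the implementation):

```python
import numpy as np

def split_pairs(n_pairs, train_frac=0.7, seed=0):
    """Randomly split aligned picture-text pair indices into train/test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_pairs)
    n_train = int(train_frac * n_pairs)
    return idx[:n_train], idx[n_train:]

# 70/30 split of the 71478 INRIA-Websearch pairs
train_idx, test_idx = split_pairs(71478, train_frac=0.7)
print(len(train_idx), len(test_idx))  # 50034 21444
```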
2. The method comprises the following specific implementation steps:
step 1: given data set
Figure BDA0001468131080000041
n represents the total number of data pairs, xiRepresenting a picture feature, tiRepresenting text features, then the picture and text feature matrices can be represented as: xG=[x1,x2,...,xn-1,xn]And TG=[t1,t2,t3,...,tn-1,tn]。
Step 2: generating pseudo-random virtual data points and expanding the original data set, the specific method being as follows: for each class of data in X_G and T_G, calculate the mean value of each dimension, so that for each class a new vector formed by the mean values of all dimensions is obtained as the class center of that class; then, taking the mean value of each dimension as the center, randomly generate n' numerical values above and below it, and combine the random values over all dimensions to generate n' pseudo-random virtual data points

G' = {(x'_i, t'_i)}_{i=1}^{n'}.

Adding the pseudo-random virtual data points to the original data set gives the expanded data set G_all, and the expanded picture and text feature matrices are expressed as X = [x_1, ..., x_n, x'_1, x'_2, ..., x'_{n'}] and T = [t_1, ..., t_n, t'_1, t'_2, ..., t'_{n'}].
Step 3: constructing the objective function:

min_{U,V} C(U,V) + L(U,V) + N(U,V)   (1)

wherein U and V represent the pair of projection matrices to be learned by the method; C(U,V) is a correlation-analysis term, enabling the multi-modal data to keep their pairwise neighbor relations in the latent semantic space; L(U,V) is a linear-regression term from the image or text modality feature space to the semantic space, used to keep the neighbor relations of different-modality data with the same semantics; and N(U,V) is a regularization term used for feature selection;
according to formula (1), the objective functions of the two retrieval tasks, image retrieving text (I2T) and text retrieving image (T2I), are obtained as follows:
(1) the objective function of I2T is:

min_{U1,V1} β||X^T U1 − Y||_F^2 + (1 − β)||X^T U1 − T^T V1||_F^2 + N(U1, V1)   (2)

wherein U1 and V1 are the projection matrices to be learned in the I2T task; β is a balance coefficient with 0 ≤ β ≤ 1; Y is the semantic matrix; and N(U1, V1) = λ1||U1||_{2,1} + λ2||V1||_{2,1}, where λ1 and λ2 are positive numbers balancing the two regularization terms;
(2) the objective function of T2I is:

min_{U2,V2} β||T^T V2 − Y||_F^2 + (1 − β)||X^T U2 − T^T V2||_F^2 + N(U2, V2)   (3)

wherein U2 and V2 are the projection matrices to be learned in the T2I task, and N(U2, V2) = λ1||U2||_{2,1} + λ2||V2||_{2,1}.
Step 4: obtaining the optimal projection matrices by iterative solution:

since formulas (2) and (3) are non-convex, they are solved with a control-variable (alternating) method: take the partial derivatives with respect to U and V respectively and set them equal to zero to obtain the values of the projection matrices U and V; then iterate continuously until convergence to obtain the optimal values of the projection matrices U and V.
In particular, the l2,1 norm can be handled via a trace: defining a matrix U, ||U||_{2,1} = Tr(U^T R U), where R is a diagonal matrix whose i-th diagonal element is 1/(||u^i||_2 + ε), u^i is the i-th row of U, and ε is a tiny positive real number guarding against zero rows.
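The trace reformulation can be checked numerically. The sketch below assumes the diagonal entries R_ii = 1/(||u^i||_2 + ε), under which Tr(U^T R U) reproduces ||U||_{2,1} up to the tiny ε:

```python
import numpy as np

def l21_via_trace(U, eps=1e-12):
    """Recompute ||U||_{2,1} as Tr(U^T R U), with diagonal R_ii = 1/(||u^i|| + eps)."""
    row_norms = np.linalg.norm(U, axis=1)
    R = np.diag(1.0 / (row_norms + eps))
    return np.trace(U.T @ R @ U)

U = np.array([[3.0, 4.0],
              [1.0, 0.0]])
direct = np.linalg.norm(U, axis=1).sum()   # 5 + 1 = 6
print(direct, l21_via_trace(U))            # both ≈ 6.0
```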
3. Evaluation criteria (mAP)
We evaluate the final retrieval performance using the mean average precision (mAP) criterion. First, the average precision (AP) of each query is defined as

AP = (1/R) Σ_{i=1}^{N} P(i) · rel(i),

where N represents the total number of samples in the test data, R is the number of retrieved results relevant to the query, rel(i) = 1 when the i-th retrieved result has the same class label as the query and rel(i) = 0 otherwise, and P(i) represents the precision of the ranking at position i. The mean of the AP values over all queries is then the mAP.
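A minimal AP/mAP computation consistent with the definition above (assuming binary relevance by class label and normalization by the number of relevant results):

```python
import numpy as np

def average_precision(ranked_labels, query_label):
    """AP = (1/R) * sum_i P(i) * rel(i) over one ranked retrieval list."""
    rel = (np.asarray(ranked_labels) == query_label).astype(float)
    if rel.sum() == 0:
        return 0.0
    precision_at_i = np.cumsum(rel) / (np.arange(len(rel)) + 1)  # P(i)
    return float((precision_at_i * rel).sum() / rel.sum())

# ranked class labels returned for one query of class 1
ap = average_precision([1, 0, 1, 0], query_label=1)
print(ap)  # (1 + 2/3) / 2 ≈ 0.8333
# the mAP is the mean of AP over all queries
```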
4. Algorithm implementation
(1)I2T:
Input: picture feature matrix X_G, text feature matrix T_G, sample label matrix Y, parameters λ1, λ2, β.
Generate virtual data: first, for each class of data, compute the mean of each dimension; the vector of these means is the class center of that class. Then, taking the mean of each dimension as the center, randomly generate n' numerical values above and below it, and combine the random values over all dimensions to form n' virtual data points. Finally, add the generated virtual data to the input picture and text feature matrices to obtain the new training picture feature matrix X and text feature matrix T.
Initialization: initialize the projection matrices U1, V1 as identity matrices.
Solve for the optimal solution: iterate the update formulas

U1 = (XX^T + λ1 R1)^{-1} [βXY + (1 − β)XT^T V1]

and

V1 = [(1 − β)TT^T + λ2 R2]^{-1} (1 − β)TX^T U1

continuously until the result converges, yielding the optimal U1, V1, where R1 and R2 are the diagonal matrices arising from the trace form of the l2,1 norms of U1 and V1 respectively.
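The alternating updates for U1 and V1 can be sketched compactly in Python. This is illustrative only, not the patent's pseudocode: it uses random toy data, a fixed iteration count instead of a convergence test, and an assumed smoothing constant ε inside R1 and R2.

```python
import numpy as np

def solve_i2t(X, T, Y, beta=0.5, lam1=0.1, lam2=0.1, iters=50, eps=1e-8):
    """Alternate between the two closed-form updates:
    U1 = (X X^T + lam1 R1)^{-1} [beta X Y + (1 - beta) X T^T V1]
    V1 = [(1 - beta) T T^T + lam2 R2]^{-1} (1 - beta) T X^T U1
    X: d1 x n picture features, T: d2 x n text features, Y: n x c labels."""
    d1, d2, c = X.shape[0], T.shape[0], Y.shape[1]
    U, V = np.eye(d1, c), np.eye(d2, c)            # identity-like initialization
    for _ in range(iters):
        R1 = np.diag(1.0 / (np.linalg.norm(U, axis=1) + eps))
        U = np.linalg.solve(X @ X.T + lam1 * R1,
                            beta * X @ Y + (1 - beta) * X @ T.T @ V)
        R2 = np.diag(1.0 / (np.linalg.norm(V, axis=1) + eps))
        V = np.linalg.solve((1 - beta) * T @ T.T + lam2 * R2,
                            (1 - beta) * T @ X.T @ U)
    return U, V

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 20))               # 6-dim picture features, 20 pairs
T = rng.standard_normal((4, 20))               # 4-dim text features
Y = np.eye(3)[rng.integers(0, 3, 20)]          # one-hot semantic matrix (20 x 3)
U1, V1 = solve_i2t(X, T, Y)
print(U1.shape, V1.shape)                      # (6, 3) (4, 3)
```

Using `np.linalg.solve` rather than forming explicit inverses keeps each update numerically stable, since both system matrices are positive definite thanks to the regularization terms.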
(2)T2I:
similar to the I2T task, the optimal projection matrices U2, V2 are obtained.
5. Comparison of results
We performed experiments on the three data sets and compared against 7 other currently popular methods (PLS, CCA, SM, SCM, GMMFA, GMLDA, MDCR); the tables below show that the method of the invention achieves better retrieval performance on the different data sets.
[Tables: mAP comparison of the method of the invention against the 7 baselines on the three data sets]

Claims (2)

1. A semi-supervised cross-media retrieval method based on feature selection and virtual data generation, comprising the following steps:
step 1: given a data set

G = {(x_i, t_i)}_{i=1}^{n},

where n represents the total number of data pairs, x_i represents a picture feature and t_i represents a text feature, the picture and text feature matrices can be represented as X_G = [x_1, x_2, ..., x_{n-1}, x_n] and T_G = [t_1, t_2, t_3, ..., t_{n-1}, t_n];
step 2: generating pseudo-random virtual data points and expanding the original data set, the specific method being as follows: for each class of data in X_G and T_G, calculate the mean value of each dimension, so that for each class a new vector formed by the mean values of all dimensions is obtained as the class center of that class; then, taking the mean value of each dimension as the center, randomly generate n' numerical values above and below it, and combine the random values over all dimensions to generate n' pseudo-random virtual data points

G' = {(x'_i, t'_i)}_{i=1}^{n'};

adding the pseudo-random virtual data points to the original data set gives the expanded data set G_all, and the expanded picture and text feature matrices are expressed as X = [x_1, ..., x_n, x'_1, x'_2, ..., x'_{n'}] and T = [t_1, ..., t_n, t'_1, t'_2, ..., t'_{n'}];
step 3: constructing the objective function:

min_{U,V} C(U,V) + L(U,V) + N(U,V)   (1)

wherein U and V represent the pair of projection matrices to be learned by the method; C(U,V) is a correlation-analysis term, enabling the multi-modal data to keep their pairwise neighbor relations in the latent semantic space; L(U,V) is a linear-regression term from the image or text modality feature space to the semantic space, used to keep the neighbor relations of different-modality data with the same semantics; and N(U,V) is a regularization term used for feature selection;
according to formula (1), the objective functions of the two retrieval tasks, image retrieving text (I2T) and text retrieving image (T2I), are obtained as follows:
(1) the objective function of I2T is:

min_{U1,V1} β||X^T U1 − Y||_F^2 + (1 − β)||X^T U1 − T^T V1||_F^2 + N(U1, V1)   (2)

wherein U1 and V1 are the projection matrices to be learned in the I2T task, corresponding respectively to U and V in formula (1); β is a balance coefficient with 0 ≤ β ≤ 1; and Y is the semantic matrix;
(2) the objective function of T2I is:

min_{U2,V2} β||T^T V2 − Y||_F^2 + (1 − β)||X^T U2 − T^T V2||_F^2 + N(U2, V2)   (3)

wherein U2 and V2 are the projection matrices to be learned in the T2I task, corresponding to U and V in formula (1);
and 4, step 4: obtaining an optimal projection matrix through an iterative solution method:
since the formulas (2) and (3) are non-convex, the solution is carried out by adopting a control variable method, namely, the partial derivatives of U and V are respectively solved and are made to be equal to zero, and the values of the projection matrixes U and V can be obtained; and then, continuously iterating until convergence, and solving the optimal values of the projection matrixes U and V.
2. The semi-supervised cross-media retrieval method based on feature selection and virtual data generation as recited in claim 1, wherein in step 3, N(U,V) = λ1||U||_{2,1} + λ2||V||_{2,1}, where λ1 and λ2, both positive numbers, balance the two regularization terms; this constraint term enables the selection of more discriminative and information-rich features when learning the projection matrices.
CN201711124618.9A 2017-11-14 2017-11-14 semi-supervised cross-media retrieval method based on feature selection and virtual data generation Active CN107885854B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711124618.9A CN107885854B (en) 2017-11-14 2017-11-14 semi-supervised cross-media retrieval method based on feature selection and virtual data generation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711124618.9A CN107885854B (en) 2017-11-14 2017-11-14 semi-supervised cross-media retrieval method based on feature selection and virtual data generation

Publications (2)

Publication Number Publication Date
CN107885854A CN107885854A (en) 2018-04-06
CN107885854B true CN107885854B (en) 2020-01-31

Family

ID=61776703

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711124618.9A Active CN107885854B (en) 2017-11-14 2017-11-14 semi-supervised cross-media retrieval method based on feature selection and virtual data generation

Country Status (1)

Country Link
CN (1) CN107885854B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109857892B (en) * 2018-12-29 2022-12-02 西安电子科技大学 Semi-supervised cross-modal Hash retrieval method based on class label transfer
CN109784405B (en) * 2019-01-16 2020-09-08 山东建筑大学 Cross-modal retrieval method and system based on pseudo-tag learning and semantic consistency
CN112419324B (en) * 2020-11-24 2022-04-19 山西三友和智慧信息技术股份有限公司 Medical image data expansion method based on semi-supervised task driving

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105069173A (en) * 2015-09-10 2015-11-18 天津中科智能识别产业技术研究院有限公司 Rapid image retrieval method based on supervised topology keeping hash
US9332137B2 (en) * 2012-09-28 2016-05-03 Interactive Memories Inc. Method for form filling an address on a mobile computing device based on zip code lookup
CN106462642A (en) * 2014-06-24 2017-02-22 谷歌公司 Methods, Systems And Media For Performing Personalized Actions On Mobile Devices Associated With A Media Presentation Device

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9332137B2 (en) * 2012-09-28 2016-05-03 Interactive Memories Inc. Method for form filling an address on a mobile computing device based on zip code lookup
CN106462642A (en) * 2014-06-24 2017-02-22 谷歌公司 Methods, Systems And Media For Performing Personalized Actions On Mobile Devices Associated With A Media Presentation Device
CN105069173A (en) * 2015-09-10 2015-11-18 天津中科智能识别产业技术研究院有限公司 Rapid image retrieval method based on supervised topology keeping hash

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Modality-Dependent Cross-Media Retrieval; YUNCHAO WEI et al.; ACM Transactions on Intelligent Systems and Technology; 2016-03-31; pp. 57-69 *

Also Published As

Publication number Publication date
CN107885854A (en) 2018-04-06

Similar Documents

Publication Publication Date Title
US10755128B2 (en) Scene and user-input context aided visual search
Xia et al. Multiview spectral embedding
Uijlings et al. Video classification with densely extracted hog/hof/mbh features: an evaluation of the accuracy/computational efficiency trade-off
CN110059198B (en) Discrete hash retrieval method of cross-modal data based on similarity maintenance
Jegou et al. Accurate image search using the contextual dissimilarity measure
Qian et al. Enhancing sketch-based image retrieval by re-ranking and relevance feedback
CN108132968A (en) Network text is associated with the Weakly supervised learning method of Semantic unit with image
CN107885854B (en) semi-supervised cross-media retrieval method based on feature selection and virtual data generation
CN109376261B (en) Mode independent retrieval method and system based on intermediate text semantic enhancing space
Li Tag relevance fusion for social image retrieval
Huang et al. Sketch-based image retrieval with deep visual semantic descriptor
CN105701225B (en) A kind of cross-media retrieval method based on unified association hypergraph specification
CN106951509B (en) Multi-tag coring canonical correlation analysis search method
CN109255377A (en) Instrument recognition methods, device, electronic equipment and storage medium
George et al. Semantic clustering for robust fine-grained scene recognition
CN105975643B (en) A kind of realtime graphic search method based on text index
CN109858543B (en) Image memorability prediction method based on low-rank sparse representation and relationship inference
Chen et al. Large-scale indoor/outdoor image classification via expert decision fusion (edf)
Saito et al. Demian: Deep modality invariant adversarial network
Ji et al. Efficient semi-supervised multiple feature fusion with out-of-sample extension for 3D model retrieval
Kordopatis-Zilos et al. CERTH/CEA LIST at MediaEval Placing Task 2015.
Sicre et al. Dense sampling of features for image retrieval
JP2016014990A (en) Moving image search method, moving image search device, and program thereof
CN110704575B (en) Dynamic self-adaptive binary hierarchical vocabulary tree image retrieval method
Cui et al. Dimensionality reduction for histogram features: A distance-adaptive approach

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant