CN107885854B - semi-supervised cross-media retrieval method based on feature selection and virtual data generation - Google Patents
semi-supervised cross-media retrieval method based on feature selection and virtual data generation
- Publication number
- CN107885854B (application CN201711124618.9A)
- Authority
- CN
- China
- Prior art keywords
- data
- text
- projection
- class
- virtual data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/40—Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
- G06F16/43—Querying
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Image Analysis (AREA)
Abstract
The invention provides a method based on feature selection and virtual data generation. The method generates virtual data points according to the characteristics of the training data in order to expand the training set, and applies an l2,1-norm constraint while learning two pairs of projection matrices. Specifically, a class center is found for each class of images and texts, new data points are randomly generated around the class centers to form new training data, and the new data are used to learn the two pairs of projection matrices while the l2,1 norm performs feature selection. The method not only generates random data points to improve the diversity of the training data, but can also select more discriminative and information-rich features when learning the two pairs of projection matrices.
Description
Technical Field
The invention relates to a cross-media retrieval method, and in particular to a semi-supervised cross-media retrieval method based on feature selection and virtual data generation.
Background
Cross-media retrieval refers to using data of one modality as the query to retrieve data of other modalities that carry the same semantic information. Taking pictures and texts as examples, pictures can be used to retrieve texts with the corresponding semantic information (abbreviated I2T), or texts can be used to retrieve pictures with the corresponding semantic information (abbreviated T2I).
In cross-media retrieval, the most important issue is that data of different modalities have different feature representations lying in spaces of different dimensionality, so the similarity between heterogeneous data cannot be compared directly. The main concern of the cross-media retrieval field is therefore how to bridge this semantic gap. Popular solutions are subspace learning methods, which aim to learn a latent semantic space in which the similarity of heterogeneous data can be measured directly: a pair of projection matrices is learned that maps the data of the different modalities into the latent semantic space, where their similarity can then be measured. A representative method is Canonical Correlation Analysis (CCA), which learns a pair of projection matrices maximizing the correlation between the heterogeneous data when the different modalities are mapped into the semantic space; later methods such as GMMFA, GMLDA and MDCR further combine correlation analysis with semantic (linear) regression.
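To make the subspace-learning idea concrete, here is a minimal CCA sketch (an illustration of the background technique, not the patent's own method): it learns a pair of projection matrices that maximize the correlation of the two views after projection. The function name and the ridge term `reg` are assumptions added for numerical stability.

```python
import numpy as np

def cca(X, T, k, reg=1e-6):
    """Minimal CCA: learn a pair of projection matrices (A, B) maximizing
    the correlation between the two views. Columns of X (dx, n) and
    T (dt, n) are paired samples; k is the subspace dimension."""
    X = X - X.mean(axis=1, keepdims=True)
    T = T - T.mean(axis=1, keepdims=True)
    n = X.shape[1]
    Cxx = X @ X.T / n + reg * np.eye(X.shape[0])   # within-view covariances
    Ctt = T @ T.T / n + reg * np.eye(T.shape[0])
    Cxt = X @ T.T / n                              # cross-view covariance

    def inv_sqrt(C):
        # symmetric inverse square root via eigendecomposition
        w, Q = np.linalg.eigh(C)
        return Q @ np.diag(1.0 / np.sqrt(w)) @ Q.T

    Wx, Wt = inv_sqrt(Cxx), inv_sqrt(Ctt)
    U, _, Vt = np.linalg.svd(Wx @ Cxt @ Wt)        # whitened cross-covariance
    return Wx @ U[:, :k], Wt @ Vt.T[:, :k]         # projection pair (A, B)
```

For retrieval, both modalities would be projected with the learned pair and compared directly in the shared space.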
However, the common cross-media retrieval task has directionality, namely image-retrieves-text (I2T) and text-retrieves-image (T2I). The above methods learn only a single pair of projection matrices and do not emphasize the importance of the query data: specifically, in the I2T task the picture is more decisive for learning the projection matrices, and likewise in the T2I task the importance of the text should be emphasized more.
Secondly, current methods only aim at learning a more effective projection matrix from the perspective of how to measure the similarity between heterogeneous data, so that more accurate comparisons can be made in the semantic space; they ignore the selection of more information-rich and more discriminative features when learning the projection matrix. Therefore, based on MDCR, a semi-supervised method capable of randomly generating virtual data points is invented, and the l2,1 norm is adopted to select features.
Disclosure of Invention
The invention provides a semi-supervised cross-media retrieval technique based on feature selection and pseudo-random data generation. A traditional cross-media retrieval method is either a supervised method trained only on labeled data or a semi-supervised method that selects part of the unlabeled data and adds it to training. In contrast, the present method expands the training data with randomly generated virtual data points and adopts the l2,1 norm to select features. In general, our approach considers both the diversity of the training data and the selection of effective features.
The specific technical scheme of the invention is as follows:
A semi-supervised cross-media retrieval method based on feature selection and virtual data generation comprises the following steps:
Step 1: given a data set G = {(x_i, t_i)}, i = 1, ..., n, where n represents the total number of data pairs, x_i represents a picture feature and t_i represents a text feature, the picture and text feature matrices can be represented as X_G = [x_1, x_2, ..., x_{n-1}, x_n] and T_G = [t_1, t_2, ..., t_{n-1}, t_n];
Step 2: generate pseudo-random virtual data points and expand the original data set. The specific method is as follows: for each class of data in X_G and T_G, calculate the mean of each dimension of that class's data; the new vector formed by the means of all dimensions serves as the class center of the class. Then, taking the mean of each dimension as the center, randomly generate n' values above and below it, and combine the random values over all dimensions to generate n' pseudo-random virtual data points. Add the pseudo-random virtual data points to the original data set to obtain the expanded data set G_all; the expanded picture and text feature matrices are represented as X = [x_1, ..., x_n, x'_1, x'_2, ..., x'_{n'}] and T = [t_1, ..., t_n, t'_1, t'_2, ..., t'_{n'}];
Step 3: construct the objective function:
Define the objective function as the minimization over U and V of C(U, V) + L(U, V) + N(U, V) (1):
wherein U and V represent the pair of projection matrices to be learned; C(U, V) is a correlation analysis term that enables multi-modal data to keep their pairwise neighbor relations in the latent semantic space; L(U, V) is a linear regression term from the image or text modality feature space to the semantic space, used to keep the neighbor relations of different-modality data with the same semantics; and N(U, V) is a regularization term used for feature selection;
According to formula (1), the objective functions for the image-retrieves-text (I2T) and text-retrieves-image (T2I) retrieval tasks are obtained as follows:
(1) the objective function of I2T is:
wherein U1, V1 are the projection matrices to be learned in the I2T task, corresponding respectively to U and V in formula (1); β is a balance coefficient with 0 ≤ β ≤ 1; and Y is the semantic matrix;
(2) the objective function for T2I is:
wherein U2, V2 are the projection matrices to be learned in the T2I task, corresponding to U and V in formula (1);
Step 4: obtain the optimal projection matrices through an iterative solution method:
Since formulas (2) and (3) are non-convex, they are solved with a controlled-variable (alternating) method: take the partial derivatives with respect to U and V respectively and set them equal to zero to obtain the values of the projection matrices U and V; then iterate continuously until convergence to obtain the optimal values of U and V.
Specifically, in step 3, N(U, V) = λ1||U||2,1 + λ2||V||2,1, where λ1 and λ2 balance the two regularization terms and are both positive numbers; this constraint term enables more discriminative and information-rich features to be selected when learning the projection matrices.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
1. Data set processing:
Wikipedia: contains 10 classes in total and 2866 picture-text pairs. We selected 2173 picture-text pairs as the initial training data, with the remainder as test data. The picture feature is a 4096-dimensional CNN feature, and the text feature is a 100-dimensional LDA feature.
Pascal Sentence: 20 classes in total, with 50 picture-text pairs per class. We selected 30 picture-text pairs in each class as initial training data, with the remainder as test data. The picture feature is a 4096-dimensional CNN feature, and the text feature is a 100-dimensional LDA feature.
INRIA-Websearch: 353 classes in total and 71478 picture-text pairs. We randomly selected 70% of them as initial training data and the rest as test data. The picture feature is a 4096-dimensional CNN feature, and the text feature is a 1000-dimensional LDA feature.
2. The method comprises the following specific implementation steps:
step 1: given data setn represents the total number of data pairs, xiRepresenting a picture feature, tiRepresenting text features, then the picture and text feature matrices can be represented as: xG=[x1,x2,...,xn-1,xn]And TG=[t1,t2,t3,...,tn-1,tn]。
Step 2: generate pseudo-random virtual data points and expand the original data set. The specific method is as follows: for each class of data in X_G and T_G, calculate the mean of each dimension of that class's data; the new vector formed by the means of all dimensions serves as the class center of the class. Then, taking the mean of each dimension as the center, randomly generate n' values above and below it, and combine the random values over all dimensions to generate n' pseudo-random virtual data points. Add the pseudo-random virtual data points to the original data set to obtain the expanded data set G_all; the expanded picture and text feature matrices are represented as X = [x_1, ..., x_n, x'_1, x'_2, ..., x'_{n'}] and T = [t_1, ..., t_n, t'_1, t'_2, ..., t'_{n'}].
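A sketch of this step under stated assumptions: the text says values are generated "above and below" each class-center dimension but does not specify the perturbation law, so a uniform half-width `scale` is assumed here, and all names are illustrative.

```python
import numpy as np

def generate_virtual_data(X, labels, n_new, scale=0.1, seed=0):
    """For each class, compute the class center (per-dimension mean of that
    class's samples) and draw n_new virtual points around it.

    X: (d, n) feature matrix (columns are samples); labels: (n,) class ids.
    `scale` is an ASSUMED uniform half-width for the random perturbation.
    Returns the virtual points (d, n_new * n_classes) and their labels.
    """
    rng = np.random.default_rng(seed)
    new_cols, new_labels = [], []
    for c in np.unique(labels):
        center = X[:, labels == c].mean(axis=1)          # class center
        # each dimension perturbed independently around its class mean
        pts = center[:, None] + rng.uniform(-scale, scale,
                                            size=(X.shape[0], n_new))
        new_cols.append(pts)
        new_labels.extend([c] * n_new)
    return np.hstack(new_cols), np.array(new_labels)
```

The returned columns would be appended to X_G (and analogously to T_G) to form the expanded matrices X and T.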
Step 3: construct the objective function:
Define the objective function as the minimization over U and V of C(U, V) + L(U, V) + N(U, V) (1):
wherein U and V represent the pair of projection matrices to be learned; C(U, V) is a correlation analysis term that enables multi-modal data to keep their pairwise neighbor relations in the latent semantic space; L(U, V) is a linear regression term from the image or text modality feature space to the semantic space, used to keep the neighbor relations of different-modality data with the same semantics; and N(U, V) is a regularization term used for feature selection;
According to formula (1), the objective functions for the image-retrieves-text (I2T) and text-retrieves-image (T2I) retrieval tasks are obtained as follows:
(1) the objective function of I2T is:
wherein U1, V1 are the projection matrices to be learned in the I2T task; β is a balance coefficient with 0 ≤ β ≤ 1; Y is the semantic matrix; and N(U1, V1) = λ1||U1||2,1 + λ2||V1||2,1, where λ1 and λ2 balance the two regularization terms and are both positive numbers;
(2) the objective function for T2I is:
wherein U2, V2 are the projection matrices to be learned in the T2I task, and N(U2, V2) = λ1||U2||2,1 + λ2||V2||2,1.
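The equation images for objectives (2) and (3) are not reproduced in this text. From the closed-form updates for U1 and V1 given in the solving step below, and consistent with the MDCR formulation the method builds on, the two objectives plausibly take the following form (a hedged reconstruction, not the verbatim patent formulas):

```latex
% (2) I2T: the image side regresses onto the semantic matrix Y
\min_{U_1,V_1}\;
  \beta\,\lVert X^{\top}U_1 - Y\rVert_F^2
  +(1-\beta)\,\lVert X^{\top}U_1 - T^{\top}V_1\rVert_F^2
  +\lambda_1\lVert U_1\rVert_{2,1}
  +\lambda_2\lVert V_1\rVert_{2,1}

% (3) T2I: symmetrically, the text side regresses onto Y
\min_{U_2,V_2}\;
  \beta\,\lVert T^{\top}V_2 - Y\rVert_F^2
  +(1-\beta)\,\lVert X^{\top}U_2 - T^{\top}V_2\rVert_F^2
  +\lambda_1\lVert U_2\rVert_{2,1}
  +\lambda_2\lVert V_2\rVert_{2,1}
```

Setting the partial derivative of the I2T objective with respect to U_1 to zero yields exactly the stated update U1 = (XX^T + λ1R11)^{-1}[βXY + (1−β)XT^TV1], which is what supports this reconstruction.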
Step 4: obtain the optimal projection matrices through an iterative solution method:
Since formulas (2) and (3) are non-convex, they are solved with a controlled-variable (alternating) method: take the partial derivatives with respect to U and V respectively and set them equal to zero to obtain the values of the projection matrices U and V; then iterate continuously until convergence to obtain the optimal values of U and V.
In particular, the derivative of the l2,1 norm can be handled via the trace: for a matrix U, ||U||2,1 = Tr(U^T R U), where R is a diagonal matrix determined by the rows of U (in the standard smoothed form, its i-th diagonal entry is 1/(2||u_i||2 + ε)), u_i is the i-th row of U, and ε is a tiny positive real number that avoids division by zero.
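A small numeric sketch of this construction; the exact diagonal entries of R are not reproduced in the text, so the standard smoothed form R_ii = 1/(2||u_i||2 + ε) is assumed here, which makes the (smoothed) subgradient of ||U||_{2,1} equal to 2RU, matching the closed-form updates used later.

```python
import numpy as np

def l21_norm(U):
    """l2,1 norm: the sum of the Euclidean norms of the rows of U."""
    return float(np.linalg.norm(U, axis=1).sum())

def l21_diag(U, eps=1e-8):
    """Diagonal matrix R with R_ii = 1 / (2 ||u_i||_2 + eps), where u_i is
    the i-th row of U; eps keeps all-zero rows finite (ASSUMED form).
    With this R, the smoothed subgradient of ||U||_{2,1} is 2 R U."""
    return np.diag(1.0 / (2.0 * np.linalg.norm(U, axis=1) + eps))
```

Rows of U with small norm get large diagonal weights, which pushes them further toward zero in the next update — this is what realizes feature selection.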
3. Evaluation criteria (mAP)
We evaluate the final retrieval performance using the mean average precision (mAP) criterion. First, we define the average precision (AP) for each query:
where N represents the total number of samples in the test data; rel(i) = 1 when the i-th retrieved result has the same class label as the query, and rel(i) = 0 otherwise; and P(i) represents the precision of the top-i ranking results. The mAP is then the mean of the AP values over all queries.
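The AP formula image is not reproduced; assuming the standard definition AP = (1/R) Σ_i P(i)·rel(i), with R the number of relevant results, the criterion can be sketched as:

```python
import numpy as np

def average_precision(rel):
    """AP of one query: rel is the binary relevance list over the ranked
    results (rel[i] = 1 when the i-th result shares the query's class
    label). Standard definition assumed: AP = (1/R) * sum_i P(i)*rel(i)."""
    rel = np.asarray(rel, dtype=float)
    if rel.sum() == 0:
        return 0.0
    precision_at_i = np.cumsum(rel) / (np.arange(len(rel)) + 1)  # P(i)
    return float((precision_at_i * rel).sum() / rel.sum())

def mean_average_precision(rel_lists):
    """mAP: the mean of AP over all queries."""
    return float(np.mean([average_precision(r) for r in rel_lists]))
```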
4. Algorithm implementation
(1)I2T:
Input: picture feature matrix XG, text feature matrix TG, sample label matrix Y, and parameters λ1, λ2, β.
Generate virtual data: first, for each class of data, compute the mean of each dimension; the vector of these means is the class center of the class. Then, taking the mean of each dimension as the center, randomly generate n' values above and below it, and combine the random values over all dimensions to form n' virtual data points. Finally, add the generated virtual data to the input picture and text feature matrices to obtain the new training picture feature matrix X and text feature matrix T.
Initialization: initialize the projection matrices U1 and V1 as identity matrices.
Solve for the optimal solution: according to U1 = (XX^T + λ1R11)^{-1}[βXY + (1−β)XT^T V1] and V1 = [(1−β)TT^T + λ2R12]^{-1}(1−β)TX^T U1, iterate continuously until the result converges to obtain the optimal U1, V1.
The pseudo code for this process is as follows:
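The pseudo-code image is not reproduced in this text. The following NumPy sketch implements the two closed-form updates stated above; the rectangular "identity" initialization and the diagonal entries of R11, R12 (standard smoothed l2,1 form) are assumptions.

```python
import numpy as np

def l21_reweight(M, eps=1e-8):
    # Diagonal re-weighting from the smoothed l2,1 subgradient (ASSUMED form)
    return np.diag(1.0 / (2.0 * np.linalg.norm(M, axis=1) + eps))

def solve_i2t(X, T, Y, beta=0.5, lam1=0.01, lam2=0.01, n_iter=100):
    """Alternate the two closed-form updates until (approximate) convergence:
        U1 = (X X^T + lam1 R11)^{-1} [beta X Y + (1-beta) X T^T V1]
        V1 = [(1-beta) T T^T + lam2 R12]^{-1} (1-beta) T X^T U1
    X: (dx, n) image features, T: (dt, n) text features, Y: (n, c) labels."""
    dx, dt, c = X.shape[0], T.shape[0], Y.shape[1]
    U1, V1 = np.eye(dx, c), np.eye(dt, c)  # rectangular identity init (assumed)
    for _ in range(n_iter):
        R11 = l21_reweight(U1)
        U1 = np.linalg.solve(X @ X.T + lam1 * R11,
                             beta * X @ Y + (1 - beta) * X @ T.T @ V1)
        R12 = l21_reweight(V1)
        V1 = np.linalg.solve((1 - beta) * T @ T.T + lam2 * R12,
                             (1 - beta) * T @ X.T @ U1)
    return U1, V1
```

The routine would be run on the expanded matrices X and T from step 2; the T2I case is symmetric, with the text side carrying the regression onto Y.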
(2)T2I:
Similarly to the I2T task, the optimal projection matrices U2, V2 are obtained.
5. Comparison of results
We performed experiments on the three data sets and compared against 7 other currently popular methods (PLS, CCA, SM, SCM, GMMFA, GMLDA, MDCR); the results table shows that the method of the present invention achieves better retrieval performance on the different data sets.
Claims (2)
1. A semi-supervised cross-media retrieval method based on feature selection and virtual data generation, comprising the following steps:
Step 1: given a data set G = {(x_i, t_i)}, i = 1, ..., n, where n represents the total number of data pairs, x_i represents a picture feature and t_i represents a text feature, the picture and text feature matrices can be represented as X_G = [x_1, x_2, ..., x_{n-1}, x_n] and T_G = [t_1, t_2, ..., t_{n-1}, t_n];
Step 2: generate pseudo-random virtual data points and expand the original data set. The specific method is as follows: for each class of data in X_G and T_G, calculate the mean of each dimension of that class's data; the new vector formed by the means of all dimensions serves as the class center of the class. Then, taking the mean of each dimension as the center, randomly generate n' values above and below it, and combine the random values over all dimensions to generate n' pseudo-random virtual data points. Add the pseudo-random virtual data points to the original data set to obtain the expanded data set G_all; the expanded picture and text feature matrices are represented as X = [x_1, ..., x_n, x'_1, x'_2, ..., x'_{n'}] and T = [t_1, ..., t_n, t'_1, t'_2, ..., t'_{n'}];
Step 3: construct the objective function:
wherein U and V represent the pair of projection matrices to be learned; C(U, V) is a correlation analysis term that enables multi-modal data to keep their pairwise neighbor relations in the latent semantic space; L(U, V) is a linear regression term from the image or text modality feature space to the semantic space, used to keep the neighbor relations of different-modality data with the same semantics; and N(U, V) is a regularization term used for feature selection;
According to formula (1), the objective functions for the image-retrieves-text (I2T) and text-retrieves-image (T2I) retrieval tasks are obtained as follows:
(1) the objective function of I2T is:
wherein U1, V1 are the projection matrices to be learned in the I2T task, corresponding respectively to U and V in formula (1); β is a balance coefficient with 0 ≤ β ≤ 1; and Y is the semantic matrix;
(2) the objective function for T2I is:
wherein U2, V2 are the projection matrices to be learned in the T2I task, corresponding to U and V in formula (1);
Step 4: obtain the optimal projection matrices through an iterative solution method:
Since formulas (2) and (3) are non-convex, they are solved with a controlled-variable (alternating) method: take the partial derivatives with respect to U and V respectively and set them equal to zero to obtain the values of the projection matrices U and V; then iterate continuously until convergence to obtain the optimal values of U and V.
2. The semi-supervised cross-media retrieval method based on feature selection and virtual data generation according to claim 1, wherein: in step 3, N(U, V) = λ1||U||2,1 + λ2||V||2,1, where λ1 and λ2 balance the two regularization terms and are both positive numbers; this constraint term enables more discriminative and information-rich features to be selected when learning the projection matrices.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711124618.9A CN107885854B (en) | 2017-11-14 | 2017-11-14 | semi-supervised cross-media retrieval method based on feature selection and virtual data generation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107885854A CN107885854A (en) | 2018-04-06 |
CN107885854B (en) | 2020-01-31
Family
ID=61776703
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711124618.9A Active CN107885854B (en) | 2017-11-14 | 2017-11-14 | semi-supervised cross-media retrieval method based on feature selection and virtual data generation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107885854B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109857892B (en) * | 2018-12-29 | 2022-12-02 | 西安电子科技大学 | Semi-supervised cross-modal Hash retrieval method based on class label transfer |
CN109784405B (en) * | 2019-01-16 | 2020-09-08 | 山东建筑大学 | Cross-modal retrieval method and system based on pseudo-tag learning and semantic consistency |
CN112419324B (en) * | 2020-11-24 | 2022-04-19 | 山西三友和智慧信息技术股份有限公司 | Medical image data expansion method based on semi-supervised task driving |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105069173A (en) * | 2015-09-10 | 2015-11-18 | 天津中科智能识别产业技术研究院有限公司 | Rapid image retrieval method based on supervised topology keeping hash |
US9332137B2 (en) * | 2012-09-28 | 2016-05-03 | Interactive Memories Inc. | Method for form filling an address on a mobile computing device based on zip code lookup |
CN106462642A (en) * | 2014-06-24 | 2017-02-22 | 谷歌公司 | Methods, Systems And Media For Performing Personalized Actions On Mobile Devices Associated With A Media Presentation Device |
Non-Patent Citations (1)
Title |
---|
Modality-Dependent Cross-Media Retrieval; YUNCHAO WEI et al.; ACM Transactions on Intelligent Systems and Technology; 2016-03-31; pp. 57-69 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||