CN114943017A - Cross-modal retrieval method based on similarity zero sample hash - Google Patents

Publication number: CN114943017A (application CN202210696434.4A; granted as CN114943017B)
Authority: CN (China)
Prior art keywords: similarity, modal, hash, cross, sample
Legal status: Granted
Application number: CN202210696434.4A
Original language: Chinese (zh)
Other versions: CN114943017B
Inventor
舒振球
永凯玲
余正涛
高盛祥
毛存礼
Current Assignee: Kunming University of Science and Technology
Original Assignee: Kunming University of Science and Technology
Application filed by Kunming University of Science and Technology
Priority: CN202210696434.4A
Granted and published as CN114943017B
Legal status: Active

Classifications

    • G06F16/9014 — Indexing; data structures therefor; storage structures: hash tables
    • G06F16/33 — Information retrieval of unstructured textual data: querying
    • G06F16/35 — Information retrieval of unstructured textual data: clustering; classification
    • G06F16/53 — Information retrieval of still image data: querying
    • G06F16/55 — Information retrieval of still image data: clustering; classification
    • G06F16/90335 — Details of database functions independent of the retrieved data types: query processing
    • G06F16/906 — Details of database functions independent of the retrieved data types: clustering; classification
    • G06F18/22 — Pattern recognition; matching criteria, e.g. proximity measures
    • Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a cross-modal retrieval method based on similarity zero sample hash. A new zero sample hash framework is provided to fully mine supervised semantic information; the framework combines intra-modal similarity, inter-modal similarity, semantic labels and class attributes to guide the learning of the zero sample hash codes. In this framework, both intra-modal and inter-modal similarities are considered: the intra-modal similarity captures the manifold structure and feature similarity of the multi-modal data, and the inter-modal similarity captures the semantic correlation between the modalities. In addition, semantic labels and class attributes are embedded into the hash codes, so that a more discriminative hash code is learned for each instance. Moreover, thanks to the embedding of class attributes, the relationship between visible and invisible classes is well captured in the hash codes, so that attribute knowledge can be transferred from the visible classes to the invisible classes. The invention thereby realizes high-precision retrieval of zero sample cross-modal data.

Description

Cross-modal retrieval method based on similarity zero sample hash
Technical Field
The invention relates to a cross-modal retrieval method based on similarity zero sample hash, and belongs to the field of cross-modal hash retrieval.
Background
Most existing cross-modal hash retrieval methods are studied on visible-class datasets. However, with the explosive growth of multimedia data, a large number of new concepts (invisible classes) are emerging. Retraining existing cross-modal hash models by collecting data for every new concept is not feasible, as it would consume considerable time and storage. Therefore, it is necessary to propose a cross-modal hash model whose training data contains no new concepts but which can still handle them. Zero sample learning, by contrast, can identify classes of data that have never been seen: a trained classifier is able not only to recognize the classes present in the training set, but also to distinguish data from unseen classes. This makes zero sample learning a research focus for invisible-class retrieval tasks.
Zero sample learning has been widely applied to single-modality retrieval tasks over the past few years. Some researchers achieve latent semantic transfer by projecting labels into a word embedding space. Others have proposed a zero sample hash based on asymmetric similarity matrices to improve knowledge transfer from visible classes to invisible classes. Still others have proposed a zero sample learning model for multi-label image retrieval that predicts the labels of invisible-class data with an instance-concept consistency ranking algorithm. However, the above work addresses single-modality retrieval, and research on invisible-class cross-modal retrieval remains insufficient. In the big-data era, with new concepts continuously emerging, existing cross-modal retrieval methods have the following problems: (1) they consider only visible-class data and ignore invisible-class data, so such models are unsuitable for cross-modal data retrieval in the big-data era; (2) most methods do not use class attribute information in hash learning, which hinders the transfer of knowledge from visible to invisible classes; (3) the few existing zero sample cross-modal retrieval methods fail to train models using intra-modal similarity, inter-modal similarity, class labels and class attributes simultaneously.
Disclosure of Invention
In view of the above existing challenges, the present invention provides a cross-modal retrieval method based on similarity zero sample hash. The invention is used for solving the cross-modal retrieval problem containing invisible class data by fusing intra-modal similarity, inter-modal similarity, label information and class attributes.
In order to achieve the purpose of the invention, the technical scheme of the cross-modal retrieval method based on similarity zero sample hash is as follows: the invention provides a novel zero sample hash framework that fully mines supervised semantic information; the framework combines intra-modal similarity, inter-modal similarity, semantic labels and class attributes to guide the learning of the zero sample hash codes. In this framework, both intra-modal and inter-modal similarities are considered: the intra-modal similarity captures the feature and semantic similarity among samples within a modality, and the inter-modal similarity captures the semantic correlation between modalities. In addition, semantic labels and class attributes are embedded into the hash codes, so that a more discriminative hash code is learned for each instance. Moreover, thanks to the embedding of class attributes, the relationship between visible and invisible classes is well captured in the hash codes, so that supervised knowledge can be transferred from the visible classes to the invisible classes. The invention comprises the following steps:
step1, acquiring a cross-modal data set, and performing feature extraction and class attribute vector extraction on the cross-modal data set;
step2, processing of cross-modal data set: processing the existing cross-modal data set into a cross-modal zero sample data set; the original data set is firstly divided into a training set and a query set, then 20% of classes are randomly selected from all classes of the original data set as invisible classes, and the rest classes are visible classes. For a zero sample cross-modal retrieval scene, the method takes a sample pair corresponding to an invisible class in an original query set as a new query set; taking the sample pair corresponding to the visible class in the original training set as a new training set; the retrieval set consists of an original training set;
step3, learning an objective function: the intra-modal similarity, the inter-modal similarity, the semantic label, the class attribute, the hash code and the hash function are fused and learned into the same frame, so that the target function is obtained, and the hash code with more discriminative performance is learned;
step4, performing iterative update of the objective function: iteratively updating the variable matrix in the target function obtained in the last step until the target function converges or reaches the maximum iteration times to obtain a hash function and a hash code of a training set;
step5, performing zero-sample cross-modality retrieval: and inputting a query sample, and obtaining a hash code of the query sample according to the hash function obtained at Step 4. The hash codes of the query samples are substituted into the retrieval set for query, and because the query is performed in a binary space, the query result is obtained by calculating the Hamming distance between the query sample and each sample in the retrieval set. The sample corresponding to the minimum hamming distance in the search set is the query result for us.
Further, the cross-modality retrieved data set includes a plurality of sample pairs, each sample pair including: text, images, and corresponding semantic tags.
Further, in Step1, image features are extracted with the VGG-16 model; text features are extracted with a bag-of-words model; and class attributes are extracted with the GloVe method, which produces a corresponding word vector for each class name to form the class attribute matrix.
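As a rough illustration of the class-attribute extraction, the per-class word vectors can be stacked into an attribute matrix. `word_vectors` here stands in for a loaded GloVe lookup table (a hypothetical dict, not part of the patent text), and averaging the words of a multi-word class name is an added assumption:

```python
import numpy as np

def class_attribute_matrix(class_names, word_vectors):
    """Build a class-attribute matrix A by stacking the word vector of each
    class name. `word_vectors` is a hypothetical mapping: word -> vector.
    Multi-word class names are averaged as a simple fallback (assumption)."""
    rows = []
    for name in class_names:
        parts = name.lower().split()
        vecs = [word_vectors[p] for p in parts if p in word_vectors]
        if not vecs:
            raise KeyError(f"no vector available for class name: {name}")
        rows.append(np.mean(vecs, axis=0))
    return np.vstack(rows)  # shape: (num_classes, embedding_dim)
```

Each row of the resulting matrix then serves as the attribute vector of one class.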
Further, in Step2, to ensure the generalization ability of the model, each time data enters the model for training, the data set is processed and divided by random selection. The average over multiple training runs is taken as the final result.
Further, the intra-modal similarity in Step3 is divided into a feature similarity calculated from the Euclidean distance and a semantic similarity measured by the Jaccard similarity.
Further, the inter-modality similarity in Step3 refers to semantic similarity between instances of different modalities, and the semantic similarity is measured by label semantic information.
Further, the objective function obtained in Step3 comprises two parts: hash code learning and hash function learning. Hash code learning refers to learning the hash codes by combining intra-modal similarity, inter-modal similarity, semantic labels and class attributes; hash function learning refers to learning the hash functions through a least-squares regression problem. Putting hash code learning and hash function learning into the same model strengthens the semantic relation between the hash codes and the hash functions, realizing high-precision zero sample cross-modal retrieval.
Further, the iterative update in Step4 takes the objective function obtained in Step3 as the original function. This function is clearly not optimal and needs to be optimized. Although the objective function as a whole is non-convex, when all but one matrix variable are fixed, the subproblem in the remaining variable is convex and convenient to update. An alternating iteration algorithm is therefore adopted to update the matrix variables until the objective function converges or the maximum number of iterations is reached, finally yielding the optimal hash codes and hash functions.
Further, in Step3, the intra-modal and inter-modal similarities are linked to the hash codes through a kernel-based supervised hashing (KSH) optimization model, so that embedding the similarities into the hash codes enhances their semantic information; the semantic labels and class attributes are related to the hash codes through label reconstruction, embedding the labels in the hash codes and further enhancing the semantic information they contain; and by embedding class attributes in the hash codes, attribute knowledge in the visible classes is transferred to the invisible classes, realizing retrieval of the invisible classes.
Further, in Step4, since the original model is a non-convex problem, optimizing it directly is difficult. However, when the other variables are fixed and only one variable is optimized, the resulting subproblem is convex and can be solved directly. Each variable is optimized in this way in turn until convergence or the maximum number of iterations is reached, giving the optimal result.
The invention has the beneficial effects that:
The invention provides a cross-modal retrieval method based on similarity zero sample hash. The method overcomes the limitation that most existing cross-modal retrieval methods cannot handle zero sample data. It learns the hash codes by simultaneously using intra-modal similarity, inter-modal similarity and class attributes, so that the relationship between visible and invisible classes is well captured and supervised knowledge is transferred from the visible classes to the invisible classes. Furthermore, to exploit the supervised label information, the invention improves accuracy by embedding the label information into the attribute space. A more discriminative hash code can therefore be generated by the proposed model. In addition, the invention provides a discrete optimization scheme to solve the proposed model, effectively avoiding quantization error.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention.
FIG. 1 is a flow chart of a method according to an embodiment of the present invention.
FIG. 2 is a flow chart of the SAZH model iterative update of the present invention.
Detailed Description
The following description is exemplary in nature and is intended to further illustrate the invention with reference to the accompanying drawings.
Example one
Fig. 1 is a flowchart of a cross-modal retrieval method based on similarity zero sample hash according to the present invention.
In this example, referring to fig. 1, the method of the present invention specifically comprises the following processes:
1. Acquire a cross-modal data set and perform feature extraction and class attribute vector extraction on it. In this example, the data set used includes both image and text modalities, with labels corresponding one-to-one to the sample pairs. In Step1, the class attributes are extracted with the GloVe method, which produces a corresponding word vector for each class name to form the class attribute matrix.
2. Processing of the cross-modal data set. Since the problem addressed by the invention is zero sample cross-modal retrieval, the acquired cross-modal data set cannot be used directly; it must be processed to conform to the zero sample cross-modal retrieval scenario. The specific processing method is as follows:
the original data set is firstly divided into a training set and a query set, then 20% of classes are randomly selected from all classes of the original data set as invisible classes, and the rest classes are visible classes. For a zero sample cross-modal retrieval scene, the method takes a sample pair corresponding to an invisible class in an original query set as a new query set; taking the sample pairs corresponding to the visible classes in the original training set as a new training set; the search set is composed of the original training set.
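The class-level part of the split described above can be sketched in Python; `zero_shot_split` is a hypothetical helper name, the 20% ratio follows the text, and only the seen/unseen class partition is shown (not the regrouping of sample pairs):

```python
import numpy as np

def zero_shot_split(labels, unseen_ratio=0.2, rng=None):
    """Randomly partition the classes present in `labels` into visible (seen)
    and invisible (unseen) classes, as required by the zero sample scenario.
    Returns (seen_classes, unseen_classes)."""
    rng = np.random.default_rng(rng)
    classes = np.unique(labels)
    n_unseen = max(1, int(round(unseen_ratio * len(classes))))
    unseen = rng.choice(classes, size=n_unseen, replace=False)
    seen = np.setdiff1d(classes, unseen)
    return seen, unseen
```

The new query set would then keep only pairs whose class is in `unseen`, and the new training set only pairs whose class is in `seen`.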
In the present invention, a set of multi-modal data is given as

$$O = \{ o_i \}_{i=1}^{n}, \qquad o_i = \left( x_i^{(1)}, x_i^{(2)}, l_i \right),$$

where $o_i$ is a multi-modal data point, $x_i^{(1)}$ is the feature vector of the i-th instance of the image modality, $x_i^{(2)}$ is the feature vector of the i-th instance of the text modality, $l_i$ is the common label vector shared by the i-th instances of the two modalities, and $n$ is the total number of instances in the data set.

After processing and partitioning the data set, the multi-modal data of the training set is denoted $\{ o_i \}_{i=1}^{n_s}$, where $n_s$ is the number of training samples.
3. Intra-modal similarity, inter-modal similarity, label information, class attributes, hash codes and hash functions are fused into the same framework for learning, giving the objective function and yielding more discriminative hash codes. The learning models of the individual modules are described in detail below:
3.1 Intra-modality similarity learning
The intra-modal similarity is divided into a feature similarity calculated from the Euclidean distance and a semantic similarity measured by the Jaccard similarity. Because the Euclidean distance is simple to compute and reflects the distance between two vectors, it is adopted as the feature similarity measure. First, the Euclidean distance between $x_i^{(t)}$ and $x_j^{(t)}$ is

$$d_{ij}^{(t)} = \left\| x_i^{(t)} - x_j^{(t)} \right\|_2 ,$$

from which the feature similarity $S^{F(t)}_{ij}$ between $x_i^{(t)}$ and $x_j^{(t)}$ is obtained [equation image in original], where $x_i^{(t)}$ and $x_j^{(t)}$ denote the i-th and j-th samples of the t-th modality, and $t = 1, 2$ indicates the two modalities considered in the present invention.

Furthermore, semantic similarity is measured with the Jaccard similarity:

$$S^{J(t)}_{ij} = \frac{\left| l_i \cap l_j \right|}{\left| l_i \cup l_j \right|} ,$$

where $|l_i|$ is the number of labels assigned to the i-th instance in the t-th modality. The labels of an instance depend on its features, so the semantic similarity is positively correlated with the feature similarity of the corresponding instances. Therefore, the feature similarity and the semantic similarity between data can be combined to obtain the intra-modality learning model [equation image in original], in which $S^{(tt)}_{ij}$ is the total intra-modality similarity, $S^{F(t)}_{ij}$ is the feature similarity between two samples, and $S^{J(t)}_{ij}$ is the semantic similarity measured by the Jaccard method.
3.2 Inter-modality similarity learning
The inter-modality similarity refers to the semantic similarity between instances of different modalities, measured through the label semantic information.

Specifically, in the present invention, the inter-modal similarity is computed from the class label matrix. Let $L \in \{0,1\}^{n \times c}$ be the corresponding label matrix, where $L_{ij} = 1$ indicates that $X_{i*}$ belongs to class $j$ and $L_{ij} = 0$ otherwise, and $c$ denotes the number of classes. The inter-modal similarity matrix $S^{(12)}$ can then be constructed from the label matrix: if $L_{i*} L_{j*}^{T} > 0$, then $X_{i*}$ and $X_{j*}$ are similar; otherwise $X_{i*}$ and $X_{j*}$ are dissimilar.
3.3 Hash function learning
The hash functions in the present invention are learned by minimizing a least-squares regression problem [equation image in original], where β is a non-negative parameter, $B_1$ and $B_2$ are the hash codes of the image and text modalities respectively, and $W_1$ and $W_2$ are the projection matrices of the image and text modalities respectively.
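The exact regression form is rendered as an image in the source; assuming the common linear form $B \approx XW$, the projection has the usual ridge-regularized closed-form solution (the ridge term `lam` is an added numerical-stability assumption, not stated in the text):

```python
import numpy as np

def learn_hash_projection(X, B, lam=1e-3):
    """Closed-form least-squares solution W = (X^T X + lam*I)^{-1} X^T B,
    mapping features X (n x d) to (relaxed) hash codes B (n x r)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ B)
```

One such projection would be learned per modality ($W_1$ for images, $W_2$ for text).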
3.4 Similarity-preserving learning
Combined with the kernel-based supervised hashing (KSH) optimization model, the similarity-preserving learning model provided by the invention comprehensively considers intra-modal and inter-modal similarities [model expression rendered as an image in the original], where $S^{(11)}$ and $S^{(22)}$ are the intra-modality similarity matrices of the image and text modalities respectively, and $S^{(12)}$ is the inter-modality similarity matrix between the image and text modalities.
3.5 Class attribute and label embedding
The invention embeds the label information into the hash codes, which makes full use of the label information to generate optimized binary codes and gives stronger robustness when processing large-scale data. The optimized hash codes are thus obtained by optimizing the model of equation (5) [image in original], where α is a non-negative parameter and $C_1$ and $C_2$ are the projection matrices that project the image and text hash codes into the labels.

In addition, class attribute information is added to the proposed model; this not only helps generate more discriminative hash codes but, more importantly, realizes the transfer of attribute knowledge from visible classes to invisible classes, thereby addressing the zero sample cross-modal retrieval problem. The class attribute information is embedded by inserting the class attribute matrix, built from the per-class word vectors, into the projection matrices of equation (5), so that label information and class attribute information are embedded into hash code learning simultaneously. Equation (5) is accordingly updated to equation (6) [image in original], where $V_1$ and $V_2$ are the converted projection matrices, augmented with class attribute information, that project the image and text hash codes into the labels.
3.6 Objective function
Combining the above components, the objective function of the invention is obtained as equation (7) [image in original], in which a regularization term $R(\cdot)$ prevents overfitting and γ is the parameter controlling it; $X^{(1)}$ and $X^{(2)}$ are the feature matrices of the image and text modalities respectively; $Y$ is the label matrix; $A$ is the class attribute matrix; $S^{(11)}$ and $S^{(22)}$ are the intra-modality similarity matrices of the image and text modalities respectively; $S^{(12)}$ is the inter-modality similarity matrix between the image and text modalities; $W_1$, $W_2$, $V_1$, $V_2$ are projection matrices; and α and β are non-negative parameters.
4. Performing the iterative update of the objective function: the objective function obtained in the previous step is updated iteratively until it converges or the maximum number of iterations is reached, giving the hash functions and the hash codes of the training set.
The function (7) is not yet optimal and needs to be updated iteratively. Clearly, the overall objective function is a non-convex optimization problem; an efficient alternating iteration algorithm is therefore proposed to solve it.
Specifically, referring to fig. 2, the optimization procedure for equation (7) is as follows:
$B_1$-step: fix the variables $W_1, W_2, V_1, V_2, B_2$; with respect to $B_1$, equation (7) then simplifies to a subproblem [image in original]. Setting the derivative with respect to $B_1$ to zero yields a closed-form solution for $B_1$ [image in original].

$B_2$-step: the update is analogous to that of $B_1$ and yields a closed-form solution for $B_2$ [image in original].
$V_1$-step: fix the variables $W_1, W_2, V_2, B_1, B_2$; with respect to $V_1$, equation (7) then simplifies to a subproblem [image in original]. Setting the derivative with respect to $V_1$ to zero gives equation (12) [image in original]. Defining $B_{11} = A A^{T}$ [the definitions of $A_{11}$ and $C_{11}$ are rendered as images in the original], equation (12) can be rewritten as

$$A_{11} V_1 + V_1 B_{11} = C_{11} \qquad (13)$$

Equation (13) is a Sylvester equation, which can be solved using the sylvester function in MATLAB.
$V_2$-step: similarly, with respect to $V_2$ we have

$$A_{22} V_2 + V_2 B_{22} = C_{22} \qquad (14)$$

where $B_{22} = A A^{T}$ [the definitions of $A_{22}$ and $C_{22}$ are rendered as images in the original].

$W_1$-step: similarly, with respect to $W_1$ we have

$$A_{33} W_1 + W_1 B_{33} = C_{33} \qquad (15)$$

where $B_{33} = B_1^{T} B_1$ [the definitions of $A_{33}$ and $C_{33}$ are rendered as images in the original].

$W_2$-step: similarly, with respect to $W_2$ we have

$$A_{44} W_2 + W_2 B_{44} = C_{44} \qquad (16)$$

where $B_{44} = B_2^{T} B_2$ [the definitions of $A_{44}$ and $C_{44}$ are rendered as images in the original].
Equation (7) is optimized through the above steps until the function converges or the maximum number of iterations is reached, at which point the iteration stops.
5. Query and zero sample cross-modal retrieval: first obtain the hash codes corresponding to the retrieval set; then input a query sample and obtain its hash code with the hash function obtained in the previous step. The hash code of the query sample is then matched against the retrieval set. The specific implementation steps are as follows:
Let the query samples of the image and text modalities correspond to the feature matrices $\tilde{X}^{(1)}$ and $\tilde{X}^{(2)}$, and let $W_1$ and $W_2$ be the projection matrices obtained in the previous step. The hash codes corresponding to the query samples are obtained by

$$\tilde{B}_1 = \operatorname{sign}\left( \tilde{X}^{(1)} W_1 \right), \qquad \tilde{B}_2 = \operatorname{sign}\left( \tilde{X}^{(2)} W_2 \right).$$

In this embodiment, two main retrieval tasks are performed: image query text and text query image.
Because the query task of the invention is carried out in a binary space, the query result is obtained by computing the Hamming distance between the query sample and each sample in the retrieval set; the sample with the minimum Hamming distance in the retrieval set is the query result.
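For ±1 codes, the Hamming distance follows from an inner product, since $b_i \cdot b_j = r - 2\,\mathrm{hamming}(b_i, b_j)$. A minimal sketch of the encoding and ranking (the sign-based encoder is a standard choice, assumed here):

```python
import numpy as np

def encode(X, W):
    """Hash a feature matrix with a learned projection: B = sign(XW),
    with entries in {-1, +1}."""
    return np.where(X @ W >= 0, 1, -1)

def hamming_retrieve(b_query, B_retrieval):
    """Rank retrieval-set codes (n x r, entries +/-1) by Hamming distance
    to the query code (length r); returns indices, nearest first."""
    r = B_retrieval.shape[1]
    dist = (r - B_retrieval @ b_query) // 2  # hamming = (r - dot)/2 for +/-1 codes
    return np.argsort(dist, kind="stable")
```

The first index returned corresponds to the minimum-Hamming-distance sample, i.e. the query result.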
In order to illustrate the effect of the present invention, the following further describes the technical solution of the present invention by specific examples:
1. Simulation conditions
The invention adopts MATLAB software for experimental simulation. Experiments are performed on the cross-modal dataset Wiki (containing image and text modalities), which includes two query tasks: (1) text query image (Text2Img); (2) image query text (Img2Text). The parameters in the experiment are set to α = 1e-2, β = 1e5 and γ = 1e-4.
2. Simulation content
The method provided by the invention is compared with existing non-zero-sample cross-modal hash retrieval methods: (1) collaborative matrix factorization hashing (CMFH); (2) joint and individual matrix factorization hashing (JIMFH); (3) discrete robust matrix factorization hashing (DRMFH); (4) asymmetric supervised consistent and specific hashing (ASCSH); (5) label consistent matrix factorization hashing (LCMFFH); with zero-sample single-modality hash retrieval methods: (1) zero sample hashing based on supervised knowledge transfer (TSK); (2) the attribute hashing algorithm (AH) for zero sample image retrieval; and with zero-sample cross-modal hash retrieval methods: (1) cross-modal attribute hashing (CMAH); (2) the orthogonal hashing algorithm (CHOP) for zero sample cross-modal retrieval. For the zero-sample single-modality hash retrieval methods, the hash codes of the image and text modalities are obtained separately through the single-modality model before performing the query tasks below.
3. Simulation results
The simulation experiments report the results of the comparison methods and of the method provided by the invention on the Wiki dataset. To match the zero-sample cross-modal retrieval scenario, 20% of the classes in the Wiki dataset are randomly selected as unseen classes. The Wiki dataset contains 8 classes in total, so two classes are randomly selected as unseen classes according to the experimental setting; the remaining data are processed in the same way as described for the dataset of the invention.
In the simulation, a widely used index, the mean of the average precision (mAP), is used to measure the performance of the SAZH method proposed by the present invention and of the other comparison methods. Given a query and a list of retrieval results, the average precision (AP) is defined as:
$$\mathrm{AP} = \frac{1}{N}\sum_{r=1}^{R} P(r)\,\delta(r)$$
(R is the length of the retrieved result list.)
where N is the number of relevant instances in the retrieval set, P(r) is the precision of the top r retrieved instances, and δ(r) = 1 if the r-th retrieved instance is a true neighbour of the query, otherwise δ(r) = 0. The APs of all queries are then averaged to obtain the mAP. The evaluation rule is that the larger the mAP value, the better the performance.
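The AP and mAP computation defined above can be sketched as follows (a minimal illustration; the function names and toy relevance lists are ours, not from the reported experiments):

```python
import numpy as np

def average_precision(relevance):
    """AP for one ranked result list.

    relevance: sequence of 0/1 flags; relevance[r] = 1 if the (r+1)-th
    retrieved instance is a true neighbour of the query (delta(r) above).
    """
    relevance = np.asarray(relevance, dtype=float)
    n_relevant = relevance.sum()  # N in the formula
    if n_relevant == 0:
        return 0.0
    ranks = np.arange(1, len(relevance) + 1)
    precision_at_r = np.cumsum(relevance) / ranks  # P(r)
    return float((precision_at_r * relevance).sum() / n_relevant)

def mean_average_precision(relevance_lists):
    """Average the per-query APs to obtain the mAP."""
    return float(np.mean([average_precision(r) for r in relevance_lists]))

# Relevant items at ranks 1 and 3: AP = (1/2) * (1/1 + 2/3) = 5/6.
ap = average_precision([1, 0, 1, 0])
m = mean_average_precision([[1, 0, 1, 0], [0, 1]])
```

The second query has its only relevant item at rank 2 (AP = 1/2), so the mAP of the two queries is (5/6 + 1/2) / 2 = 2/3.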
The hash codes in the simulation experiments have lengths of 8 bits, 12 bits, 16 bits and 32 bits; the corresponding mAP values of the SAZH method proposed by the present invention and of the other comparison methods are shown in Tables 1 and 2.
TABLE 1 mAP values on Text query image (Text2Img) task for all methods on Wiki dataset
Figure BDA0003702774900000102
TABLE 2 mAP values on image query (Img2Text) task for all methods on Wiki dataset
Figure BDA0003702774900000103
Figure BDA0003702774900000111
As can be seen from Tables 1 and 2, in the zero-sample cross-modal retrieval scenario on the Wiki dataset, the SAZH method proposed by the present invention achieves higher mAP values on both query tasks than all the comparison methods, which further demonstrates the superiority of the proposed SAZH method for zero-sample cross-modal retrieval.
The above-mentioned embodiments only express specific embodiments of the present invention, and their description is relatively specific and detailed, but they shall not therefore be construed as limiting the scope of the present invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, and these all fall within the scope of protection of the present invention.

Claims (8)

1. A cross-modal retrieval method based on similarity zero-sample hash, characterized in that the method comprises the following specific steps:
step1, acquiring a cross-modal data set, and extracting the characteristics and the class attributes of the cross-modal data set;
step2, processing of cross-modal data set: processing the existing cross-modal data set into a cross-modal zero sample data set;
step3, learning an objective function: the intra-modal similarity, the inter-modal similarity, the semantic labels, the class attributes, the hash codes and the hash function are fused into the same framework for learning, so that the objective function is obtained and more discriminative hash codes are learned;
step4, performing iterative updating of the objective function: the variable matrices in the objective function obtained at Step3 are updated iteratively until the objective function converges or the maximum number of iterations is reached, yielding the hash function and the hash codes of the training set;
step5, performing zero-sample cross-modal retrieval: the hash codes corresponding to the retrieval set are obtained first, then the hash codes of the query set are solved through the hash function obtained at Step4 and put into the retrieval set for querying; the query result is obtained by computing the Hamming distance between each sample in the query set and each sample in the retrieval set, and the sample with the minimum Hamming distance is the final query result.
2. The cross-modal retrieval method based on similarity zero-sample hashing according to claim 1, wherein: in Step1, class attributes are extracted, and a Glove method is adopted to extract a corresponding word vector for each class name to form a class attribute matrix.
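As a hedged illustration of claim 2 (not part of the claim), the class-attribute matrix can be assembled from pretrained word vectors roughly as follows. The 3-dimensional vectors below are placeholders of our own invention, not real GloVe embeddings, which are typically 50-300 dimensional and loaded from a pretrained vector file:

```python
import numpy as np

# Placeholder lookup standing in for pretrained GloVe word vectors.
glove = {
    "art":     np.array([0.1, 0.3, -0.2]),
    "biology": np.array([0.4, -0.1, 0.2]),
    "music":   np.array([0.0, 0.2, 0.5]),
}

def class_attribute_matrix(class_names, embeddings):
    """Stack one word vector per class name into the attribute matrix A."""
    return np.stack([embeddings[name] for name in class_names])

A = class_attribute_matrix(["art", "biology", "music"], glove)
```

Each row of A is the word vector of one class name, giving the class-attribute matrix used by the method.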
3. The cross-modal retrieval method based on similarity zero-sample hash as claimed in claim 1, wherein: the specific method of Step2 comprises the following steps: firstly, dividing an original data set into a training set and a query set, then randomly selecting 20% of classes from all classes of the original data set as invisible classes, and selecting the rest classes as visible classes; for a zero sample cross-modal retrieval scene, taking a sample pair corresponding to an invisible class in an original query set as a new query set; taking the sample pair corresponding to the visible class in the original training set as a new training set; the search set is composed of the original training set.
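A minimal sketch of the seen/unseen class split described in claim 3 (an illustration only, not the patented implementation; the label array and function names are ours):

```python
import numpy as np

def zero_shot_split(labels, unseen_fraction=0.2, seed=0):
    """Randomly mark a fraction of classes as unseen and split the samples.

    labels: 1-D integer class label per paired (image, text) sample.
    Returns boolean masks over the samples: (seen_mask, unseen_mask).
    Samples of seen classes form the new training set; samples of unseen
    classes form the new query set.
    """
    rng = np.random.default_rng(seed)
    classes = np.unique(labels)
    n_unseen = max(1, int(round(unseen_fraction * len(classes))))
    unseen_classes = rng.choice(classes, size=n_unseen, replace=False)
    unseen_mask = np.isin(labels, unseen_classes)
    return ~unseen_mask, unseen_mask

labels = np.array([0, 1, 2, 3, 4, 5, 6, 7, 0, 1])  # 8 classes, as in Wiki
seen_mask, unseen_mask = zero_shot_split(labels)
```

With 8 classes and a 20% fraction, two classes are marked unseen, mirroring the Wiki setting in the description.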
4. The cross-modal retrieval method based on similarity zero-sample hashing according to claim 1, wherein: the intra-modal similarity in Step3 is divided into feature similarity calculated by Euclidean similarity and semantic similarity measured by Jaccard similarity.
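The two intra-modal similarity measures named in claim 4 can be sketched as follows (an illustration only; the claim does not specify how the Euclidean distance is turned into a similarity, so the exp(−d²) mapping here is an assumption of ours):

```python
import numpy as np

def euclidean_similarity(X):
    """Feature similarity within one modality: larger when samples are closer."""
    # Pairwise squared Euclidean distances via the expansion of ||a - b||^2.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    d2 = np.maximum(d2, 0)  # clip tiny negatives caused by rounding
    return np.exp(-d2)      # assumed distance-to-similarity mapping

def jaccard_similarity(L):
    """Semantic similarity from 0/1 multi-label rows: |a ∩ b| / |a ∪ b|."""
    inter = L @ L.T
    union = L.sum(1)[:, None] + L.sum(1)[None, :] - inter
    return inter / np.maximum(union, 1)

L = np.array([[1, 1, 0], [1, 0, 1], [0, 0, 1]])
S_sem = jaccard_similarity(L)
S_feat = euclidean_similarity(np.array([[0., 0.], [3., 4.]]))
```

For the toy labels, samples 0 and 1 share one of three labels (Jaccard 1/3), while each sample is fully similar to itself (diagonal 1).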
5. The cross-modal retrieval method based on similarity zero-sample hashing according to claim 1, wherein: the inter-modality similarity in Step3 refers to the semantic similarity between instances of different modalities, and the semantic similarity is measured by label semantic information.
6. The cross-modal retrieval method based on similarity zero-sample hashing according to claim 1, wherein: the target function obtained in Step3 comprises two parts, namely hash code learning and a hash function, wherein the hash code learning refers to learning the hash code by combining intra-modal similarity, inter-modal similarity, semantic labels and class attributes; the learning of the hash function refers to learning the hash function through the least square regression problem, and the learning of the hash code and the learning of the hash function are put into the same model for learning, so that the semantic relation between the hash code and the hash function is enhanced, and the high-precision zero-sample cross-modal retrieval is realized.
7. The cross-modal retrieval method based on similarity zero-sample hashing according to claim 1, wherein: the iterative updating of the objective function in Step4 takes the objective function obtained in Step3 as the original function; this function is not yet optimal and needs to be optimized. Because the objective function is a non-convex problem, fixing all other variables and updating one matrix variable at a time turns each subproblem into a convex problem, which facilitates the update; the matrix variables are updated with the alternating iteration algorithm until the objective function converges or the maximum number of iterations is reached, finally obtaining the optimal hash codes and the optimal hash function.
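The fix-one-update-one scheme of claim 7 can be illustrated on a stand-in problem. Since the patented objective itself is published only as an image, the least-squares factorization below is a generic example of alternating minimization, not the actual update rules of the invention:

```python
import numpy as np

def alternate_minimize(X, k=2, iters=50, tol=1e-6, seed=0):
    """Alternating updates for min ||X - U V^T||_F^2.

    The joint problem is non-convex, but with V fixed the update of U is a
    convex least-squares problem (and vice versa), so each factor is solved
    in closed form in turn until the objective stops decreasing or the
    iteration cap is reached.
    """
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((X.shape[0], k))
    V = rng.standard_normal((X.shape[1], k))
    prev = np.inf
    for _ in range(iters):
        U = np.linalg.lstsq(V, X.T, rcond=None)[0].T  # fix V, update U
        V = np.linalg.lstsq(U, X, rcond=None)[0].T    # fix U, update V
        obj = np.linalg.norm(X - U @ V.T) ** 2
        if prev - obj < tol:  # converged
            break
        prev = obj
    return U, V, obj

X = np.outer([1., 2., 3.], [1., 0., -1.])  # rank-1 target matrix
U, V, obj = alternate_minimize(X, k=1)
```

On the rank-1 toy matrix, the alternating updates drive the objective to (numerically) zero within a couple of iterations, illustrating the convergence criterion of the claim.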
8. The cross-modal retrieval method based on similarity zero-sample hashing according to claim 1, wherein: the objective function in Step3 is:
Figure FDA0003702774890000021
where the regularization term, shown in the original publication as the image Figure FDA0003702774890000022, prevents the model from overfitting; γ is the parameter controlling the regularization term; X^(1) and X^(2) are the feature matrices of the image and text modalities, respectively; Y is the label matrix; A is the class attribute matrix; S^(11) and S^(22) are the intra-modal similarity matrices of the image and text modalities, respectively, and S^(12) is the inter-modal similarity matrix between the image and text modalities; W_1, W_2, V_1 and V_2 are projection matrices; α and β are non-negative parameters; and n_s is the number of training samples.
CN202210696434.4A 2022-06-20 2022-06-20 Cross-modal retrieval method based on similarity zero sample hash Active CN114943017B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210696434.4A CN114943017B (en) 2022-06-20 2022-06-20 Cross-modal retrieval method based on similarity zero sample hash


Publications (2)

Publication Number Publication Date
CN114943017A true CN114943017A (en) 2022-08-26
CN114943017B CN114943017B (en) 2024-06-18

Family

ID=82911208

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210696434.4A Active CN114943017B (en) 2022-06-20 2022-06-20 Cross-modal retrieval method based on similarity zero sample hash

Country Status (1)

Country Link
CN (1) CN114943017B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116244484A (en) * 2023-05-11 2023-06-09 山东大学 Federal cross-modal retrieval method and system for unbalanced data
CN116244483A (en) * 2023-05-12 2023-06-09 山东建筑大学 Large-scale zero sample data retrieval method and system based on data synthesis
CN117992805A (en) * 2024-04-07 2024-05-07 武汉商学院 Zero sample cross-modal retrieval method and system based on tensor product graph fusion diffusion

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110059198A (en) * 2019-04-08 2019-07-26 浙江大学 A kind of discrete Hash search method across modal data kept based on similitude
CN111460077A (en) * 2019-01-22 2020-07-28 大连理工大学 Cross-modal Hash retrieval method based on class semantic guidance
CN112364195A (en) * 2020-10-22 2021-02-12 天津大学 Zero sample image retrieval method based on attribute-guided countermeasure hash network
CN113342922A (en) * 2021-06-17 2021-09-03 北京邮电大学 Cross-modal retrieval method based on fine-grained self-supervision of labels


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XUANWU LIU et al.: "Cross modal zero shot hashing", 2019 IEEE International Conference on Data Mining (ICDM), 30 January 2020 (2020-01-30), pages 1-9 *
YU JUN (庾骏): "Research on cross-modal hashing learning algorithms and their applications", China Doctoral Dissertations Full-text Database, Information Science and Technology, 15 April 2021 (2021-04-15), pages 140-12 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116244484A (en) * 2023-05-11 2023-06-09 山东大学 Federal cross-modal retrieval method and system for unbalanced data
CN116244484B (en) * 2023-05-11 2023-08-08 山东大学 Federal cross-modal retrieval method and system for unbalanced data
CN116244483A (en) * 2023-05-12 2023-06-09 山东建筑大学 Large-scale zero sample data retrieval method and system based on data synthesis
CN117992805A (en) * 2024-04-07 2024-05-07 武汉商学院 Zero sample cross-modal retrieval method and system based on tensor product graph fusion diffusion

Also Published As

Publication number Publication date
CN114943017B (en) 2024-06-18

Similar Documents

Publication Publication Date Title
WO2022068196A1 (en) Cross-modal data processing method and device, storage medium, and electronic device
WO2023000574A1 (en) Model training method, apparatus and device, and readable storage medium
CN110070909B (en) Deep learning-based multi-feature fusion protein function prediction method
CN114943017B (en) Cross-modal retrieval method based on similarity zero sample hash
CN110674323B (en) Unsupervised cross-modal Hash retrieval method and system based on virtual label regression
CN112131404A (en) Entity alignment method in four-risk one-gold domain knowledge graph
WO2022068195A1 (en) Cross-modal data processing method and device, storage medium and electronic device
CN110347932B (en) Cross-network user alignment method based on deep learning
Saito et al. Robust active learning for the diagnosis of parasites
CN113177132B (en) Image retrieval method based on depth cross-modal hash of joint semantic matrix
CN112364174A (en) Patient medical record similarity evaluation method and system based on knowledge graph
CN109376796A (en) Image classification method based on active semi-supervised learning
CN110647904A (en) Cross-modal retrieval method and system based on unmarked data migration
CN111080551B (en) Multi-label image complement method based on depth convolution feature and semantic neighbor
US20200320440A1 (en) System and Method for Use in Training Machine Learning Utilities
CN112199532A (en) Zero sample image retrieval method and device based on Hash coding and graph attention machine mechanism
CN113806582B (en) Image retrieval method, image retrieval device, electronic equipment and storage medium
CN113378938B (en) Edge transform graph neural network-based small sample image classification method and system
Amiri et al. Automatic image annotation using semi-supervised generative modeling
CN114093445B (en) Patient screening marking method based on partial multi-marking learning
Bhardwaj et al. Computational biology in the lens of CNN
Zhou et al. Unsupervised multiple network alignment with multinominal gan and variational inference
CN114579794A (en) Multi-scale fusion landmark image retrieval method and system based on feature consistency suggestion
CN113535947A (en) Multi-label classification method and device for incomplete data with missing labels
CN114764865A (en) Data classification model training method, data classification method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant