CN109284414A - Semantic-preserving cross-modal content retrieval method and system - Google Patents

Semantic-preserving cross-modal content retrieval method and system

Info

Publication number
CN109284414A
Authority
CN
China
Prior art keywords
sample
modality
node
mapping function
modality sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811156579.5A
Other languages
Chinese (zh)
Other versions
CN109284414B (en
Inventor
王树徽
吴益灵
黄庆明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS
Priority to CN201811156579.5A
Publication of CN109284414A
Application granted
Publication of CN109284414B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a semantic-preserving cross-modal content retrieval method, comprising: constructing a first feature graph and a second feature graph whose nodes are the feature vectors of the first-modality samples and the second-modality samples, respectively; extracting the label vectors of all samples as nodes to construct a semantic graph; obtaining the neighbor nodes of each node; constructing a first mapping function and a second mapping function that map the first-modality samples and the second-modality samples, respectively, to latent representations; learning the mapping functions so as to approximately maximize the likelihood that the neighbor nodes of each node occur, while enabling each latent representation to reconstruct the label information of its node; mapping a query sample to a query latent representation with the first mapping function, and mapping each second-modality sample to a target latent representation with the second mapping function; and obtaining the distance between the query latent representation and each target latent representation, and taking the second-modality samples whose distance is less than a retrieval threshold as the retrieval result.

Description

Semantic-preserving cross-modal content retrieval method and system
Technical field
The present invention relates to cross-modal retrieval techniques in the multimedia field, and in particular to a cross-modal content retrieval technique.
Background
With the development of multimedia technology, data of various modalities has become widespread on the Internet, and cross-modal retrieval is one of the important research topics in the multimedia field. In a traditional single-modality retrieval system, the query sample and the retrieval results are confined to a single modality, which cannot satisfy users' growing needs. In a cross-modal retrieval system, by contrast, the query sample and the retrieval results belong to different modalities; for example, image, video, or audio samples are used as queries to retrieve text content. Cross-modal retrieval offers users a more convenient way to search, helps them obtain the information they need in several modalities, and improves the user experience. Because the query sample and the retrieval results belong to different modalities, how to compare the semantic similarity of samples from different modalities is a problem worth studying.
Because different modalities are heterogeneous, the key to cross-modal retrieval is how to associate them. At present, the vast majority of cross-modal retrieval algorithms map samples of different modalities into a low-dimensional latent space. By the type of latent representation learned, they can be divided into real-valued-representation methods and binary-representation methods. By the information they use, they can be divided into unsupervised and supervised methods: unsupervised methods use only the co-occurrence information of samples from different modalities, whereas supervised methods also use the label information of the samples. In general, the more information is used, the better the cross-modal retrieval algorithm performs.
Label information can serve as high-level semantic information that guides the establishment of relations between samples of different modalities: although samples of different modalities have different feature spaces, they share the same label space. In existing methods, label information is used as an additional modality, or to compute similarity for associating image-text pairs, or as the representation in the latent space. Existing methods use label information in a fairly simple way and consider only inter-modality associations, not intra-modality ones, although intra-modality association information is crucial. Within one modality, samples with similar semantics should have similar latent representations, and across modalities samples with similar semantics should also have similar latent representations; this consistency of similarity ensures that all semantically similar samples have similar latent representations. The present invention therefore creates one semantic graph containing all samples to provide a high-level semantic constraint, together with two feature graphs, each containing the samples of one modality, to provide manifold constraints, and reconstructs the label information to provide a global semantic constraint. In addition, when the number of nodes is M, traditional graph-based methods need to build a graph of complexity O(M^2), and solving them requires a complicated eigenvalue decomposition, so a more efficient algorithm for learning the graph structure is needed.
Summary of the invention
In view of the above problems, the invention discloses a semantic-preserving cross-modal content retrieval method and system. The method comprises: constructing a query set from first-modality samples and a target set from second-modality samples; extracting the feature vectors of the first-modality samples as nodes to construct a first feature graph; extracting the feature vectors of the second-modality samples as nodes to construct a second feature graph; extracting the label vectors of the label information of all samples in the query set and the target set as nodes to construct a semantic graph; obtaining the neighbor nodes of each node; constructing a first mapping function for mapping the first-modality samples to latent representations, and a second mapping function for mapping the second-modality samples to latent representations; learning the first mapping function and the second mapping function so as to approximately maximize the likelihood that the neighbor nodes of each node occur, while enabling each latent representation to reconstruct the label information of its node; taking a first-modality sample as the query sample, mapping the query sample to a query latent representation with the first mapping function, and mapping each second-modality sample to a target latent representation with the second mapping function; and obtaining the distance between the query latent representation and each target latent representation, and taking the second-modality samples whose distance is less than a retrieval threshold as the retrieval result of the query sample.
In the cross-modal content retrieval method of the present invention, the first mapping function and the second mapping function are learned using neighbor sampling and negative sampling: a multinomial distribution is built from the weights of the edges between a sampled node and its adjacent nodes, nodes connected to the sampled node are drawn from this multinomial distribution as neighbor nodes, and nodes not connected to the sampled node are selected from a uniform distribution as negative nodes.
In the cross-modal content retrieval method of the present invention, the distance is the Euclidean distance $d(x_i, x_j) = (x_i - x_j)^2$ or the cosine distance between the query latent representation and the target latent representation, where $x_i$ is the query latent representation and $x_j$ is the target latent representation.
In the cross-modal content retrieval method of the present invention, the modality of the first-modality samples includes a visual modality, an audio modality, and a text modality, and the modality of the second-modality samples includes a visual modality, an audio modality, and a text modality.
In the cross-modal content retrieval method of the present invention, if the modality of the first-modality samples and/or the second-modality samples is a visual modality, the feature vector of those samples is a scale-invariant feature transform feature, a visual-modality convolutional neural network feature, or a histogram of oriented gradients feature; if the modality of the first-modality samples and/or the second-modality samples is a text modality, the feature vector of those samples is a term frequency-inverse document frequency feature or a text-modality deep convolutional/recurrent neural network feature.
The invention also discloses a semantic-preserving cross-modal content retrieval system, comprising:
A sample set construction module, for constructing a query set from first-modality samples and a target set from second-modality samples;
A feature graph construction module, for constructing a first feature graph, a second feature graph, and a semantic graph, and for obtaining the neighbor nodes of each node in the first feature graph and the second feature graph; wherein the feature vectors of the first-modality samples are extracted as nodes to construct the first feature graph, the feature vectors of the second-modality samples are extracted as nodes to construct the second feature graph, and the label vectors of the label information of all samples in the query set and the target set are extracted as nodes to construct the semantic graph; and the neighbor nodes of each node are obtained;
A mapping function learning module, for constructing mapping functions and learning them; wherein a first mapping function is constructed for mapping the first-modality samples to latent representations, and a second mapping function is constructed for mapping the second-modality samples to latent representations; and the first mapping function and the second mapping function are learned so as to approximately maximize the likelihood that the neighbor nodes of each node occur, while enabling each latent representation to reconstruct the label information of its node;
A sample retrieval module, for obtaining the retrieval result; wherein a first-modality sample is taken as the query sample, the query sample is mapped to a query latent representation with the first mapping function, and each second-modality sample is mapped to a target latent representation with the second mapping function; and the distance between the query latent representation and each target latent representation is obtained, and the second-modality samples whose distance is less than a retrieval threshold are taken as the retrieval result of the query sample.
In the cross-modal content retrieval system of the present invention, the mapping function learning module comprises:
A neighbor sampling module, for learning the first mapping function and the second mapping function using neighbor sampling; wherein a multinomial distribution is built from the weights of the edges between a sampled node and its adjacent nodes, and nodes connected to the sampled node are drawn from this multinomial distribution as neighbor nodes;
A negative sampling module, for learning the first mapping function and the second mapping function using negative sampling; wherein nodes not connected to the sampled node are selected from a uniform distribution as negative nodes.
In the cross-modal content retrieval system of the present invention, in the sample retrieval module, the distance is the Euclidean distance $d(x_i, x_j) = (x_i - x_j)^2$ or the cosine distance between the query latent representation and the target latent representation, where $x_i$ is the query latent representation and $x_j$ is the target latent representation.
In the cross-modal content retrieval system of the present invention, the modality of the first-modality samples includes a visual modality, an audio modality, and a text modality, and the modality of the second-modality samples includes a visual modality, an audio modality, and a text modality.
In the cross-modal content retrieval system of the present invention, if the modality of the first-modality samples and/or the second-modality samples is a visual modality, the feature vector of those samples is a scale-invariant feature transform feature, a visual-modality convolutional neural network feature, or a histogram of oriented gradients feature; if the modality of the first-modality samples and/or the second-modality samples is a text modality, the feature vector of those samples is a term frequency-inverse document frequency feature or a text-modality deep convolutional/recurrent neural network feature.
Brief description of the drawings
Fig. 1 is a flowchart of the semantic-preserving cross-modal content retrieval method according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of the feature graphs and the semantic graph of the semantic-preserving cross-modal content retrieval method according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of the mapping functions of the semantic-preserving cross-modal content retrieval method according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of the semantic-preserving cross-modal content retrieval system according to an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the semantic-preserving cross-modal content retrieval method and system proposed by the present invention are further described below with reference to the drawings. It should be understood that the specific implementations described here serve only to explain the present invention and are not intended to limit it.
The invention proposes a semantic-preserving cross-modal retrieval method that can involve multiple modalities. For convenience of description, the embodiments of the present invention involve only the text and image modalities; it should be understood, however, that the cross-modal content retrieval method of the present invention is broadly applicable to modalities such as text, vision, and audio, and to multi-modal data such as video, and is not limited to the above modalities. The cross-modal retrieval method of the present invention is roughly divided into three steps: first, original features are extracted from each sample by feature extraction; then mapping functions are learned that map each sample from its original features to a latent representation; finally, the distance between the latent representation of the query sample and the latent representation of each target-set sample is computed, the distances are sorted, and the target-set samples whose distance to the query sample is less than a threshold are selected as the retrieval result.
Fig. 1 is a flowchart of the semantic-preserving cross-modal content retrieval method according to an embodiment of the present invention. As shown in Fig. 1, in an embodiment of the present invention, the semantic-preserving cross-modal retrieval method specifically comprises:
Step S1: construct a query set and a target set. The samples of the query set all have a first modality and are called first-modality samples; the samples of the target set all have a second modality and are called second-modality samples. The modalities of the first-modality samples and the second-modality samples include the visual modality, the audio modality, the text modality, and so on, or a multi-modal combination that includes the visual and audio modalities, such as the video modality; the present invention is not limited in this respect. The first-modality samples and the second-modality samples have different modalities; in an embodiment of the present invention, the first modality is the image modality and the second modality is the text modality;
Step S2: extract the feature vectors of all first-modality samples as nodes to construct a first feature graph; extract the feature vectors of all second-modality samples as nodes to construct a second feature graph; extract the label information of the semantic labels of all first-modality samples and second-modality samples as label vectors, and construct a semantic graph with each label vector as a node. In an embodiment of the present invention, where the first-modality samples are image samples and the second-modality samples are text samples, the feature vectors of the image samples and the text samples are extracted first. The feature vector of an image sample may be, for example, a SIFT (Scale-Invariant Feature Transform) feature, a visual-modality CNN (Convolutional Neural Network) feature, or a HOG (Histogram of Oriented Gradients) feature; the feature vector of a text sample may be a TF-IDF (term frequency-inverse document frequency) feature, a text-modality CNN (Convolutional Neural Network) feature, or a text-modality CNN/RNN (deep convolutional/Recurrent Neural Network) feature; the present invention is not limited in this respect;
Fig. 2 is a schematic diagram of the feature graphs and the semantic graph of the semantic-preserving cross-modal content retrieval method according to an embodiment of the present invention. As shown in Fig. 2, three graphs are built from the first-modality samples and the second-modality samples: the semantic graph Gs, the first feature graph (image feature graph Gt), and the second feature graph (text feature graph Gi). Each label vector extracted from the semantic labels of the text samples and image samples is a node in the semantic graph Gs;
Step S3: obtain the neighbor nodes of each node from the three graphs, that is, the semantic graph, the first feature graph, and the second feature graph. Because the semantic graph Gs contains the semantic labels of both the text samples and the image samples, it contains both inter-modality and intra-modality semantic information. The connections between the nodes of the semantic graph are established from the label vectors of the label information of the image samples and text samples, using one of two methods:
In the first method, an edge is established between two nodes of the semantic graph if and only if their label vectors have non-zero values in at least one common dimension. The vector similarity computed from the label vectors is used as the weight of the edge between the nodes; cosine similarity or an exponential similarity may be used, where z_i and z_j are the label vectors of nodes i and j, respectively, and σ is a scaling factor;
In the second method, the connections between the nodes of the semantic graph are established using an existing knowledge graph. For example, the concepts corresponding to the labels of the image samples and text samples are looked up in WordNet, and the similarity between entities in the knowledge graph, computed for example from the shortest path, is used as the weight of the edge between nodes in the semantic graph. In the multi-label case, the similarities between all labels are averaged, and the average is used as the weight of the edge in the semantic graph Gs. In the first feature graph (image feature graph), the distance between any two nodes is computed from the image feature vectors; if one node is a k-nearest neighbor of the other, the two nodes are connected and the weight of the edge is 1. In the second feature graph (text feature graph), the distance between any two nodes is computed from the text feature vectors; if one node is a k-nearest neighbor of the other, the two nodes are connected and the weight of the edge is 1;
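The graph construction described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the toy label vectors, and the toy feature vectors are hypothetical, the semantic graph uses the first method with cosine similarity as the edge weight, and the feature graph connects two nodes with weight 1 when either is among the other's k nearest neighbors under squared Euclidean distance (the choice of distance is an assumption).

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def build_semantic_graph(label_vectors):
    """Edge (i, j) exists iff the two label vectors are both non-zero in some dimension."""
    edges = {}
    for i in range(len(label_vectors)):
        for j in range(i + 1, len(label_vectors)):
            zi, zj = label_vectors[i], label_vectors[j]
            if any(a != 0 and b != 0 for a, b in zip(zi, zj)):
                edges[(i, j)] = cosine_similarity(zi, zj)
    return edges

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_knn_graph(features, k):
    """Connect i and j with weight 1 if one is among the other's k nearest neighbors."""
    edges = {}
    for i, fi in enumerate(features):
        others = [j for j in range(len(features)) if j != i]
        others.sort(key=lambda j: squared_distance(fi, features[j]))
        for j in others[:k]:
            edges[(min(i, j), max(i, j))] = 1
    return edges

# Toy data (hypothetical): three samples with 3-dimensional label vectors.
labels = [[1, 0, 1], [1, 0, 0], [0, 1, 0]]
semantic_edges = build_semantic_graph(labels)

# Toy data (hypothetical): four samples of one modality with 2-dimensional features.
features = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
knn_edges = build_knn_graph(features, k=1)
```

The pairwise loops are quadratic in the number of samples, which is acceptable for a sketch; a real system would likely use an approximate nearest-neighbor index.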
Step S4: construct a first mapping function for mapping the first-modality samples to latent representations, and a second mapping function for mapping the second-modality samples to latent representations; learn the first mapping function and the second mapping function so as to approximately maximize the likelihood that the neighbor nodes of each node occur, while enabling each latent representation to reconstruct the label information of its node. For an image sample v_i, the latent representation is f_v(v_i); for a text sample t_i, the latent representation is f_t(t_i); both latent representations are written collectively as f(n_i), where n_i is an image sample or a text sample. To preserve the local structure of the semantic graph, the first feature graph, and the second feature graph, the present invention maximizes, for each graph separately, the probability that the neighbor nodes of each node occur;
For a node n_i, a set of neighbor samples P(n_i) is sampled and the probability Pr(P(n_i) | n_i) is maximized. Here V is the set of all nodes in the semantic graph, the first feature graph, and the second feature graph; P(n_i) denotes the samples corresponding to the neighbor nodes of node n_i, that is, the neighbor samples; and T denotes vector transposition;
When the number of nodes is large, negative samples are drawn so that maximizing the probability Pr(P(n_i) | n_i) is relaxed into minimizing a loss, where N(n_i) denotes the negative samples of node n_i. The neighbor samples are obtained by neighbor sampling: a multinomial distribution is built from the weights of the edges between each neighbor node and node n_i, and neighbor nodes are drawn from this multinomial distribution. The negative samples are obtained by negative sampling: nodes not connected to n_i are selected from a uniform distribution as negative samples. In all three graphs, the local structure is preserved with the same neighbor sampling and negative sampling; the graph G in the loss may be the semantic graph Gs, the text feature graph Gi, or the image feature graph Gt, so that a corresponding loss is obtained for each image sample v_i and for each text sample t_i.
In addition, a global semantic preservation condition is introduced: the latent representation produced by the mapping must be able to recover the semantic label information. Let g(·) be the function from the latent representation to the semantic label;
the global semantic preservation loss is then defined with respect to the semantic label of node n_i;
Overall, for an image sample v_i, the loss to be optimized combines the above terms, where α and β are balancing coefficients; similarly, a corresponding loss is optimized for a text sample t_i.
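The text names the probability Pr(P(n_i) | n_i) and the symbols V and T without spelling the expression out here. One form consistent with those symbols, borrowed from skip-gram style graph embedding and offered only as an assumption rather than as the patent's own formula, is a softmax over inner products of latent representations:

```latex
\Pr\bigl(P(n_i) \mid n_i\bigr)
  = \prod_{n_j \in P(n_i)}
    \frac{\exp\bigl(f(n_j)^{T} f(n_i)\bigr)}
         {\sum_{n_k \in V} \exp\bigl(f(n_k)^{T} f(n_i)\bigr)}
```

Under this assumed form, the denominator sums over every node in V, which makes exact evaluation expensive when the number of nodes is large.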
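The two sampling schemes can be sketched as below. This is an illustrative sketch under assumptions: the adjacency dictionary, node names, and function names are hypothetical, and the multinomial draw is implemented with weights proportional to edge weights.

```python
import random

def sample_neighbor(node, adjacency, rng):
    """Draw one neighbor of `node` with probability proportional to its edge weight."""
    neighbors = list(adjacency[node].keys())
    weights = [adjacency[node][n] for n in neighbors]
    return rng.choices(neighbors, weights=weights, k=1)[0]

def sample_negatives(node, adjacency, all_nodes, count, rng):
    """Draw `count` nodes uniformly from the nodes that are not connected to `node`."""
    candidates = [n for n in all_nodes if n != node and n not in adjacency[node]]
    return [rng.choice(candidates) for _ in range(count)]

# Hypothetical weighted graph: node -> {neighbor: edge weight}.
adjacency = {
    "a": {"b": 0.9, "c": 0.1},
    "b": {"a": 0.9},
    "c": {"a": 0.1},
    "d": {},
    "e": {},
}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
positive = sample_neighbor("a", adjacency, rng)
negatives = sample_negatives("a", adjacency, list(adjacency), 3, rng)
```

In a feature graph, where every edge weight is 1, the same neighbor draw reduces to a uniform choice among the connected nodes.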
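How the loss terms might fit together can be sketched as follows. Everything specific here is an assumption: the graph term uses the usual negative-sampling objective with a log-sigmoid of inner products, the label term uses squared error, and the total weights the modality-specific feature graph term by α and the label term by β. The text confirms only that α and β are balancing coefficients, not which term each one scales.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def neg_sampling_loss(anchor, positives, negatives):
    """Assumed graph loss: pull sampled neighbors closer, push negative samples away."""
    loss = 0.0
    for p in positives:
        loss -= log_sigmoid(dot(anchor, p))
    for n in negatives:
        loss -= log_sigmoid(-dot(anchor, n))
    return loss

def label_loss(predicted, target):
    """Assumed global semantic loss: squared error between g(f(n_i)) and the label."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target))

def total_loss(semantic_term, feature_term, label_term, alpha, beta):
    """Assumed combination of the semantic graph, feature graph, and label terms."""
    return semantic_term + alpha * feature_term + beta * label_term

# Toy latent vectors (hypothetical values).
graph_term = neg_sampling_loss([0.0, 0.0], [[1.0, 2.0]], [[3.0, 4.0]])
label_term = label_loss([0.5, 0.5], [1.0, 0.0])
combined = total_loss(1.0, 2.0, 3.0, alpha=0.5, beta=0.1)
```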
To model the non-linear relation between the original features and the latent representations, the present invention uses a neural network structure. Fig. 3 is a schematic diagram of the mapping functions of the semantic-preserving cross-modal content retrieval method according to an embodiment of the present invention. As shown in Fig. 3, f_v(·) and f_t(·) map images and text into a unified latent representation space, which g(·) then maps into the semantic label space. The form of the network may differ for different concrete applications; for example, the number of layers of f_v(·), f_t(·), and g(·) may be increased or reduced. Finally, the loss function is optimized with stochastic gradient descent and error backpropagation to learn the mapping functions;
Step S5: compute the latent representation of each sample with the learned mapping functions. Given a first-modality sample in the query set (the query sample), compute the distance between its latent representation and the latent representation of each second-modality sample in the target set. The distance may be the Euclidean distance $d(x_i, x_j) = (x_i - x_j)^2$ or the cosine distance, and the present invention is not limited in this respect; here $x_i$ and $x_j$ denote the latent representation of the first-modality sample and the latent representation of the second-modality sample, respectively. All the distances obtained are sorted in ascending order and, according to a preset retrieval threshold N, the first N second-modality samples in the sorted order are selected as the retrieval result of the query sample.
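The shape of the three mappings can be sketched with a forward pass only. This is a minimal sketch under assumptions: each of f_v, f_t, and g is a single fully connected layer, the feature, latent, and label dimensions (4, 6, 3, 2) are arbitrary, the tanh and sigmoid activations are illustrative choices, and training by stochastic gradient descent and backpropagation is omitted.

```python
import math
import random

def make_layer(in_dim, out_dim, rng):
    """Create a small random weight matrix and a zero bias vector."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim

def forward(layer, x, activation):
    """Apply one fully connected layer followed by an element-wise activation."""
    weights, bias = layer
    return [activation(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

rng = random.Random(0)
f_v = make_layer(4, 3, rng)  # image features (dim 4) -> latent space (dim 3)
f_t = make_layer(6, 3, rng)  # text features (dim 6)  -> the same latent space
g = make_layer(3, 2, rng)    # latent space -> semantic label space (2 labels)

image_latent = forward(f_v, [0.2, 0.1, 0.0, 0.7], math.tanh)
text_latent = forward(f_t, [1.0, 0.0, 0.0, 2.0, 0.0, 1.0], math.tanh)
image_labels = forward(g, image_latent, sigmoid)
```

Because both modalities land in the same 3-dimensional space, their latent vectors can be compared directly, and g is shared between them.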
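The ranking in step S5 can be sketched as follows, assuming the latent representations have already been computed. The function names and the toy vectors are hypothetical, and the squared Euclidean form is used as the distance.

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance between two latent vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def retrieve(query_latent, target_latents, top_n):
    """Return the indices of the top_n target samples closest to the query, nearest first."""
    ranked = sorted(
        range(len(target_latents)),
        key=lambda j: squared_euclidean(query_latent, target_latents[j]),
    )
    return ranked[:top_n]

# Toy latent vectors (hypothetical): one image query and three text targets.
query = [0.0, 1.0]
targets = [[5.0, 5.0], [0.1, 0.9], [1.0, 1.0]]
result = retrieve(query, targets, top_n=2)
```

Swapping in a cosine distance only requires replacing the key function; the ascending sort and the cut at N stay the same.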
The invention also discloses a semantic-preserving cross-modal content retrieval system. Fig. 4 is a schematic diagram of the semantic-preserving cross-modal content retrieval system according to an embodiment of the present invention. As shown in Fig. 4, the cross-modal content retrieval system of the present invention comprises a sample set construction module, a feature graph construction module, a mapping function learning module, and a sample retrieval module. The sample set construction module constructs a query set and a target set; the samples in the query set all have a first modality and are called first-modality samples, and the samples in the target set all have a second modality and are called second-modality samples. The feature graph construction module extracts the feature vectors of all first-modality samples as nodes to construct a first feature graph, extracts the feature vectors of all second-modality samples as nodes to construct a second feature graph, extracts the label vectors of the label information of all first-modality samples and second-modality samples as nodes to construct a semantic graph, and obtains the neighbor nodes of each node. The mapping function learning module constructs a first mapping function and a second mapping function, where the first mapping function maps the first-modality samples to latent representations and the second mapping function maps the second-modality samples to latent representations; by learning the first mapping function and the second mapping function, it approximately maximizes the likelihood that the neighbor nodes of each node occur, while enabling each latent representation to reconstruct the label information of its node. The sample retrieval module obtains the retrieval result: a first-modality sample is taken as the query sample, the query sample is mapped to a query latent representation with the first mapping function, each second-modality sample in the target set is mapped to a target latent representation with the second mapping function, the distance between the query latent representation and each target latent representation is obtained, all the distances obtained are sorted in ascending order, and, according to a preset retrieval threshold N, the first N second-modality samples in the sorted order are selected as the retrieval result of the query sample.

Claims (10)

1. A semantic-preserving cross-modal content retrieval method, characterized by comprising:
constructing a query set from first-modality samples and a target set from second-modality samples;
extracting the feature vectors of the first-modality samples as nodes to construct a first feature graph; extracting the feature vectors of the second-modality samples as nodes to construct a second feature graph; extracting the label vectors of the label information of all samples in the query set and the target set as nodes to construct a semantic graph; obtaining the neighbor nodes of each node;
constructing a first mapping function for mapping the first-modality samples to latent representations, and a second mapping function for mapping the second-modality samples to latent representations; learning the first mapping function and the second mapping function so as to approximately maximize the likelihood that the neighbor nodes of each node occur, while enabling each latent representation to reconstruct the label information of its node;
taking a first-modality sample as the query sample, mapping the query sample to a query latent representation with the first mapping function, and mapping each second-modality sample to a target latent representation with the second mapping function; obtaining the distance between the query latent representation and each target latent representation, and taking the second-modality samples whose distance is less than a retrieval threshold as the retrieval result of the query sample.
2. The cross-modal content retrieval method of claim 1, characterized in that the first mapping function and the second mapping function are learned using neighbor sampling and negative sampling; a multinomial distribution is built from the weights of the edges between a sampled node and its adjacent nodes, nodes connected to the sampled node are drawn from this multinomial distribution as neighbor nodes, and nodes not connected to the sampled node are selected from a uniform distribution as negative nodes.
3. The cross-modal content retrieval method of claim 1, characterized in that the distance is the Euclidean distance $d(x_i, x_j) = (x_i - x_j)^2$ or the cosine distance between the query latent representation and the target latent representation, where $x_i$ is the query latent representation and $x_j$ is the target latent representation.
4. The cross-modal content retrieval method of claim 1, characterized in that the modality of the first-modality samples includes a visual modality, an audio modality, and a text modality, and the modality of the second-modality samples includes a visual modality, an audio modality, and a text modality.
5. The cross-modal content retrieval method of claim 4, characterized in that if the modality of the first-modality samples and/or the second-modality samples is a visual modality, the feature vector of those samples is a scale-invariant feature transform feature, a visual-modality convolutional neural network feature, or a histogram of oriented gradients feature; if the modality of the first-modality samples and/or the second-modality samples is a text modality, the feature vector of those samples is a term frequency-inverse document frequency feature or a text-modality deep convolutional/recurrent neural network feature.
6. A cross-modal content retrieval system based on semantic preservation, characterized by comprising:
A sample set construction module, configured to construct a retrieval set from first-modality samples and a target set from second-modality samples;
A feature graph construction module, configured to construct a first feature graph, a second feature graph, and a semantic graph, and to obtain the neighbor nodes of each node in the first feature graph and the second feature graph; wherein the feature vectors of the first-modality samples are extracted as nodes to construct the first feature graph, the feature vectors of the second-modality samples are extracted as nodes to construct the second feature graph, and the label vectors of the label information of all samples in the retrieval set and the target set are extracted as nodes to construct the semantic graph; and the neighbor nodes of each node are obtained;
A mapping function learning module, configured to construct the mapping functions and learn them; wherein a first mapping function is constructed for mapping the first-modality samples to latent representations, and a second mapping function is constructed for mapping the second-modality samples to latent representations; the first mapping function and the second mapping function are learned so as to approximately maximize the likelihood that the neighbor nodes of each node occur, while making each latent representation reconstruct the label information of its corresponding node;
A sample retrieval module, configured to obtain the retrieval result; wherein a first-modality sample is taken as the query sample, the query sample is mapped to a query latent representation with the first mapping function, and each second-modality sample is mapped to a target latent representation with the second mapping function; the distance between the query latent representation and each target latent representation is obtained, and all second-modality samples whose distance is less than a retrieval threshold are taken as the retrieval result for the query sample.
7. The cross-modal content retrieval system of claim 6, characterized in that the mapping function learning module comprises:
A neighbor sampling module, configured to learn the first mapping function and the second mapping function using neighbor sampling; wherein a multinomial distribution is established according to the weights of the edges from a sampled node to its adjacent nodes, and nodes connected to the sampled node are sampled from the multinomial distribution as neighbor nodes;
A negative sampling module, configured to approximately learn the first mapping function and the second mapping function using negative sampling; wherein nodes not connected to the sampled node are sampled from the multinomial distribution as negative nodes.
8. The cross-modal content retrieval system of claim 6, characterized in that, in the retrieval and result acquisition module, the distance is the Euclidean distance $d(x_i, x_j) = (x_i - x_j)^2$ between the query latent representation and the target latent representation, or the cosine distance between them, where $x_i$ is the query latent representation and $x_j$ is the target latent representation.
9. The cross-modal content retrieval system of claim 6, characterized in that the modality of the first-modality sample includes a visual modality, an audio modality, or a text modality, and the modality of the second-modality sample includes a visual modality, an audio modality, or a text modality.
10. The cross-modal content retrieval system of claim 9, characterized in that, if the modality of the first-modality sample and/or the second-modality sample is the visual modality, the feature vector of the first-modality sample and/or the second-modality sample is a scale-invariant feature transform (SIFT) feature, a visual-modality convolutional neural network feature, or a histogram of oriented gradients (HOG) feature; if the modality of the first-modality sample and/or the second-modality sample is the text modality, the feature vector of the first-modality sample and/or the second-modality sample is a term frequency-inverse document frequency (TF-IDF) feature or a text-modality deep convolutional/recurrent neural network feature.
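The neighbor sampling and negative sampling named in claims 2 and 7 can be sketched as follows. This is only an illustration under stated assumptions: the graph is stored as nested dictionaries of edge weights, negatives are drawn uniformly as in claim 2, and the names `sample_neighbor`, `sample_negatives`, and `GRAPH` are hypothetical rather than taken from the patent.

```python
import random

def sample_neighbor(graph, node, rng):
    """Draw one neighbor of `node` with probability proportional to edge weight.

    Normalizing the edge weights of `node` gives the multinomial distribution
    over its adjacent nodes; `rng.choices` performs that weighted draw.
    """
    neighbors = list(graph[node])
    weights = [graph[node][n] for n in neighbors]
    return rng.choices(neighbors, weights=weights, k=1)[0]

def sample_negatives(graph, node, k, rng):
    """Draw `k` nodes uniformly (with replacement) from nodes not connected to `node`.

    Raises IndexError if every other node is connected to `node`.
    """
    candidates = [n for n in graph if n != node and n not in graph[node]]
    return [rng.choice(candidates) for _ in range(k)]

# Toy weighted graph as adjacency dicts: node -> {neighbor: edge weight}.
GRAPH = {
    "a": {"b": 3.0, "c": 1.0},
    "b": {"a": 3.0},
    "c": {"a": 1.0},
    "d": {},
    "e": {},
}
rng = random.Random(0)  # fixed seed so the sketch is repeatable
positive = sample_neighbor(GRAPH, "a", rng)
negatives = sample_negatives(GRAPH, "a", k=3, rng=rng)
```

With these weights, node `a` yields `b` about three times as often as `c`, and its negatives always come from the unconnected nodes `d` and `e`. A training loop would feed each (node, neighbor, negatives) triple into the approximate likelihood objective described in claim 6.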
CN201811156579.5A 2018-09-30 2018-09-30 Cross-modal content retrieval method and system based on semantic preservation Active CN109284414B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811156579.5A CN109284414B (en) 2018-09-30 2018-09-30 Cross-modal content retrieval method and system based on semantic preservation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811156579.5A CN109284414B (en) 2018-09-30 2018-09-30 Cross-modal content retrieval method and system based on semantic preservation

Publications (2)

Publication Number Publication Date
CN109284414A 2019-01-29
CN109284414B 2020-12-04

Family

ID=65182054

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811156579.5A Active CN109284414B (en) 2018-09-30 2018-09-30 Cross-modal content retrieval method and system based on semantic preservation

Country Status (1)

Country Link
CN (1) CN109284414B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886326A (en) * 2019-01-31 2019-06-14 深圳市商汤科技有限公司 Cross-modal information retrieval method, device and storage medium
CN110222560A (en) * 2019-04-25 2019-09-10 西北大学 Text person searching method embedded with similarity loss function
CN111813967A (en) * 2020-07-14 2020-10-23 中国科学技术信息研究所 Retrieval method, retrieval device, computer equipment and storage medium
CN112100410A (en) * 2020-08-13 2020-12-18 中国科学院计算技术研究所 Cross-modal retrieval method and system based on semantic condition association learning

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7089543B2 (en) * 2001-07-13 2006-08-08 Sony Corporation Use of formal logic specification in construction of semantic descriptions
WO2010120941A3 (en) * 2009-04-15 2011-01-20 Evri Inc. Automatic mapping of a location identifier pattern of an object to a semantic type using object metadata
CN103049526A (en) * 2012-12-20 2013-04-17 中国科学院自动化研究所 Cross-media retrieval method based on double space learning
CN105205096A (en) * 2015-08-18 2015-12-30 天津中科智能识别产业技术研究院有限公司 Cross-modal data retrieval method between text and image modalities
US20170116521A1 (en) * 2015-10-27 2017-04-27 Beijing Baidu Netcom Science And Technology Co., Ltd. Tag processing method and device
CN107273517A (en) * 2017-06-21 2017-10-20 复旦大学 Image-text cross-modal retrieval method based on graph embedding learning
CN107330100A (en) * 2017-07-06 2017-11-07 北京大学深圳研究生院 Bidirectional image-text retrieval method based on a multi-view joint embedding space
CN107633263A (en) * 2017-08-30 2018-01-26 清华大学 Edge-based network embedding method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
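Yiling Wu et al.: "Online Asymmetric Similarity Learning for Cross-Modal Retrieval", 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) *
Wang Shuhui (王树徽) et al.: "Research Progress on Heterogeneous Media Analysis Technology" (异质媒体分析技术研究进展), Journal of Integration Technology (集成技术) *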

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886326A (en) * 2019-01-31 2019-06-14 深圳市商汤科技有限公司 Cross-modal information retrieval method, device and storage medium
CN109886326B (en) * 2019-01-31 2022-01-04 深圳市商汤科技有限公司 Cross-modal information retrieval method and device and storage medium
CN110222560A (en) * 2019-04-25 2019-09-10 西北大学 Text person searching method embedded with similarity loss function
CN110222560B (en) * 2019-04-25 2022-12-23 西北大学 Text person searching method embedded with similarity loss function
CN111813967A (en) * 2020-07-14 2020-10-23 中国科学技术信息研究所 Retrieval method, retrieval device, computer equipment and storage medium
CN111813967B (en) * 2020-07-14 2024-01-30 中国科学技术信息研究所 Retrieval method, retrieval device, computer equipment and storage medium
CN112100410A (en) * 2020-08-13 2020-12-18 中国科学院计算技术研究所 Cross-modal retrieval method and system based on semantic condition association learning

Also Published As

Publication number Publication date
CN109284414B (en) 2020-12-04

Similar Documents

Publication Publication Date Title
CN108132968B (en) Weak supervision learning method for associated semantic elements in web texts and images
Cheng et al. A survey and analysis on automatic image annotation
Wang et al. Self-constraining and attention-based hashing network for bit-scalable cross-modal retrieval
CN110059198A (en) Similarity-preserving discrete hashing retrieval method for cross-modal data
Jiao et al. SAR images retrieval based on semantic classification and region-based similarity measure for earth observation
CN112819023B (en) Sample set acquisition method, device, computer equipment and storage medium
CN109284414A (en) Cross-modal content retrieval method and system based on semantic preservation
Qi et al. Two-dimensional multilabel active learning with an efficient online adaptation model for image classification
CN110647904B (en) Cross-modal retrieval method and system based on unmarked data migration
Wang et al. Semantic gap in cbir: Automatic objects spatial relationships semantic extraction and representation
JP5626042B2 (en) Retrieval system, method and program for representative image in image set
Qian et al. Landmark summarization with diverse viewpoints
CN111985520A (en) Multi-mode classification method based on graph convolution neural network
CN114461839B (en) Multi-mode pre-training-based similar picture retrieval method and device and electronic equipment
CN113515669A (en) Data processing method based on artificial intelligence and related equipment
CN115658934A (en) Image-text cross-modal retrieval method based on multi-class attention mechanism
CN112182275A (en) Trademark approximate retrieval system and method based on multi-dimensional feature fusion
CN112906517B (en) Self-supervision power law distribution crowd counting method and device and electronic equipment
CN104346456B (en) Multi-semantic annotation method for digital images based on spatial dependence measurement
Wang et al. Multimodal Poisson gamma belief network
CN111506832B (en) Heterogeneous object completion method based on block matrix completion
WO2022162427A1 (en) Annotation-efficient image anomaly detection
Chiang Interactive tool for image annotation using a semi-supervised and hierarchical approach
CN115994239A (en) Prototype comparison learning-based semi-supervised remote sensing image retrieval method and system
TW202004519A (en) Method for automatically classifying images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant