CN107562812A - Cross-modal similarity learning method based on modality-specific semantic space modeling

Publication number: CN107562812A (published 2018-01-09); granted as CN107562812B (2021-01-15)
Application number: CN201710684763.6A (filed 2017-08-11)
Authority: CN (China)
Original language: Chinese (zh)
Inventors: 彭宇新, 綦金玮
Assignee: Peking University
Legal status: Granted; Active

Abstract

The present invention relates to a cross-modal similarity learning method based on modality-specific semantic space modeling, comprising the following steps: 1. Establish a cross-modal database containing data of multiple modality types, and divide the data in the database into a training set, a test set and a validation set. 2. For each modality type in the cross-modal database, construct a semantic space specific to that modality and project the data of the other modality types into this space, obtaining a cross-modal similarity specific to that modality. 3. Fuse the modality-specific cross-modal similarities obtained from the different modality semantic spaces to obtain the final cross-modal similarity. 4. Take one modality type in the test set as the query modality and another modality type as the target modality, compute the similarity between each query sample and each query target, and obtain a ranked list of relevant target-modality data according to the similarities. The present invention can improve the accuracy of cross-modal retrieval.

Description

Cross-modal similarity learning method based on modality-specific semantic space modeling
Technical field
The present invention relates to the field of multimedia retrieval, and in particular to a cross-modal similarity learning method based on modality-specific semantic space modeling.
Background art
Nowadays, multi-modal data including images, video, text and audio are ubiquitous on the Internet, and such multi-modal data form the basis for artificial intelligence to perceive the real world. A number of research efforts attempt to bridge the heterogeneity gap between data of different modalities, and cross-modal retrieval has become one of the research hotspots: it enables information retrieval across data of different modalities and has broad practical applications, such as search engines and digital libraries. Traditional single-modality retrieval, such as image retrieval or video retrieval, is confined to a single modality and can only return retrieval results of the same modality type as the query. In contrast, cross-modal retrieval is more convenient and useful, since a query of any modality type can retrieve results of different modalities.
A major challenge of cross-modal retrieval is how to cope with the inconsistency between different modalities and learn the intrinsic correlations between them. Because data of different modalities have diverse representations and distribution characteristics and lie in their own feature spaces, this heterogeneity makes it very difficult to measure the similarity between different modalities, for example between an image and an audio clip. To address this problem, researchers have proposed methods that project the feature representations of different modalities into one common space to learn a unified representation, so that the similarity between data of different modalities can be obtained by computing the distance between their corresponding unified representations. Traditional methods learn mapping matrices for the different modalities to maximize the correlation between them; for example, canonical correlation analysis (CCA) analyzes the pairwise correlations between data of different modalities and maps them into a common subspace of the same dimension. In addition, Zhai et al., in the document "Learning Cross-Media Joint Representation with Sparse and Semi-Supervised Regularization", propose a graph-regularization-based method that constructs graph models for the data of different modalities and performs cross-modal correlation learning and high-level semantic abstraction simultaneously.
In recent years, the great progress of deep learning has encouraged researchers to model the correlations between data of different modalities with deep neural networks. Feng et al., in the document "Cross-modal Retrieval with Correspondence Autoencoder", propose the correspondence autoencoder (Corr-AE), which builds a two-pathway connected network structure to model the correlation information and reconstruction information of different modalities simultaneously. Peng et al., in the document "Cross-media shared representation by hierarchical learning with multiple deep networks", propose the cross-media multiple deep network (CMDN), which, in a modality-separate representation learning stage, simultaneously models the intra-modality semantic information and the inter-modality correlations, and then, in a unified representation learning stage, builds a multi-layer network structure that fuses the intra-modality semantic representations and the inter-modality correlated representations, learning the cross-modal unified representation by stacked learning that models reconstruction and correlation information simultaneously.
However, most of the above existing methods project the data of different modalities equally into one common space, via mapping matrices or deep models, to mine the latent aligned relations between them, which implies that the information mined from the different modalities is treated as equivalent. In general, however, the relation between data of different modalities, such as an image and a text, is often unequal and complementary. When they jointly describe the same semantics, they may carry unequal information, because the information exclusive to one modality cannot be well aligned with the content expressed by the other modalities. Therefore, treating the different modalities equally, mining the latent fine-grained aligned content and building one common space may lose the exclusive and useful information within each modality, and cannot make full use of the rich internal information that each modality provides.
Summary of the invention
In view of the shortcomings of the prior art, the present invention proposes a cross-modal similarity learning method based on modality-specific semantic space modeling. To construct the semantic space of a specific modality, a recurrent attention network is trained on the data of that modality, modeling the fine-grained information and spatial context information within the modality; the data of the other modalities are then projected into the semantic space of that modality through attention-based joint correlation learning, fully learning the unbalanced correlations between different modalities; finally, the modality-specific cross-modal similarities obtained from the different modality semantic spaces are fused in a dynamic fusion manner, further exploiting the complementarity of the different modality semantic spaces and improving the accuracy of cross-modal retrieval.
To achieve the above objective, the technical solution adopted by the present invention is as follows:
A cross-modal similarity learning method based on modality-specific semantic space modeling, for constructing semantic spaces specific to particular modalities and fusing the modality-specific cross-modal similarities obtained from the semantic spaces of different modalities to obtain the similarity between data of different modalities and thereby realize cross-modal retrieval, comprising the following steps, where steps (1)-(3) obtain the cross-modal similarity and step (4) further realizes cross-modal retrieval:
(1) Establish a cross-modal database containing data of multiple modality types;
(2) For each modality type in the cross-modal database, construct a semantic space specific to that modality, and project the data of the other modality types into this semantic space to obtain a cross-modal similarity specific to that modality;
(3) Fuse the modality-specific cross-modal similarities obtained from the semantic spaces of the different modalities to obtain the final cross-modal similarity;
(4) Take one modality type as the query modality and another modality type as the target modality; take each data item of the query modality as a query sample, retrieve the data of the target modality, compute the similarity between the query sample and each query target, and obtain a ranked list of relevant target-modality data according to the similarities.
Further, in the above cross-modal similarity learning method based on modality-specific semantic space modeling, the cross-modal database of step (1) may contain multiple modality types, such as images and text.
Further, in the above method, the semantic space construction for a specific modality in step (2) is: train a recurrent attention network on the data of that modality, and then project the data of the other modality types into the semantic space of that modality through attention-based joint correlation learning, obtaining the cross-modal similarity specific to that modality.
Further, in the above method, in the cross-modal similarity learning of step (3), the modality-specific cross-modal similarities obtained from the different modality semantic spaces are fused in a dynamic fusion manner.
Further, in the above method, the retrieval manner of step (4) is: take one modality type as the query modality and another modality type as the target modality; take each data item of the query modality as a query sample, compute its similarity to all data of the target modality according to step (3), and then sort in descending order of similarity to obtain the ranked list of relevant results.
The effect of the invention is: compared with existing methods, this method constructs semantic spaces specific to each modality, which allows the fine-grained information and spatial context information within each modality to be fully modeled; it then fully learns the unbalanced correlations between different modalities through attention-based joint correlation learning; finally, it uses dynamic fusion to further exploit the complementarity of the different modality semantic spaces, improving the accuracy of cross-modal retrieval.
The reason why this method achieves the above effect is as follows: for the semantic space of a specific modality, a recurrent attention network is trained on the data of that modality to model the fine-grained information and spatial context information within the modality; the data of the other modality types are then projected into the semantic space of that modality through attention-based joint correlation learning, fully learning the unbalanced correlations between different modalities; finally, the modality-specific cross-modal similarities obtained from the different modality semantic spaces are fused in a dynamic fusion manner, further exploiting the complementarity of the different modality semantic spaces and improving the accuracy of cross-modal retrieval.
Brief description of the drawings
Fig. 1 is a flowchart of the cross-modal similarity learning method based on modality-specific semantic space modeling of the present invention.
Fig. 2 is a schematic diagram of the complete network structure of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings and a specific embodiment.
The cross-modal similarity learning method based on modality-specific semantic space modeling of the present invention, whose flow is shown in Fig. 1, comprises the following steps:
(1) Establish a cross-modal database containing data of multiple modality types, and divide the data in the database into a training set, a test set and a validation set.
In this embodiment, the cross-modal database may contain multiple modality types, including images and text.
The cross-modal dataset is denoted $D = \{D^{(i)}, D^{(t)}\}$, where $D^{(i)}$ and $D^{(t)}$ denote the image data and the text data respectively.
For media type $r$, where $r = i, t$ ($i$ denotes image and $t$ denotes text), $n^{(r)}$ is defined as its number of data items. Each data item in the training set has one and only one semantic category.
The feature vector of the $p$-th data item of media type $r$ is defined as $x_p^{(r)}$, a $d^{(r)} \times 1$ vector, where $d^{(r)}$ denotes the feature vector dimension of media type $r$.
The semantic label of $x_p^{(r)}$ is denoted $y_p^{(r)}$, a $c \times 1$ vector, where $c$ denotes the total number of semantic categories. Exactly one dimension of $y_p^{(r)}$ is 1 and the rest are 0; the row whose value is 1 marks the semantic category label of the data item.
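For concreteness, a small sketch of this data representation, assuming NumPy; the dimensions and category indices are illustrative only:

```python
# Each item of media type r: a d^(r)-dim feature vector x_p^(r) and a
# c-dim one-hot semantic label y_p^(r) with exactly one entry equal to 1.
import numpy as np

def one_hot(label_index: int, c: int) -> np.ndarray:
    y = np.zeros(c)
    y[label_index] = 1.0
    return y

x_p = np.random.randn(4096)   # placeholder for an extracted feature vector
y_p = one_hot(3, c=10)        # item belongs to semantic category 3 of 10
```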
(2) For each modality type in the cross-modal database, construct a semantic space specific to that modality, and project the data of the other modality types into this semantic space to obtain the cross-modal similarity specific to that modality.
The process of this step is shown in Fig. 2. In this embodiment, for the construction of the image semantic space, a recurrent attention network is used to model the image data. The original image is first scaled to 256 × 256 and input into a convolutional neural network. The feature representations of the different image regions are then extracted from the last pooling layer of the convolutional neural network, and the regions of an image are organized into a sequence in order; an LSTM (Long Short-Term Memory) neural network is used to model the spatial context information between the different image regions, and its output sequence can be expressed as $H^i = \{h_1^i, \dots, h_n^i\}$. An attention mechanism is then used to make the trained model focus on the more important image regions. Specifically, a fully-connected network and a softmax activation layer are constructed, and the visual attention weights are computed by the following formulas:

$M^i = \tanh(W_a^i H^i),$
$a^i = \mathrm{softmax}(w_{ia}^T M^i),$

where $W_a^i$ and $w_{ia}$ are the network parameters of the respective layers, and $a^i$ contains the visual attention weights of the different regions in the image. Therefore, the feature vector of the $n$-th region of an image can be expressed as $a_n^i h_n^i$ (shown in the image semantic space in Fig. 2), which simultaneously contains the local fine-grained information and the spatial context information of the image. Next, the text data are projected into the image semantic space for cross-modal correlation learning. Specifically, a $k$-dimensional word-vector feature is first extracted for each word in the text data; a text containing $n$ words can then be expressed as an $n \times k$ matrix, which is input into a text convolutional neural network to obtain the sentence feature representation $q_p^t$. The cross-modal similarity of image $i_p$ and text $t_p$ in the image semantic space is then defined as follows (shown as $sim_i$ in the image semantic space in Fig. 2):

$sim_i(i_p, t_p) = \sum_{j=1}^{n} a_j^{i_p} h_j^{i_p} \cdot q_p^t,$

where $h_j^{i_p}$ denotes the feature vector of the $j$-th region of image $i_p$. Finally, the following loss function is defined to realize attention-based correlation learning:

$L_i = \sum_{n=1}^{N} \left[ l_{i1}(i_n^+, t_n^+, t_n^-) + l_{i2}(t_n^+, i_n^+, i_n^-) \right],$

whose two terms are defined respectively as:

$l_{i1}(i_n^+, t_n^+, t_n^-) = \max\left(0,\; \alpha + sim_i(i_n^+, t_n^+) - sim_i(i_n^+, t_n^-)\right),$
$l_{i2}(t_n^+, i_n^+, i_n^-) = \max\left(0,\; \alpha + sim_i(i_n^+, t_n^+) - sim_i(i_n^-, t_n^+)\right),$

where $(i_n^+, t_n^+)$ denotes a matched image/text pair, $(i_n^+, t_n^-)$ and $(i_n^-, t_n^+)$ denote unmatched image/text pairs, $\alpha$ is a margin parameter, and $N$ denotes the number of sampled triplets. Thus the cross-modal similarity $sim_i$ for the image modality is obtained from the image semantic space, integrating representation learning and similarity measurement learning while fully modeling the fine-grained information within the image and the unbalanced correlations between the different modalities.
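By way of illustration, the following minimal sketch (assuming the PyTorch library; the names RegionAttention, region_dim and sim_image_space are ours, not the patent's) shows the shape of the recurrent attention computation and the image-space similarity described above:

```python
# Minimal sketch of the image-semantic-space branch: LSTM over CNN region
# features, soft attention M = tanh(W_a H), a = softmax(w_a^T M), and the
# similarity sim_i(i_p, t_p) = sum_j a_j h_j . q_t.
import torch
import torch.nn as nn

class RegionAttention(nn.Module):
    def __init__(self, region_dim: int, hidden_dim: int):
        super().__init__()
        self.lstm = nn.LSTM(region_dim, hidden_dim, batch_first=True)
        self.W_a = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.w_a = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, regions: torch.Tensor):
        # regions: (batch, n_regions, region_dim) pooled CNN features
        H, _ = self.lstm(regions)               # spatial context per region
        M = torch.tanh(self.W_a(H))
        a = torch.softmax(self.w_a(M), dim=1)   # attention weight per region
        return a, H

def sim_image_space(a: torch.Tensor, H: torch.Tensor, q_t: torch.Tensor):
    # q_t: (batch, hidden_dim) text feature projected into the image space;
    # by linearity, sum_j a_j h_j . q_t = (sum_j a_j h_j) . q_t
    attended = (a * H).sum(dim=1)
    return (attended * q_t).sum(dim=-1)
```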
In this embodiment, for the construction of the text semantic space, a recurrent attention network is used to model the text data. First, for each text item, a $k$-dimensional word-vector feature is extracted for each of its words, so a text containing $n$ words can be expressed as an $n \times k$ matrix, which is input into a text convolutional neural network, and the feature representations of the different text fragments are extracted from the last pooling layer of the network. These are then input in sequence into an LSTM neural network to model the context information of the text, whose output sequence can be expressed as $H^t = \{h_1^t, \dots, h_m^t\}$. An attention mechanism is then used to make the trained model focus on the more important text fragments. Specifically, a fully-connected network and a softmax activation layer are constructed, and the text attention weights are computed by the following formulas:

$M^t = \tanh(W_a^t H^t),$
$a^t = \mathrm{softmax}(w_{ta}^T M^t),$

where $W_a^t$ and $w_{ta}$ are the network parameters of the respective layers, and $a^t$ contains the text attention weights of the different fragments in the text. Therefore, the feature vector of the $m$-th fragment of a text can be expressed as $a_m^t h_m^t$ (shown in the text semantic space in Fig. 2), which simultaneously contains the local fine-grained information and the context information of the text. Next, the image data are projected into the text semantic space for cross-modal correlation learning. Specifically, the global feature representation $q_p^i$ of the image is first extracted with a convolutional neural network. The cross-modal similarity of image $i_p$ and text $t_p$ in the text semantic space is then defined as follows (shown as $sim_t$ in the text semantic space in Fig. 2):

$sim_t(i_p, t_p) = \sum_{j=1}^{m} a_j^{t_p} h_j^{t_p} \cdot q_p^i,$

where $h_j^{t_p}$ denotes the feature vector of the $j$-th fragment of text $t_p$. Finally, the following loss function is defined to realize attention-based correlation learning:

$L_t = \sum_{n=1}^{M} \left[ l_{t1}(t_n^+, i_n^+, i_n^-) + l_{t2}(i_n^+, t_n^+, t_n^-) \right],$

whose two terms are defined respectively as:

$l_{t1}(t_n^+, i_n^+, i_n^-) = \max\left(0,\; \beta + sim_t(i_n^+, t_n^+) - sim_t(i_n^-, t_n^+)\right),$
$l_{t2}(i_n^+, t_n^+, t_n^-) = \max\left(0,\; \beta + sim_t(i_n^+, t_n^+) - sim_t(i_n^+, t_n^-)\right),$

where $(i_n^+, t_n^+)$ denotes a matched image/text pair, $(i_n^-, t_n^+)$ and $(i_n^+, t_n^-)$ denote unmatched image/text pairs, $\beta$ is a margin parameter, and $M$ denotes the number of sampled triplets. Thus the cross-modal similarity $sim_t$ for the text modality is obtained from the text semantic space, integrating representation learning and similarity measurement learning while fully modeling the fine-grained information within the text and the unbalanced correlations between the different modalities.
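Since the loss functions $L_i$ and $L_t$ share the same bidirectional hinge structure, a single sketch covers both; the hinge terms below transcribe the formulas above verbatim, including their sign convention, and the margin value is an illustrative placeholder:

```python
# Sketch of the attention-based ranking loss L = sum_n [ l_1 + l_2 ],
# written for the image space; the text space is identical with beta for alpha.
import torch

def ranking_loss(sim_pos, sim_neg_t, sim_neg_i, margin: float = 0.1):
    # sim_pos:   sim(i_n^+, t_n^+) over the N sampled matched pairs
    # sim_neg_t: sim(i_n^+, t_n^-) with mismatched texts
    # sim_neg_i: sim(i_n^-, t_n^+) with mismatched images
    l1 = torch.clamp(margin + sim_pos - sim_neg_t, min=0)
    l2 = torch.clamp(margin + sim_pos - sim_neg_i, min=0)
    return (l1 + l2).sum()
```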
(3) Fuse the modality-specific cross-modal similarities obtained from the different modality semantic spaces to obtain the final cross-modal similarity.
In this embodiment, the modality-specific cross-modal similarities obtained from the different modality semantic spaces are fused in a dynamic fusion manner. First, they are normalized to between 0 and 1 according to the following formulas:

$r_i(i_p, t_p) = \dfrac{sim_i(i_p, t_p) - \min(sim_i(i, t))}{\max(sim_i(i, t)) - \min(sim_i(i, t))},$
$r_t(i_p, t_p) = \dfrac{sim_t(i_p, t_p) - \min(sim_t(i, t))}{\max(sim_t(i, t)) - \min(sim_t(i, t))},$

Then, for an image/text pair $(i_p, t_p)$, the normalized score computed from the image semantic space serves as the dynamic weight of this pair in the text space, and the normalized score computed from the text semantic space serves as its dynamic weight in the image space. The final cross-modal similarity is therefore defined as follows:

$Sim(i_p, t_p) = r_t(i_p, t_p) \cdot sim_i(i_p, t_p) + r_i(i_p, t_p) \cdot sim_t(i_p, t_p)$

This fully exploits the complementarity of the different modality semantic spaces and further improves the effect of cross-modal retrieval.
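A minimal sketch of this dynamic fusion, assuming NumPy arrays of similarity scores over the evaluated image/text pairs:

```python
# Dynamic fusion: normalize each modality-specific similarity to [0, 1],
# then let each normalized score weight the similarity of the other space.
import numpy as np

def dynamic_fusion(sim_i: np.ndarray, sim_t: np.ndarray) -> np.ndarray:
    r_i = (sim_i - sim_i.min()) / (sim_i.max() - sim_i.min())
    r_t = (sim_t - sim_t.min()) / (sim_t.max() - sim_t.min())
    return r_t * sim_i + r_i * sim_t   # Sim = r_t*sim_i + r_i*sim_t
```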
(4) Take one modality type in the test set as the query modality and another modality type as the target modality. Take each data item of the query modality as a query sample and retrieve the data of the target modality: compute the similarity between the query sample and each query target in the manner of step (3), sort in descending order of similarity, and obtain the ranked list of relevant target-modality data.
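A corresponding sketch of the retrieval step, with rank_targets a hypothetical helper name:

```python
# Step (4): rank all target-modality items for one query by fused similarity.
import numpy as np

def rank_targets(fused_sim_row: np.ndarray) -> np.ndarray:
    # fused_sim_row[j] = Sim(query, target_j) from the dynamic fusion above
    return np.argsort(-fused_sim_row)   # target indices, descending similarity
```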
The following experimental results show that, compared with existing methods, the cross-modal similarity learning method of the present invention achieves higher retrieval accuracy.
This embodiment uses the Wikipedia cross-modal dataset for experiments. The dataset was proposed in the document "A New Approach to Cross-Modal Multimedia Retrieval" (authors N. Rasiwasia, J. Pereira, E. Coviello, G. Doyle, G. Lanckriet, R. Levy and N. Vasconcelos, published at the ACM International Conference on Multimedia, 2010) and contains 2866 text passages and 2866 images in one-to-one correspondence, divided into 10 categories in total, of which 2173 text passages and 2173 images serve as the training set, 231 text passages and 231 images as the validation set, and 492 text passages and 492 images as the test set. The following 3 methods are tested as experimental comparisons:
Existing method 1: the joint representation learning (JRL) method in the document "Learning Cross-Media Joint Representation with Sparse and Semi-Supervised Regularization" (authors X. Zhai, Y. Peng, and J. Xiao), which builds graph models for the data of different modalities, performs cross-modal correlation learning and high-level semantic abstraction simultaneously, and introduces sparse and semi-supervised regularization.
Existing method 2: the correspondence autoencoder (Corr-AE) method in the document "Cross-modal Retrieval with Correspondence Autoencoder" (authors F. Feng, X. Wang, and R. Li), which constructs two network pathways connected at the middle layers to model correlation information and reconstruction information simultaneously.
Existing method 3: the cross-media multiple deep network (CMDN) in the document "Cross-media shared representation by hierarchical learning with multiple deep networks" (authors Y. Peng, X. Huang, and J. Qi), which, in a modality-separate representation learning stage, simultaneously models the intra-modality semantic information and the inter-modality correlations, then builds a multi-layer network structure in a unified representation learning stage and learns the cross-modal unified representation by stacked learning that models reconstruction and correlation information simultaneously.
The present invention: the method of this embodiment.
The experiments use the MAP (mean average precision) metric, commonly used in the information retrieval field, to evaluate the accuracy of cross-modal retrieval; MAP is the mean of the average precision over the query samples, and the larger the MAP value, the better the cross-modal retrieval result.
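For reference, a small sketch of the MAP computation, assuming binary relevance flags in retrieved order:

```python
# MAP: mean over queries of average precision, AP = (1/R) * sum_k P(k)*rel(k).
import numpy as np

def average_precision(rel: np.ndarray) -> float:
    # rel: 1/0 relevance of the ranked results for one query
    hits = np.cumsum(rel)
    precisions = hits / np.arange(1, len(rel) + 1)
    n_rel = rel.sum()
    return float((precisions * rel).sum() / n_rel) if n_rel else 0.0

def mean_average_precision(rels) -> float:
    return float(np.mean([average_precision(r) for r in rels]))
```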
Table 1. Experimental results of the present invention (MAP).

Method                  Image query text    Text query image    Average
Existing method 1            0.479               0.428           0.454
Existing method 2            0.442               0.429           0.436
Existing method 3            0.487               0.427           0.457
The present invention        0.516               0.458           0.487
As can be seen from Table 1, the present invention achieves a considerable improvement over the existing methods in both the image-query-text and the text-query-image tasks. Existing method 1 builds graph models under a traditional framework and linearly maps the data of different modalities into one common space, making it difficult to fully model the complex cross-modal correlations. Existing method 2 and existing method 3 use deep network structures, but their deep models project the data of different modalities equally into one common space to mine the latent aligned correlations between them, which may lose the exclusive and useful information within each modality and cannot make full use of the internal information that each modality provides. The present invention, on the one hand, constructs modality-specific semantic spaces, modeling the fine-grained information and spatial context information within each modality while fully learning the unbalanced correlations between different modalities; on the other hand, it fuses the modality-specific cross-modal similarities obtained from the different modality semantic spaces in a dynamic fusion manner, further exploiting the complementarity of the different modality semantic spaces and thereby improving the accuracy of cross-modal retrieval.
In other embodiments, the method of constructing the modality-specific semantic space in step (2) of the present invention, which uses an LSTM (Long Short-Term Memory) neural network to model the context information of the image and text data, can equally use a recurrent neural network (RNN) as a replacement.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. If these changes and modifications fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to include them as well.

Claims (8)

1. A cross-modal similarity learning method based on modality-specific semantic space modeling, comprising the following steps:
(1) establishing a cross-modal database containing data of multiple modality types;
(2) for each modality type in the cross-modal database, constructing a semantic space specific to that modality, and projecting the data of the other modality types into this semantic space to obtain a cross-modal similarity specific to that modality;
(3) fusing the modality-specific cross-modal similarities obtained from the semantic spaces of the different modalities to obtain the final cross-modal similarity.
2. The method according to claim 1, characterized in that the cross-modal database contains multiple modality types, the multiple modality types including images and text.
3. The method according to claim 1, characterized in that the construction method of the modality-specific semantic space in step (2) is: training a recurrent attention network on the data of that modality, and then projecting the data of the other modality types into the semantic space of that modality through attention-based joint correlation learning, obtaining the cross-modal similarity specific to that modality.
4. The method according to claim 3, characterized in that the construction method of the image semantic space is:
a) inputting the original image into a convolutional neural network;
b) extracting the feature representations of the different regions of the image from the last pooling layer of the convolutional neural network, organizing the regions of an image into a sequence in order, and modeling the spatial context information between the different image regions with an LSTM neural network or an RNN neural network, whose output sequence is expressed as $H^i = \{h_1^i, \dots, h_n^i\}$;
c) making the trained model focus on the important image regions with an attention mechanism: first constructing a fully-connected network and a softmax activation layer, then computing the visual attention weights by the following formulas:
$M^i = \tanh(W_a^i H^i),$
$a^i = \mathrm{softmax}(w_{ia}^T M^i),$
where $W_a^i$ and $w_{ia}$ are the network parameters of the respective layers, and $a^i$ contains the visual attention weights of the different regions in the image; therefore, the feature vector of the $n$-th region of an image is expressed as $a_n^i h_n^i$, which simultaneously contains the local fine-grained information and the spatial context information of the image;
d) projecting the text data into the image semantic space for cross-modal correlation learning: first extracting a $k$-dimensional word-vector feature for each word in the text data, then expressing a text containing $n$ words as an $n \times k$ matrix, which is input into a text convolutional neural network to obtain the sentence feature representation $q_p^t$; the cross-modal similarity of image $i_p$ and text $t_p$ in the image semantic space is then defined as follows:
$sim_i(i_p, t_p) = \sum_{j=1}^{n} a_j^{i_p} h_j^{i_p} \cdot q_p^t,$
where $h_j^{i_p}$ denotes the feature vector of the $j$-th region of image $i_p$;
e) defining the following loss function to realize the attention-based correlation learning:
$L_i = \sum_{n=1}^{N} \left[ l_{i1}(i_n^+, t_n^+, t_n^-) + l_{i2}(t_n^+, i_n^+, i_n^-) \right],$
the two terms of the above formula being defined respectively as:
$l_{i1}(i_n^+, t_n^+, t_n^-) = \max\left(0,\; \alpha + sim_i(i_n^+, t_n^+) - sim_i(i_n^+, t_n^-)\right),$
$l_{i2}(t_n^+, i_n^+, i_n^-) = \max\left(0,\; \alpha + sim_i(i_n^+, t_n^+) - sim_i(i_n^-, t_n^+)\right),$
where $(i_n^+, t_n^+)$ denotes a matched image/text pair, $(i_n^+, t_n^-)$ and $(i_n^-, t_n^+)$ denote unmatched image/text pairs, $\alpha$ is a margin parameter, and $N$ denotes the number of sampled triplets.
5. The method according to claim 4, characterized in that the construction method of the text semantic space is:
a) for each text item, extracting a $k$-dimensional word-vector feature for each of its words, then expressing a text containing $n$ words as an $n \times k$ matrix, which is input into a text convolutional neural network;
b) extracting the feature representations of the different text fragments from the last pooling layer of the convolutional neural network, then inputting them in sequence into an LSTM neural network or an RNN neural network to model the context information of the text, whose output sequence is expressed as $H^t = \{h_1^t, \dots, h_m^t\}$;
c) making the trained model focus on the important text fragments with an attention mechanism: first constructing a fully-connected network and a softmax activation layer, then computing the text attention weights by the following formulas:
$M^t = \tanh(W_a^t H^t),$
$a^t = \mathrm{softmax}(w_{ta}^T M^t),$
where $W_a^t$ and $w_{ta}$ are the network parameters of the respective layers, and $a^t$ contains the text attention weights of the different fragments in the text; therefore, the feature vector of the $m$-th fragment of a text is expressed as $a_m^t h_m^t$, which simultaneously contains the local fine-grained information and the context information of the text;
d) projecting the image data into the text semantic space for cross-modal correlation learning: first extracting the global feature representation $q_p^i$ of the image with a convolutional neural network; the cross-modal similarity of image $i_p$ and text $t_p$ in the text semantic space is then defined as follows:
$sim_t(i_p, t_p) = \sum_{j=1}^{m} a_j^{t_p} h_j^{t_p} \cdot q_p^i,$
where $h_j^{t_p}$ denotes the feature vector of the $j$-th fragment of text $t_p$;
e) defining the following loss function to realize the attention-based correlation learning:
$L_t = \sum_{n=1}^{M} \left[ l_{t1}(t_n^+, i_n^+, i_n^-) + l_{t2}(i_n^+, t_n^+, t_n^-) \right],$
the two terms of the above formula being defined respectively as:
$l_{t1}(t_n^+, i_n^+, i_n^-) = \max\left(0,\; \beta + sim_t(i_n^+, t_n^+) - sim_t(i_n^-, t_n^+)\right),$
$l_{t2}(i_n^+, t_n^+, t_n^-) = \max\left(0,\; \beta + sim_t(i_n^+, t_n^+) - sim_t(i_n^+, t_n^-)\right),$
where $(i_n^+, t_n^+)$ denotes a matched image/text pair, $(i_n^-, t_n^+)$ and $(i_n^+, t_n^-)$ denote unmatched image/text pairs, $\beta$ is a margin parameter, and $M$ denotes the number of sampled triplets.
6. The method according to claim 1, characterized in that step (3) fuses the modality-specific cross-modal similarities obtained from the different modality semantic spaces in a dynamic fusion manner, comprising the following steps: first, normalizing the modality-specific cross-modal similarities obtained from the different modality semantic spaces to between 0 and 1 according to the following formulas:
$r_i(i_p, t_p) = \dfrac{sim_i(i_p, t_p) - \min(sim_i(i, t))}{\max(sim_i(i, t)) - \min(sim_i(i, t))},$
$r_t(i_p, t_p) = \dfrac{sim_t(i_p, t_p) - \min(sim_t(i, t))}{\max(sim_t(i, t)) - \min(sim_t(i, t))},$
then, for an image/text pair $(i_p, t_p)$, taking the normalized score computed from the image semantic space as the dynamic weight of this image/text pair in the text space, and the normalized score computed from the text semantic space as the dynamic weight of this image/text pair in the image space; the final cross-modal similarity is defined as follows:
$Sim(i_p, t_p) = r_t(i_p, t_p) \cdot sim_i(i_p, t_p) + r_i(i_p, t_p) \cdot sim_t(i_p, t_p)$.
7. A cross-modal retrieval method, comprising the following steps:
1) computing a cross-modal similarity using the method of any one of claims 1 to 6;
2) taking one modality type as the query modality and another modality type as the target modality, taking each data item of the query modality as a query sample, retrieving the data of the target modality, computing the similarity between the query sample and each query target, and obtaining the retrieval results of the target-modality data according to the similarities.
8. The method according to claim 7, characterized in that, after computing the similarity between the query sample and the query targets, step 2) sorts the results in descending order of similarity to obtain the ranked list of relevant results.
CN201710684763.6A 2017-08-11 2017-08-11 Cross-modal similarity learning method based on specific modal semantic space modeling Active CN107562812B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710684763.6A CN107562812B (en) 2017-08-11 2017-08-11 Cross-modal similarity learning method based on specific modal semantic space modeling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710684763.6A CN107562812B (en) 2017-08-11 2017-08-11 Cross-modal similarity learning method based on specific modal semantic space modeling

Publications (2)

Publication Number Publication Date
CN107562812A true CN107562812A (en) 2018-01-09
CN107562812B CN107562812B (en) 2021-01-15

Family

ID=60975314

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710684763.6A Active CN107562812B (en) 2017-08-11 2017-08-11 Cross-modal similarity learning method based on specific modal semantic space modeling

Country Status (1)

Country Link
CN (1) CN107562812B (en)

Cited By (49)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108256631A (en) * 2018-01-26 2018-07-06 深圳市唯特视科技有限公司 A kind of user behavior commending system based on attention model
CN108415819A (en) * 2018-03-15 2018-08-17 中国人民解放军国防科技大学 Hard disk fault tracking method and device
CN108829719A (en) * 2018-05-07 2018-11-16 中国科学院合肥物质科学研究院 The non-true class quiz answers selection method of one kind and system
CN108881950A (en) * 2018-05-30 2018-11-23 北京奇艺世纪科技有限公司 A kind of method and apparatus of video processing
CN109255047A (en) * 2018-07-18 2019-01-22 西安电子科技大学 Based on the complementary semantic mutual search method of image-text being aligned and symmetrically retrieve
CN109325240A (en) * 2018-12-03 2019-02-12 ***通信集团福建有限公司 Method, apparatus, equipment and the medium of index inquiry
CN109508400A (en) * 2018-10-09 2019-03-22 中国科学院自动化研究所 Picture and text abstraction generating method
CN109543714A (en) * 2018-10-16 2019-03-29 北京达佳互联信息技术有限公司 Acquisition methods, device, electronic equipment and the storage medium of data characteristics
CN109543009A (en) * 2018-10-17 2019-03-29 龙马智芯(珠海横琴)科技有限公司 Text similarity assessment system and text similarity appraisal procedure
CN109670071A (en) * 2018-10-22 2019-04-23 北京大学 A kind of across the media Hash search methods and system of the guidance of serializing multiple features
CN109785409A (en) * 2018-12-29 2019-05-21 武汉大学 A kind of image based on attention mechanism-text data fusion method and system
CN109886326A (en) * 2019-01-31 2019-06-14 深圳市商汤科技有限公司 A kind of cross-module state information retrieval method, device and storage medium
CN109902710A (en) * 2019-01-07 2019-06-18 南京热信软件科技有限公司 A kind of fast matching method and device of text image
CN110210540A (en) * 2019-05-22 2019-09-06 山东大学 Across social media method for identifying ID and system based on attention mechanism
CN110580489A (en) * 2018-06-11 2019-12-17 阿里巴巴集团控股有限公司 Data object classification system, method and equipment
WO2020001048A1 (en) * 2018-06-29 2020-01-02 北京大学深圳研究生院 Double semantic space-based adversarial cross-media retrieval method
CN110706771A (en) * 2019-10-10 2020-01-17 复旦大学附属中山医院 Method and device for generating multi-mode education content, server and storage medium
CN110851641A (en) * 2018-08-01 2020-02-28 杭州海康威视数字技术股份有限公司 Cross-modal retrieval method and device and readable storage medium
WO2020063524A1 (en) * 2018-09-30 2020-04-02 北京国双科技有限公司 Method and system for determining legal instrument
CN110990597A (en) * 2019-12-19 2020-04-10 中国电子科技集团公司信息科学研究院 Cross-modal data retrieval system based on text semantic mapping and retrieval method thereof
CN111026894A (en) * 2019-12-12 2020-04-17 清华大学 Cross-modal image text retrieval method based on credibility self-adaptive matching network
JP2020064568A (en) * 2018-10-19 2020-04-23 株式会社日立製作所 Video analysis system, learning device, and method thereof
CN111091010A (en) * 2019-11-22 2020-05-01 京东方科技集团股份有限公司 Similarity determination method, similarity determination device, network training device, network searching device and storage medium
CN111159472A (en) * 2018-11-08 2020-05-15 微软技术许可有限责任公司 Multi-modal chat techniques
CN111199750A (en) * 2019-12-18 2020-05-26 北京葡萄智学科技有限公司 Pronunciation evaluation method and device, electronic equipment and storage medium
CN111274445A (en) * 2020-01-20 2020-06-12 山东建筑大学 Similar video content retrieval method and system based on triple deep learning
CN111339256A (en) * 2020-02-28 2020-06-26 支付宝(杭州)信息技术有限公司 Method and device for text processing
CN111429913A (en) * 2020-03-26 2020-07-17 厦门快商通科技股份有限公司 Digit string voice recognition method, identity verification device and computer readable storage medium
CN111428072A (en) * 2020-03-31 2020-07-17 南方科技大学 Ophthalmologic multimodal image retrieval method, apparatus, server and storage medium
WO2020155418A1 (en) * 2019-01-31 2020-08-06 深圳市商汤科技有限公司 Cross-modal information retrieval method and device, and storage medium
CN111639240A (en) * 2020-05-14 2020-09-08 山东大学 Cross-modal Hash retrieval method and system based on attention awareness mechanism
CN111930992A (en) * 2020-08-14 2020-11-13 腾讯科技(深圳)有限公司 Neural network training method and device and electronic equipment
CN112001279A (en) * 2020-08-12 2020-11-27 山东省人工智能研究院 Cross-modal pedestrian re-identification method based on dual attribute information
CN112041856A (en) * 2018-03-01 2020-12-04 皇家飞利浦有限公司 Cross-modal neural network for prediction
CN112581387A (en) * 2020-12-03 2021-03-30 广州电力通信网络有限公司 Intelligent operation and maintenance system, device and method for power distribution room
CN112668671A (en) * 2021-03-15 2021-04-16 北京百度网讯科技有限公司 Method and device for acquiring pre-training model
CN113094550A (en) * 2020-01-08 2021-07-09 百度在线网络技术(北京)有限公司 Video retrieval method, device, equipment and medium
CN113159371A (en) * 2021-01-27 2021-07-23 南京航空航天大学 Unknown target feature modeling and demand prediction method based on cross-modal data fusion
CN113204666A (en) * 2021-05-26 2021-08-03 杭州联汇科技股份有限公司 Method for searching matched pictures based on characters
CN113392196A (en) * 2021-06-04 2021-09-14 北京师范大学 Topic retrieval method and system based on multi-mode cross comparison
CN113435206A (en) * 2021-05-26 2021-09-24 卓尔智联(武汉)研究院有限公司 Image-text retrieval method and device and electronic equipment
CN113434716A (en) * 2021-07-02 2021-09-24 泰康保险集团股份有限公司 Cross-modal information retrieval method and device
CN113934887A (en) * 2021-12-20 2022-01-14 成都考拉悠然科技有限公司 No-proposal time sequence language positioning method based on semantic decoupling
CN113971209A (en) * 2021-12-22 2022-01-25 松立控股集团股份有限公司 Non-supervision cross-modal retrieval method based on attention mechanism enhancement
CN114417878A (en) * 2021-12-29 2022-04-29 北京百度网讯科技有限公司 Semantic recognition method and device, electronic equipment and storage medium
CN114529757A (en) * 2022-01-21 2022-05-24 四川大学 Cross-modal single-sample three-dimensional point cloud segmentation method
CN115858839A (en) * 2023-02-16 2023-03-28 上海蜜度信息技术有限公司 Cross-modal LOGO retrieval method, system, terminal and storage medium
CN116484878A (en) * 2023-06-21 2023-07-25 国网智能电网研究院有限公司 Semantic association method, device, equipment and storage medium of power heterogeneous data
CN116522168A (en) * 2023-07-04 2023-08-01 北京墨丘科技有限公司 Cross-modal text similarity comparison method and device and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103559191A (en) * 2013-09-10 2014-02-05 浙江大学 Cross-media sorting method based on hidden space learning and two-way sorting learning
US9280562B1 (en) * 2006-01-31 2016-03-08 The Research Foundation For The State University Of New York System and method for multimedia ranking and multi-modal image retrieval using probabilistic semantic models and expectation-maximization (EM) learning
CN105718532A (en) * 2016-01-15 2016-06-29 北京大学 Cross-media sequencing method based on multi-depth network structure
CN106095829A (en) * 2016-06-01 2016-11-09 华侨大学 Cross-media retrieval method based on degree of depth study with the study of concordance expression of space

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9280562B1 (en) * 2006-01-31 2016-03-08 The Research Foundation For The State University Of New York System and method for multimedia ranking and multi-modal image retrieval using probabilistic semantic models and expectation-maximization (EM) learning
CN103559191A (en) * 2013-09-10 2014-02-05 浙江大学 Cross-media sorting method based on hidden space learning and two-way sorting learning
CN105718532A (en) * 2016-01-15 2016-06-29 北京大学 Cross-media sequencing method based on multi-depth network structure
CN106095829A (en) * 2016-06-01 2016-11-09 华侨大学 Cross-media retrieval method based on degree of depth study with the study of concordance expression of space

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李志欣 等: "基于语义学习的图像多模态检索", 《计算机工程》 *

Cited By (78)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108256631A (en) * 2018-01-26 2018-07-06 深圳市唯特视科技有限公司 A kind of user behavior commending system based on attention model
CN112041856A (en) * 2018-03-01 2020-12-04 皇家飞利浦有限公司 Cross-modal neural network for prediction
CN108415819A (en) * 2018-03-15 2018-08-17 中国人民解放军国防科技大学 Hard disk fault tracking method and device
CN108415819B (en) * 2018-03-15 2021-05-25 中国人民解放军国防科技大学 Hard disk fault tracking method and device
CN108829719A (en) * 2018-05-07 2018-11-16 中国科学院合肥物质科学研究院 The non-true class quiz answers selection method of one kind and system
CN108881950A (en) * 2018-05-30 2018-11-23 北京奇艺世纪科技有限公司 A kind of method and apparatus of video processing
CN110580489A (en) * 2018-06-11 2019-12-17 阿里巴巴集团控股有限公司 Data object classification system, method and equipment
WO2020001048A1 (en) * 2018-06-29 2020-01-02 北京大学深圳研究生院 Double semantic space-based adversarial cross-media retrieval method
CN109255047A (en) * 2018-07-18 2019-01-22 西安电子科技大学 Based on the complementary semantic mutual search method of image-text being aligned and symmetrically retrieve
CN110851641A (en) * 2018-08-01 2020-02-28 杭州海康威视数字技术股份有限公司 Cross-modal retrieval method and device and readable storage medium
WO2020063524A1 (en) * 2018-09-30 2020-04-02 北京国双科技有限公司 Method and system for determining legal instrument
CN109508400A (en) * 2018-10-09 2019-03-22 中国科学院自动化研究所 Picture and text abstraction generating method
CN109543714A (en) * 2018-10-16 2019-03-29 北京达佳互联信息技术有限公司 Acquisition methods, device, electronic equipment and the storage medium of data characteristics
CN109543009B (en) * 2018-10-17 2019-10-25 龙马智芯(珠海横琴)科技有限公司 Text similarity assessment system and text similarity appraisal procedure
CN109543009A (en) * 2018-10-17 2019-03-29 龙马智芯(珠海横琴)科技有限公司 Text similarity assessment system and text similarity appraisal procedure
JP7171361B2 (en) 2018-10-19 2022-11-15 株式会社日立製作所 Data analysis system, learning device, and method thereof
JP2020064568A (en) * 2018-10-19 2020-04-23 株式会社日立製作所 Video analysis system, learning device, and method thereof
CN109670071B (en) * 2018-10-22 2021-10-08 北京大学 Serialized multi-feature guided cross-media Hash retrieval method and system
CN109670071A (en) * 2018-10-22 2019-04-23 北京大学 A kind of across the media Hash search methods and system of the guidance of serializing multiple features
US11921782B2 (en) 2018-11-08 2024-03-05 Microsoft Technology Licensing, Llc VideoChat
CN111159472B (en) * 2018-11-08 2024-03-12 微软技术许可有限责任公司 Multimodal chat technique
CN111159472A (en) * 2018-11-08 2020-05-15 微软技术许可有限责任公司 Multi-modal chat techniques
CN109325240A (en) * 2018-12-03 2019-02-12 ***通信集团福建有限公司 Method, apparatus, equipment and the medium of index inquiry
CN109785409A (en) * 2018-12-29 2019-05-21 武汉大学 A kind of image based on attention mechanism-text data fusion method and system
CN109785409B (en) * 2018-12-29 2020-09-08 武汉大学 Image-text data fusion method and system based on attention mechanism
CN109902710A (en) * 2019-01-07 2019-06-18 南京热信软件科技有限公司 A kind of fast matching method and device of text image
CN109902710B (en) * 2019-01-07 2023-07-11 李晓妮 Quick matching method and device for text images
WO2020155423A1 (en) * 2019-01-31 2020-08-06 深圳市商汤科技有限公司 Cross-modal information retrieval method and apparatus, and storage medium
CN109886326A (en) * 2019-01-31 2019-06-14 深圳市商汤科技有限公司 A kind of cross-module state information retrieval method, device and storage medium
WO2020155418A1 (en) * 2019-01-31 2020-08-06 深圳市商汤科技有限公司 Cross-modal information retrieval method and device, and storage medium
CN109886326B (en) * 2019-01-31 2022-01-04 深圳市商汤科技有限公司 Cross-modal information retrieval method and device and storage medium
JP2022510704A (en) * 2019-01-31 2022-01-27 シェンチェン センスタイム テクノロジー カンパニー リミテッド Cross-modal information retrieval methods, devices and storage media
TWI785301B (en) * 2019-01-31 2022-12-01 大陸商深圳市商湯科技有限公司 A cross-modal information retrieval method, device and storage medium
CN110210540B (en) * 2019-05-22 2021-02-26 Shandong University Cross-social-media user identity recognition method and system based on attention mechanism
CN110210540A (en) * 2019-05-22 2019-09-06 Shandong University Cross-social-media user identity recognition method and system based on attention mechanism
CN110706771A (en) * 2019-10-10 2020-01-17 Zhongshan Hospital, Fudan University Method and device for generating multi-modal education content, server and storage medium
CN111091010A (en) * 2019-11-22 2020-05-01 BOE Technology Group Co., Ltd. Similarity determination method, similarity determination device, network training device, network searching device and storage medium
CN111026894A (en) * 2019-12-12 2020-04-17 Tsinghua University Cross-modal image-text retrieval method based on credibility adaptive matching network
CN111026894B (en) * 2019-12-12 2021-11-26 Tsinghua University Cross-modal image-text retrieval method based on credibility adaptive matching network
CN111199750A (en) * 2019-12-18 2020-05-26 Beijing Putao Zhixue Technology Co., Ltd. Pronunciation evaluation method and device, electronic equipment and storage medium
CN111199750B (en) * 2019-12-18 2022-10-28 Beijing Putao Zhixue Technology Co., Ltd. Pronunciation evaluation method and device, electronic equipment and storage medium
CN110990597A (en) * 2019-12-19 2020-04-10 Information Science Academy of China Electronics Technology Group Corporation Cross-modal data retrieval system based on text semantic mapping and retrieval method thereof
CN110990597B (en) * 2019-12-19 2022-11-25 Information Science Academy of China Electronics Technology Group Corporation Cross-modal data retrieval system based on text semantic mapping and retrieval method thereof
CN113094550A (en) * 2020-01-08 2021-07-09 Baidu Online Network Technology (Beijing) Co., Ltd. Video retrieval method, device, equipment and medium
CN113094550B (en) * 2020-01-08 2023-10-24 Baidu Online Network Technology (Beijing) Co., Ltd. Video retrieval method, device, equipment and medium
CN111274445B (en) * 2020-01-20 2021-04-23 Shandong Jianzhu University Similar video content retrieval method and system based on triplet deep learning
CN111274445A (en) * 2020-01-20 2020-06-12 Shandong Jianzhu University Similar video content retrieval method and system based on triplet deep learning
CN111339256A (en) * 2020-02-28 2020-06-26 Alipay (Hangzhou) Information Technology Co., Ltd. Method and device for text processing
CN111429913A (en) * 2020-03-26 2020-07-17 Xiamen Kuaishangtong Technology Co., Ltd. Digit string speech recognition method, identity verification device and computer-readable storage medium
CN111428072A (en) * 2020-03-31 2020-07-17 Southern University of Science and Technology Ophthalmologic multimodal image retrieval method, apparatus, server and storage medium
CN111639240A (en) * 2020-05-14 2020-09-08 Shandong University Cross-modal hash retrieval method and system based on an attention-aware mechanism
CN112001279A (en) * 2020-08-12 2020-11-27 Shandong Institute of Artificial Intelligence Cross-modal pedestrian re-identification method based on dual attribute information
CN111930992A (en) * 2020-08-14 2020-11-13 Tencent Technology (Shenzhen) Co., Ltd. Neural network training method and device and electronic equipment
CN111930992B (en) * 2020-08-14 2022-10-28 Tencent Technology (Shenzhen) Co., Ltd. Neural network training method and device and electronic equipment
CN112581387A (en) * 2020-12-03 2021-03-30 Guangzhou Electric Power Communication Network Co., Ltd. Intelligent operation and maintenance system, device and method for power distribution rooms
CN113159371A (en) * 2021-01-27 2021-07-23 Nanjing University of Aeronautics and Astronautics Unknown target feature modeling and demand prediction method based on cross-modal data fusion
CN113159371B (en) * 2021-01-27 2022-05-20 Nanjing University of Aeronautics and Astronautics Unknown target feature modeling and demand prediction method based on cross-modal data fusion
CN112668671A (en) * 2021-03-15 2021-04-16 Beijing Baidu Netcom Science and Technology Co., Ltd. Method and device for acquiring a pre-trained model
CN112668671B (en) * 2021-03-15 2021-12-24 Beijing Baidu Netcom Science and Technology Co., Ltd. Method and device for acquiring a pre-trained model
CN113435206B (en) * 2021-05-26 2023-08-01 Zhuoer Zhilian (Wuhan) Research Institute Co., Ltd. Image-text retrieval method and device and electronic equipment
CN113204666B (en) * 2021-05-26 2022-04-05 Hangzhou Lianhui Technology Co., Ltd. Method for retrieving matching pictures based on text
CN113435206A (en) * 2021-05-26 2021-09-24 Zhuoer Zhilian (Wuhan) Research Institute Co., Ltd. Image-text retrieval method and device and electronic equipment
CN113204666A (en) * 2021-05-26 2021-08-03 Hangzhou Lianhui Technology Co., Ltd. Method for retrieving matching pictures based on text
CN113392196A (en) * 2021-06-04 2021-09-14 Beijing Normal University Topic retrieval method and system based on multi-modal cross comparison
CN113434716A (en) * 2021-07-02 2021-09-24 Taikang Insurance Group Co., Ltd. Cross-modal information retrieval method and device
CN113434716B (en) * 2021-07-02 2024-01-26 Taikang Insurance Group Co., Ltd. Cross-modal information retrieval method and device
CN113934887B (en) * 2021-12-20 2022-03-15 Chengdu Koala Youran Technology Co., Ltd. Proposal-free temporal language localization method based on semantic decoupling
CN113934887A (en) * 2021-12-20 2022-01-14 Chengdu Koala Youran Technology Co., Ltd. Proposal-free temporal language localization method based on semantic decoupling
CN113971209B (en) * 2021-12-22 2022-04-19 Songli Holding Group Co., Ltd. Unsupervised cross-modal retrieval method based on attention mechanism enhancement
CN113971209A (en) * 2021-12-22 2022-01-25 Songli Holding Group Co., Ltd. Unsupervised cross-modal retrieval method based on attention mechanism enhancement
CN114417878B (en) * 2021-12-29 2023-04-18 Beijing Baidu Netcom Science and Technology Co., Ltd. Semantic recognition method and device, electronic equipment and storage medium
CN114417878A (en) * 2021-12-29 2022-04-29 Beijing Baidu Netcom Science and Technology Co., Ltd. Semantic recognition method and device, electronic equipment and storage medium
CN114529757A (en) * 2022-01-21 2022-05-24 Sichuan University Cross-modal single-sample three-dimensional point cloud segmentation method
CN114529757B (en) * 2022-01-21 2023-04-18 Sichuan University Cross-modal single-sample three-dimensional point cloud segmentation method
CN115858839A (en) * 2023-02-16 2023-03-28 Shanghai Midu Information Technology Co., Ltd. Cross-modal LOGO retrieval method, system, terminal and storage medium
CN116484878B (en) * 2023-06-21 2023-09-08 State Grid Smart Grid Research Institute Co., Ltd. Semantic association method, device, equipment and storage medium for power heterogeneous data
CN116484878A (en) * 2023-06-21 2023-07-25 State Grid Smart Grid Research Institute Co., Ltd. Semantic association method, device, equipment and storage medium for power heterogeneous data
CN116522168A (en) * 2023-07-04 2023-08-01 Beijing Moqiu Technology Co., Ltd. Cross-modal text similarity comparison method and device and electronic equipment

Also Published As

Publication number Publication date
CN107562812B (en) 2021-01-15

Similar Documents

Publication Publication Date Title
CN107562812A (en) A kind of cross-module state similarity-based learning method based on the modeling of modality-specific semantic space
CN106295796B (en) Entity linking method based on deep learning
CN111061856B (en) Knowledge perception-based news recommendation method
CN110363282B (en) Network node label active learning method and system based on graph convolution network
CN111414461B (en) Intelligent question-answering method and system fusing knowledge base and user modeling
CN109299341A (en) A dictionary-learning-based adversarial cross-modal retrieval method and system
Wu et al. Dynamic graph convolutional network for multi-video summarization
CN107346328A (en) A cross-modal correlation learning method based on multi-granularity hierarchical networks
CN109934261A (en) A knowledge-driven parameter transformation model and its few-shot learning method
CN113140254B (en) Meta-learning drug-target interaction prediction system and prediction method
CN113505204B (en) Recall model training method, search recall device and computer equipment
CN112988917B (en) Entity alignment method based on multiple entity contexts
CN111931505A (en) Cross-language entity alignment method based on subgraph embedding
CN110287323A (en) An object-oriented sentiment classification method
CN111476261A (en) Community-enhanced graph convolution neural network method
CN105701225B (en) A cross-media retrieval method based on unified association hypergraph regularization
Chen et al. Binarized neural architecture search for efficient object recognition
CN112733602B (en) Relation-guided pedestrian attribute identification method
CN115952292B (en) Multi-label classification method, apparatus and computer readable medium
Moyano Learning network representations
CN108875034A (en) A Chinese text classification method based on hierarchical long short-term memory networks
CN111222847A (en) Open-source community developer recommendation method based on deep learning and unsupervised clustering
Qi et al. Patent analytic citation-based VSM: Challenges and applications
CN115858919A (en) Learning resource recommendation method and system based on project field knowledge and user comments
CN115687760A (en) User learning interest label prediction method based on graph neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant