CN107562812A - A cross-modal similarity learning method based on modality-specific semantic space modeling - Google Patents
A cross-modal similarity learning method based on modality-specific semantic space modeling
- Publication number: CN107562812A (application CN201710684763.6A)
- Authority: CN (China)
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Landscapes: Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to a cross-modal similarity learning method based on modality-specific semantic space modeling, comprising the following steps: 1. Establish a cross-modal database containing data of multiple modality types, and divide the data in the database into a training set, a test set, and a validation set. 2. For each modality type in the cross-modal database, construct a semantic space specific to that modality, project the data of the other modality types into this semantic space, and obtain a modality-specific cross-modal similarity. 3. Fuse the modality-specific cross-modal similarities obtained from the different modality semantic spaces to obtain the final cross-modal similarity. 4. Take any one modality type in the test set as the query modality and another modality type as the target modality, compute the similarity between each query sample and the query targets, and obtain a ranked list of relevant target-modality data according to similarity. The present invention improves the accuracy of cross-modal retrieval.
Description
Technical field
The present invention relates to the field of multimedia retrieval, and in particular to a cross-modal similarity learning method based on modality-specific semantic space modeling.
Background technology
Nowadays, multi-modal data including images, video, text, and audio is widely present on the Internet, and such multi-modal data is the basis for helping artificial intelligence perceive the real world. Research efforts are attempting to bridge the heterogeneity gap between data of different modalities, and cross-modal retrieval has become one of the focal research topics: it enables information retrieval across data of different modalities and has broad practical applications, such as search engines and digital libraries. Traditional single-modality retrieval, such as image retrieval or video retrieval, is confined to a single modality and can only return retrieval results of the same modality type as the query. In contrast, cross-modal retrieval is more convenient and useful, since a query of any modality type can retrieve results of different modalities.
A major challenge of cross-modal retrieval is how to cope with the inconsistency between different modalities and learn the intrinsic associations between them. Because data of different modalities have diverse representations and distribution characteristics, and are scattered across their respective feature spaces, this heterogeneity makes measuring the similarity between different modalities very difficult, for example the similarity between an image and an audio clip. To address this problem, researchers have proposed methods that project the feature representations of different modality data into one unified space to learn a unified representation, so that the similarity between data of different modalities can be obtained by computing the distance between their unified representations. Conventional methods learn mapping matrices for different modality data to maximize the correlation between them; for example, canonical correlation analysis (CCA) analyzes the pairwise correlations between data of different modalities and maps them into a common subspace of the same dimension. In addition, Zhai et al., in the paper "Learning Cross-Media Joint Representation with Sparse and Semi-Supervised Regularization", proposed a graph-regularization-based method that constructs graph models for different modality data and performs cross-modal correlation learning and high-level semantic abstraction simultaneously.
In recent years, the great progress of deep learning has prompted researchers to model the correlations between different modality data with deep neural networks. Feng et al., in the paper "Cross-modal Retrieval with Correspondence Autoencoder", proposed the correspondence autoencoder (Corr-AE), which builds a two-pathway connected network structure to model the correlation and reconstruction information of different modality data simultaneously. Peng et al., in the paper "Cross-media shared representation by hierarchical learning with multiple deep networks", proposed the cross-media multiple deep network (CMDN) model, which in a separate single-modality representation learning stage jointly models the semantic information within each modality and the correlation information between modalities, then builds a multi-layer network structure in a unified representation learning stage that fuses the single-modality semantic representations and the cross-modal correlation representations, using stacked learning to model reconstruction and correlation information simultaneously and obtain a unified cross-modal representation.
However, most of the above existing methods project the data of different modalities equally, via mapping matrices or deep models, into one unified space to mine the latent alignment relations between them, which implies that the information mined from different modality data is treated as equivalent. In general, however, data of different modalities, such as images and text, often stand in an unequal and complementary relation. Even when they jointly describe the same semantics, they may carry unequal information, because information exclusive to one modality cannot always be well aligned with content expressed by the other modalities. Therefore, treating different modality data equally to mine latent fine-grained aligned content and building a single unified space can lose modality-exclusive yet useful information, and cannot make full use of the rich internal information each modality provides.
Summary of the invention
In view of the shortcomings of the prior art, the present invention proposes a cross-modal similarity learning method based on modality-specific semantic space modeling. It constructs a modality-specific semantic space by training a recurrent attention network on the data of that modality, modeling the fine-grained information and spatial context information inside the modality; it then projects the data of the other modalities into the semantic space of that modality through attention-based joint correlation learning, fully learning the imbalanced correlation information between different modalities; finally, it fuses the modality-specific cross-modal similarities obtained from the different modality semantic spaces by dynamic fusion, further exploiting the complementarity of the different modality semantic spaces and improving the accuracy of cross-modal retrieval.
To achieve the above objectives, the technical solution adopted by the present invention is as follows:
A cross-modal similarity learning method based on modality-specific semantic space modeling, for constructing modality-specific semantic spaces and fusing the modality-specific cross-modal similarities obtained from the different modality semantic spaces to obtain the similarity of different modality data, thereby realizing cross-modal retrieval. The method comprises the following steps, in which steps (1)-(3) obtain the cross-modal similarity and step (4) further realizes cross-modal retrieval:
(1) Establish a cross-modal database containing data of multiple modality types;
(2) For each modality type in the cross-modal database, construct a semantic space specific to that modality, project the data of the other modality types into this semantic space, and obtain the modality-specific cross-modal similarity;
(3) Fuse the modality-specific cross-modal similarities obtained from the semantic spaces of the different modalities to obtain the final cross-modal similarity;
(4) Use any one modality type as the query modality and another modality type as the target modality; take each data item of the query modality as a query sample, retrieve data in the target modality, compute the similarity between the query sample and the query targets, and obtain a ranked list of relevant target-modality data according to similarity.
Further, in the above cross-modal similarity learning method based on modality-specific semantic space modeling, the cross-modal database of step (1) may contain multiple modality types, such as image, text, etc.
Further, in the above method, the modality-specific semantic space of step (2) is constructed by training a recurrent attention network on the data of that modality, then projecting the data of the other modality types into the semantic space of that modality through attention-based joint correlation learning, obtaining the cross-modal similarity specific to that modality.
Further, in the above method, the cross-modal similarity learning of step (3) fuses the modality-specific cross-modal similarities obtained from the different modality semantic spaces by dynamic fusion.
Further, in the above method, the retrieval of step (4) takes one modality type as the query modality and another modality type as the target modality. Each data item of the query modality serves as a query sample; its similarity to all data of the target modality is computed according to step (3), and the results are sorted by similarity in descending order to obtain the ranked list of relevant results.
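The retrieval procedure just described can be sketched as follows; the similarity scores here are placeholders for the fused cross-modal similarity of step (3), not values produced by the patent's networks:

```python
import numpy as np

def retrieve(query_sims):
    """Rank target-modality items for one query sample.

    query_sims: similarity of the query sample to every item in the
    target modality (a stand-in for the fused similarity of step (3)).
    Returns target indices sorted by descending similarity.
    """
    return np.argsort(-np.asarray(query_sims))

# Example: target item 1 is most similar, then 2, then 0.
print(retrieve([0.2, 0.9, 0.5]).tolist())  # [1, 2, 0]
```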
The effect of the invention is as follows: compared with existing methods, this method can fully model the fine-grained information and spatial context information inside each modality by constructing modality-specific semantic spaces; it then fully learns the imbalanced correlation information between different modalities through attention-based joint correlation learning; finally, it further exploits the complementarity of the different modality semantic spaces by dynamic fusion, improving the accuracy of cross-modal retrieval.
The reason this method achieves the above effect is as follows: for each modality-specific semantic space, a recurrent attention network is trained on the data of that modality to model the fine-grained information and spatial context information inside the modality; the data of the other modality types are then projected into the semantic space of that modality through attention-based joint correlation learning, fully learning the imbalanced correlation information between different modalities; finally, the modality-specific cross-modal similarities obtained from the different modality semantic spaces are fused by dynamic fusion, further exploiting the complementarity of the different modality semantic spaces and improving the accuracy of cross-modal retrieval.
Brief description of the drawings
Fig. 1 is a flowchart of the cross-modal similarity learning method based on modality-specific semantic space modeling of the present invention.
Fig. 2 is a schematic diagram of the complete network structure of the present invention.
Embodiments
The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
The flow of the cross-modal similarity learning method based on modality-specific semantic space modeling of the present invention is shown in Fig. 1, and comprises the following steps:
(1) Establish a cross-modal database containing data of multiple modality types, and divide the data in the database into a training set, a test set, and a validation set.
In the present embodiment, the cross-modal database may contain multiple modality types, including image and text.
The cross-modal dataset is denoted by D = {D^(i), D^(t)}.
For media type r, where r = i, t (i denotes image and t denotes text), define n^(r) as its number of data items. Each data item in the training set has one and only one semantic class.
Define the feature vector of the p-th data item of media type r as a d^(r) × 1 vector, where d^(r) denotes the feature vector dimension of media type r.
Its semantic label is defined as a c × 1 vector, where c denotes the total number of semantic classes. One and only one dimension of this vector is 1 and the rest are 0; the row whose value is 1 indicates the semantic class label of the data item.
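The data layout just described can be sketched as follows; the feature dimension, sample count, and class assignments are illustrative assumptions, not values from the patent:

```python
import numpy as np

c = 10        # total number of semantic classes
d_r = 128     # assumed feature dimension d^(r) for one media type r
n_r = 5       # assumed number of data items n^(r)

# One d^(r)-dimensional feature vector per data item.
features = np.random.randn(n_r, d_r)

# One c-dimensional one-hot semantic label per training item:
# exactly one dimension is 1, the rest are 0.
labels = np.zeros((n_r, c))
class_ids = [3, 1, 3, 7, 0]          # illustrative class assignments
labels[np.arange(n_r), class_ids] = 1

assert (labels.sum(axis=1) == 1).all()   # one and only one class per item
```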
(2) For each modality type in the cross-modal database, construct a semantic space specific to that modality, project the data of the other modality types into this semantic space, and obtain the modality-specific cross-modal similarity.
The process of this step is shown in Fig. 2. In the present embodiment, for the construction of the image semantic space, a recurrent attention network model is used to model the image data. First, the original image is scaled to 256 × 256 and input into a convolutional neural network. Then, the feature representations of the different regions of the image are extracted from the last pooling layer of the convolutional neural network, the regions of an image are organized into a sequence in order, and an LSTM (Long Short-Term Memory) neural network is used to model the spatial context information between the different image regions, yielding an output sequence of region representations. Next, an attention mechanism is used so that the trained model focuses on the more important image regions; specifically, a fully-connected network and a Softmax activation layer are constructed, and the visual attention weights are computed by the following formula:

$M^i = \tanh(W_a^i H_i), \quad a^i = \mathrm{softmax}(w_{ia}^{T} M_i),$

where $W_a^i$ and $w_{ia}$ are the network parameters of each layer, and $a^i$ contains the visual attention weights of the different regions in the image. Therefore, the feature vector of the n-th region in an image (shown in the image semantic space in Fig. 2) contains both the local fine-grained information and the spatial context information of the image. Next, the text data is projected into the image semantic space for cross-modal correlation learning. Specifically, a k-dimensional word vector feature is first extracted for each word in the text data, so that a text containing n words can be represented as an n × k matrix, which is input into a text convolutional neural network to obtain the feature representation of the sentences. Then the cross-modal similarity of image $i_p$ and text $t_p$ in the image semantic space is defined as follows (shown in the image semantic space in Fig. 2):

$sim_i(i_p, t_p) = \sum_{j=1}^{n} a_j^{i_p} h_j^{i_p} \cdot q_p^{t},$

where $h_j^{i_p}$ denotes the feature vector of the j-th region of image $i_p$. Finally, the following loss function is defined to realize the attention-based correlation learning:

$L_i = \sum_{n=1}^{N} l_{i1}(i_n^+, t_n^+, t_n^-) + l_{i2}(t_n^+, i_n^+, i_n^-),$

whose two terms are defined respectively as:

$l_{i1}(i_n^+, t_n^+, t_n^-) = \max(0, \alpha + sim_i(i_n^+, t_n^+) - sim_i(i_n^+, t_n^-)),$
$l_{i2}(t_n^+, i_n^+, i_n^-) = \max(0, \alpha + sim_i(i_n^+, t_n^+) - sim_i(i_n^-, t_n^+)),$

where $(i_n^+, t_n^+)$ denotes a matched image/text pair, $(i_n^+, t_n^-)$ and $(i_n^-, t_n^+)$ denote unmatched image/text pairs, α is a margin parameter, and N denotes the number of sampled triplets. Thus the cross-modal similarity $sim_i$ specific to the image modality is obtained from the image semantic space; it integrates representation learning and similarity metric learning, while fully modeling the fine-grained information inside images and the imbalanced correlation information between different modalities.
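A minimal numerical sketch of the three ingredients above: the attention weighting M = tanh(W_a H), a = softmax(w_a^T M); the attention-weighted cross-modal similarity; and a margin-based hinge loss. The hinge below uses the conventional sign convention (penalize when an unmatched pair scores within margin α of a matched pair); all tensor shapes are illustrative assumptions, and this is an illustration, not the patent's implementation:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_weights(H, W_a, w_a):
    """a = softmax(w_a^T tanh(W_a @ H)); H is (d, n) with one region per column."""
    M = np.tanh(W_a @ H)        # (k, n)
    return softmax(w_a @ M)     # (n,) one attention weight per region

def sim_image_space(H, a, q):
    """sim_i(i_p, t_p): attention-weighted dot products of the region
    features with the text feature q (both assumed d-dimensional)."""
    return float(np.sum(a * (H.T @ q)))

def hinge_loss(sim_pos, sim_neg, alpha=0.1):
    """Margin loss on a (matched, unmatched) similarity pair (conventional sign)."""
    return max(0.0, alpha - sim_pos + sim_neg)

rng = np.random.default_rng(0)
d, n, k = 8, 4, 6                 # feature dim, regions, attention dim (assumed)
H = rng.standard_normal((d, n))
W_a, w_a = rng.standard_normal((k, d)), rng.standard_normal(k)
a = attention_weights(H, W_a, w_a)
print(round(a.sum(), 6))          # 1.0 (attention weights sum to one)
```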
In the present embodiment, for the construction of the text semantic space, a recurrent attention network model is used to model the text data. First, for each text data item, a k-dimensional word vector feature is extracted for each word, so that a text containing n words can be represented as an n × k matrix, which is input into a text convolutional neural network, and the feature representations of the different text fragments are extracted from the last pooling layer of the network. These are then input in order into an LSTM neural network to model the context information of the text, yielding an output sequence of fragment representations. Next, an attention mechanism is used so that the trained model focuses on the more important text fragments; specifically, a fully-connected network and a Softmax activation layer are constructed, and the text attention weights are computed by a formula analogous to the visual case, whose parameters are the network parameters of each layer and whose output $a^t$ contains the text attention weights of the different fragments in the text. Therefore, the feature vector of the m-th fragment in a text (shown in the text semantic space in Fig. 2) contains both the local fine-grained information and the context information of the text. Next, the image data is projected into the text semantic space for cross-modal correlation learning. Specifically, the global feature representation of the image is first extracted with a convolutional neural network; then the cross-modal similarity of image $i_p$ and text $t_p$ in the text semantic space is defined analogously (shown in the text semantic space in Fig. 2), where the weighted terms are the fragment feature vectors of text $t_p$. Finally, an attention-based correlation learning loss of the same form is defined over matched and unmatched image/text pairs, where β is a margin parameter and M denotes the number of sampled triplets. Thus the cross-modal similarity $sim_t$ specific to the text modality is obtained from the text semantic space; it integrates representation learning and similarity metric learning, while fully modeling the fine-grained information inside text and the imbalanced correlation information between different modalities.
(3) Fuse the modality-specific cross-modal similarities obtained from the different modality semantic spaces to obtain the final cross-modal similarity.
In the present embodiment, dynamic fusion is used to fuse the modality-specific cross-modal similarities obtained from the different modality semantic spaces. First, the modality-specific cross-modal similarities obtained from the different modality semantic spaces are normalized to the range 0 to 1.
Then, for an image/text pair $(i_p, t_p)$, the normalized score computed from the image semantic space is used as the dynamic weight of this image/text pair in the text space, and the normalized score computed from the text semantic space is used as its dynamic weight in the image space. The final cross-modal similarity is therefore defined as follows:

$Sim(i_p, t_p) = r_t(i_p, t_p) \cdot sim_i(i_p, t_p) + r_i(i_p, t_p) \cdot sim_t(i_p, t_p)$

This fully exploits the complementarity of the different modality semantic spaces and further improves the effect of cross-modal retrieval.
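The normalization and dynamic cross-weighting of step (3) can be sketched as follows; min-max normalization is an assumption here, since the normalization formula itself is not reproduced in this text:

```python
def normalize(scores):
    """Rescale similarity scores to [0, 1] (min-max normalization assumed)."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in scores]

def fuse(sim_i, sim_t, r_i, r_t):
    """Sim(i_p, t_p) = r_t * sim_i + r_i * sim_t: the normalized score from
    one semantic space serves as the dynamic weight for the other space's
    similarity."""
    return r_t * sim_i + r_i * sim_t

print(normalize([2.0, 4.0, 6.0]))                  # [0.0, 0.5, 1.0]
print(round(fuse(0.8, 0.6, r_i=0.4, r_t=0.6), 2))  # 0.72
```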
(4) Use any one modality type in the test set as the query modality and another modality type as the target modality. Take each data item of the query modality as a query sample, retrieve data in the target modality, compute the similarity between the query sample and the query targets in the manner of step (3), sort by similarity in descending order, and obtain the ranked list of relevant target-modality data.
The following experimental results show that, compared with existing methods, the cross-modal similarity learning method of the present invention achieves higher retrieval accuracy.
The present embodiment is evaluated on the Wikipedia cross-modal dataset, which was proposed in the paper "A New Approach to Cross-Modal Multimedia Retrieval" (authors N. Rasiwasia, J. Pereira, E. Coviello, G. Doyle, G. Lanckriet, R. Levy and N. Vasconcelos, published at the ACM International Conference on Multimedia, 2010). It contains 2866 text passages and 2866 images in one-to-one correspondence, divided into 10 categories in total, of which 2173 text passages and 2173 images serve as the training set, 231 text passages and 231 images as the validation set, and 492 text passages and 492 images as the test set. The following 3 methods are tested as experimental comparisons:
Existing method one: the joint representation learning (JRL) method of the paper "Learning Cross-Media Joint Representation with Sparse and Semi-Supervised Regularization" (authors X. Zhai, Y. Peng, and J. Xiao), which builds graph models for different modality data, performs cross-modal correlation learning and high-level semantic abstraction simultaneously, and introduces sparse and semi-supervised regularization.
Existing method two: the correspondence autoencoder (Corr-AE) method of the paper "Cross-modal Retrieval with Correspondence Autoencoder" (authors F. Feng, X. Wang, and R. Li), which constructs two network pathways connected at an intermediate layer to model correlation information and reconstruction information simultaneously.
Existing method three: the cross-media multiple deep network (CMDN) of the paper "Cross-media shared representation by hierarchical learning with multiple deep networks" (authors Y. Peng, X. Huang, and J. Qi), which in a separate single-modality representation learning stage jointly models the semantic information within each modality and the correlation information between modalities, then builds a multi-layer network structure in the unified representation learning stage, and uses stacked learning to model reconstruction and correlation information simultaneously to obtain a unified cross-modal representation.
The present invention: the method of the present embodiment.
The experiments use the MAP (mean average precision) metric, commonly used in the information retrieval field, to evaluate the accuracy of cross-modal retrieval. MAP is the mean of the average retrieval precision over all query samples; the larger the MAP value, the better the cross-modal retrieval results.
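MAP as used in this evaluation can be computed as sketched below; binary relevance (a retrieved item counts as relevant if it shares the query's semantic class) is a common convention that is assumed here:

```python
def average_precision(relevance):
    """AP of one ranked result list: mean of precision@k over the relevant hits.

    relevance: list of 0/1 flags in ranked order for one query sample.
    """
    hits, precisions = 0, []
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(ranked_lists):
    """MAP: mean of AP over all query samples; larger is better."""
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)

print(round(average_precision([1, 0, 1]), 4))  # 0.8333  = (1/1 + 2/3) / 2
```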
Table 1. Experimental results of the present invention.
|                       | Image query text | Text query image | Average |
| Existing method one   | 0.479            | 0.428            | 0.454   |
| Existing method two   | 0.442            | 0.429            | 0.436   |
| Existing method three | 0.487            | 0.427            | 0.457   |
| The present invention | 0.516            | 0.458            | 0.487   |
As can be seen from Table 1, the present invention achieves a considerable improvement over the existing methods on both the image-query-text and text-query-image tasks. Existing method one builds graph models under a conventional framework and linearly maps different modality data into a unified space, making it difficult to fully model the complex cross-modal correlations. Existing methods two and three use deep network structures, but their deep models project the data of different modalities equally into a unified space to mine the latent alignment between them, which can lose modality-exclusive yet useful information and cannot make full use of the internal information each modality provides. The present invention, on the one hand, constructs modality-specific semantic spaces, modeling the fine-grained information and spatial context information inside each modality while fully learning the imbalanced correlation information between different modalities; on the other hand, it fuses the modality-specific cross-modal similarities obtained from the different modality semantic spaces by dynamic fusion, further exploiting the complementarity of the different modality semantic spaces and thereby improving the accuracy of cross-modal retrieval.
In other embodiments, the method of constructing the modality-specific semantic space in step (2) of the present invention, which uses an LSTM (Long Short-Term Memory) neural network to model the context information of image and text data, may equally use a recurrent neural network (RNN) as a replacement.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to encompass these changes and modifications.
Claims (8)
1. A cross-modal similarity learning method based on modality-specific semantic space modeling, comprising the following steps:
(1) establishing a cross-modal database containing data of multiple modality types;
(2) for each modality type in the cross-modal database, constructing a semantic space specific to that modality, projecting the data of the other modality types into this semantic space, and obtaining a modality-specific cross-modal similarity;
(3) fusing the modality-specific cross-modal similarities obtained from the semantic spaces of the different modalities to obtain the final cross-modal similarity.
2. The method as described in claim 1, characterized in that the cross-modal database contains multiple modality types, said multiple modality types including image and text.
3. The method as described in claim 1, characterized in that the semantic space specific to a modality in step (2) is constructed as follows: a recurrent attention network is trained on the data of that modality, and the data of the other modality types are then projected into the semantic space of that modality through attention-based joint correlation learning, obtaining the cross-modal similarity specific to that modality.
4. The method as claimed in claim 3, characterized in that the image semantic space is constructed as follows:
a) scale the original image and input it into a convolutional neural network;
b) extract the feature representations of the different regions of the image from the last pooling layer of the convolutional neural network, organize the regions of an image into a sequence in order, and use an LSTM neural network or an RNN neural network to model the spatial context information between the different image regions, obtaining an output sequence of region representations;
c) use an attention mechanism so that the trained model focuses on the important image regions: first construct a fully-connected network and a Softmax activation layer, then compute the visual attention weights by the following formula:
$M^i = \tanh(W_a^i H_i), \quad a^i = \mathrm{softmax}(w_{ia}^{T} M_i),$
where $W_a^i$ and $w_{ia}$ are the network parameters of each layer, and $a^i$ contains the visual attention weights of the different regions in the image; therefore, the feature vector of the n-th region in an image contains both the local fine-grained information and the spatial context information of the image;
d) project the text data into the image semantic space for cross-modal correlation learning: first extract a k-dimensional word vector feature for each word in the text data, then represent a text containing n words as an n × k matrix, input it into a text convolutional neural network to obtain the feature representation of the words, and then define the cross-modal similarity of image $i_p$ and text $t_p$ in the image semantic space as follows:
$sim_i(i_p, t_p) = \sum_{j=1}^{n} a_j^{i_p} h_j^{i_p} \cdot q_p^{t},$
where $h_j^{i_p}$ denotes the feature vector of the j-th region of image $i_p$;
e) define the following loss function to realize the attention-based correlation learning:
$L_i = \sum_{n=1}^{N} l_{i1}(i_n^+, t_n^+, t_n^-) + l_{i2}(t_n^+, i_n^+, i_n^-),$
The two terms in the above formula are defined respectively as:
$$l_{i1}(i_n^+, t_n^+, t_n^-) = \max\big(0,\ \alpha - sim_i(i_n^+, t_n^+) + sim_i(i_n^+, t_n^-)\big),$$
$$l_{i2}(t_n^+, i_n^+, i_n^-) = \max\big(0,\ \alpha - sim_i(i_n^+, t_n^+) + sim_i(i_n^-, t_n^+)\big),$$
where $(i_n^+, t_n^+)$ denotes a matched image/text pair, $(i_n^+, t_n^-)$ and $(i_n^-, t_n^+)$ denote unmatched image/text pairs, $\alpha$ is the margin parameter, and $N$ is the number of sampled triplets.
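A minimal sketch of the hinge-style triplet objective above, assuming the standard form in which an unmatched pair scoring close to a matched pair incurs a loss (the similarity values and margin below are invented for illustration and would come from $sim_i$ in the actual method):

```python
def triplet_loss(sim_pos, sim_neg, margin=0.2):
    """Hinge term: max(0, margin - sim(matched pair) + sim(unmatched pair))."""
    return max(0.0, margin - sim_pos + sim_neg)

def image_space_loss(triplets, margin=0.2):
    """L_i: sum over sampled triplets of l_i1 + l_i2.

    triplets: list of (sim(i+, t+), sim(i+, t-), sim(i-, t+)) tuples,
    i.e. the matched pair, the image-anchored negative and the
    text-anchored negative, all measured in the image semantic space.
    """
    return sum(
        triplet_loss(s_pp, s_pn, margin)    # l_i1: contrast t+ against t-
        + triplet_loss(s_pp, s_np, margin)  # l_i2: contrast i+ against i-
        for s_pp, s_pn, s_np in triplets
    )

# One well-separated triplet (zero loss) and one violating triplet.
loss = image_space_loss([(0.9, 0.1, 0.2), (0.3, 0.8, 0.1)])
```

The text-space loss $L_t$ of claim 5 is identical in structure, with $sim_t$ and margin $\beta$.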
5. The method according to claim 4, wherein the text semantic space is constructed as follows:
A) For each text datum, a $k$-dimensional word-vector feature is extracted for each word, so that a text containing $n$ words is represented as an $n \times k$ matrix, which is input to a text convolutional neural network;
B) The feature representations of the different text blocks are extracted from the last pooling layer of the convolutional neural network and then fed in sequence into an LSTM or RNN network to model the contextual information of the text; the output sequence is denoted $H_t$;
C) The model is trained to focus on important text fragments via an attention mechanism: a fully-connected network and a softmax activation layer are first constructed, and the text attention weights are then computed by the following formulas:
$$M^t = \tanh(W_a^t H_t),$$
$$a^t = \mathrm{softmax}(w_{ta}^{T} M^t),$$
where $W_a^t$ and $w_{ta}$ are the network parameters of the respective layers, and $a^t$ contains the text attention weights of the different fragments; the feature vector $h_m$ of the $m$-th fragment in a text thus contains both local fine-grained information and the contextual information of the text;
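The two attention formulas above can be sketched in NumPy as follows (the parameter shapes are invented for illustration; in the method they are learned network parameters):

```python
import numpy as np

def text_attention(H, W_a, w_ta):
    """Fragment attention weights: a^t = softmax(w_ta^T tanh(W_a H)).

    H    : (d, m) matrix whose columns are the m fragment representations
    W_a  : (r, d) projection parameter of the fully-connected layer
    w_ta : (r,) scoring parameter
    Returns a length-m vector of attention weights summing to 1.
    """
    M = np.tanh(W_a @ H)               # (r, m) hidden attention matrix M^t
    scores = w_ta @ M                  # (m,) one score per fragment
    e = np.exp(scores - scores.max())  # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(1)
H = rng.standard_normal((8, 5))    # 5 fragments of dimension 8
W_a = rng.standard_normal((6, 8))
w_ta = rng.standard_normal(6)
a_t = text_attention(H, W_a, w_ta)
```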
D) The image data is projected into the text semantic space for cross-modal association learning: the global feature representation $q_p^i$ of the image is first extracted by a convolutional neural network, and the cross-modal similarity of image $i_p$ and text $t_p$ in the text semantic space is then defined as follows:
$$sim_t(i_p, t_p) = \sum_{j=1}^{m} a_j^{t_p}\, h_j^{t_p} \cdot q_p^i,$$
where $h_j^{t_p}$ represents the $j$-th fragment feature vector of text $t_p$;
E) The following loss function is defined to realize attention-based association learning:
$$L_t = \sum_{n=1}^{M} l_{t1}(t_n^+, i_n^+, i_n^-) + l_{t2}(i_n^+, t_n^+, t_n^-),$$
where the two terms in the above formula are defined respectively as:
$$l_{t1}(t_n^+, i_n^+, i_n^-) = \max\big(0,\ \beta - sim_t(i_n^+, t_n^+) + sim_t(i_n^-, t_n^+)\big),$$
$$l_{t2}(i_n^+, t_n^+, t_n^-) = \max\big(0,\ \beta - sim_t(i_n^+, t_n^+) + sim_t(i_n^+, t_n^-)\big),$$
where $(i_n^+, t_n^+)$ denotes a matched image/text pair, $(i_n^-, t_n^+)$ and $(i_n^+, t_n^-)$ denote unmatched image/text pairs, $\beta$ is the margin parameter, and $M$ is the number of sampled triplets.
6. The method according to claim 1, wherein in step (3) the modality-specific cross-modal similarities obtained from the different modality semantic spaces are fused dynamically, comprising the following steps: first, the modality-specific cross-modal similarities obtained from the different modality semantic spaces are normalized to between 0 and 1 according to the following formulas:
$$r_i(i_p, t_p) = \frac{sim_i(i_p, t_p) - \min\big(sim_i(i, t)\big)}{\max\big(sim_i(i, t)\big) - \min\big(sim_i(i, t)\big)},$$
$$r_t(i_p, t_p) = \frac{sim_t(i_p, t_p) - \min\big(sim_t(i, t)\big)}{\max\big(sim_t(i, t)\big) - \min\big(sim_t(i, t)\big)},$$
Then, for an image/text pair $(i_p, t_p)$, the normalized score computed from the image semantic space serves as the dynamic weight of this image/text pair in the text space, and the normalized score computed from the text semantic space serves as the dynamic weight of this image/text pair in the image space. The final cross-modal similarity is defined as follows:
$$Sim(i_p, t_p) = r_t(i_p, t_p)\cdot sim_i(i_p, t_p) + r_i(i_p, t_p)\cdot sim_t(i_p, t_p).$$
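The normalization and dynamic fusion of claim 6 can be sketched in NumPy as follows (the similarity matrices are toy values; the sketch assumes the score range is non-degenerate, i.e. max > min):

```python
import numpy as np

def minmax(scores):
    """Normalize a similarity score matrix to [0, 1] (assumes max > min)."""
    lo, hi = scores.min(), scores.max()
    return (scores - lo) / (hi - lo)

def fuse(sim_i, sim_t):
    """Dynamic fusion: Sim = r_t * sim_i + r_i * sim_t, where r_i and r_t
    are the min-max-normalized scores from the image and text spaces.
    Each space's raw similarity is weighted by the other space's
    normalized confidence for the same image/text pair."""
    r_i, r_t = minmax(sim_i), minmax(sim_t)
    return r_t * sim_i + r_i * sim_t

# Toy 2x2 similarity matrices (rows: images, columns: texts).
sim_i = np.array([[0.9, 0.2], [0.4, 0.7]])  # image-space similarities
sim_t = np.array([[0.8, 0.1], [0.3, 0.6]])  # text-space similarities
S = fuse(sim_i, sim_t)
```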
7. A cross-modal retrieval method, comprising the following steps:
1) computing the cross-modal similarity using the method of any one of claims 1 to 6;
2) taking one modality type as the query modality and another modality type as the target modality, retrieving each datum of the target modality with the query sample, computing the similarity between the query sample and each query target, and obtaining the retrieval results of the target modality data according to the similarities.
8. The method according to claim 7, wherein after step 2) computes the similarities between the query sample and the query targets, the targets are sorted in descending order of similarity to obtain a ranked list of related results.
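The ranking step of claims 7 and 8 amounts to a descending sort by fused similarity; a minimal sketch (identifiers are hypothetical):

```python
def retrieve(query_sims, target_ids):
    """Rank target-modality items for one query, most similar first.

    query_sims : similarity of the query sample to each target item
    target_ids : identifiers of the target-modality items, same order
    """
    order = sorted(range(len(target_ids)),
                   key=lambda j: query_sims[j], reverse=True)
    return [target_ids[j] for j in order]

ranked = retrieve([0.3, 0.9, 0.5], ["t0", "t1", "t2"])
```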
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710684763.6A CN107562812B (en) | 2017-08-11 | 2017-08-11 | Cross-modal similarity learning method based on specific modal semantic space modeling |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107562812A true CN107562812A (en) | 2018-01-09 |
CN107562812B CN107562812B (en) | 2021-01-15 |
Family
ID=60975314
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710684763.6A Active CN107562812B (en) | 2017-08-11 | 2017-08-11 | Cross-modal similarity learning method based on specific modal semantic space modeling |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107562812B (en) |
Cited By (49)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108256631A (en) * | 2018-01-26 | 2018-07-06 | 深圳市唯特视科技有限公司 | A kind of user behavior commending system based on attention model |
CN108415819A (en) * | 2018-03-15 | 2018-08-17 | 中国人民解放军国防科技大学 | Hard disk fault tracking method and device |
CN108829719A (en) * | 2018-05-07 | 2018-11-16 | 中国科学院合肥物质科学研究院 | The non-true class quiz answers selection method of one kind and system |
CN108881950A (en) * | 2018-05-30 | 2018-11-23 | 北京奇艺世纪科技有限公司 | A kind of method and apparatus of video processing |
CN109255047A (en) * | 2018-07-18 | 2019-01-22 | 西安电子科技大学 | Based on the complementary semantic mutual search method of image-text being aligned and symmetrically retrieve |
CN109325240A (en) * | 2018-12-03 | 2019-02-12 | ***通信集团福建有限公司 | Method, apparatus, equipment and the medium of index inquiry |
CN109508400A (en) * | 2018-10-09 | 2019-03-22 | 中国科学院自动化研究所 | Picture and text abstraction generating method |
CN109543714A (en) * | 2018-10-16 | 2019-03-29 | 北京达佳互联信息技术有限公司 | Acquisition methods, device, electronic equipment and the storage medium of data characteristics |
CN109543009A (en) * | 2018-10-17 | 2019-03-29 | 龙马智芯(珠海横琴)科技有限公司 | Text similarity assessment system and text similarity appraisal procedure |
CN109670071A (en) * | 2018-10-22 | 2019-04-23 | 北京大学 | A kind of across the media Hash search methods and system of the guidance of serializing multiple features |
CN109785409A (en) * | 2018-12-29 | 2019-05-21 | 武汉大学 | A kind of image based on attention mechanism-text data fusion method and system |
CN109886326A (en) * | 2019-01-31 | 2019-06-14 | 深圳市商汤科技有限公司 | A kind of cross-module state information retrieval method, device and storage medium |
CN109902710A (en) * | 2019-01-07 | 2019-06-18 | 南京热信软件科技有限公司 | A kind of fast matching method and device of text image |
CN110210540A (en) * | 2019-05-22 | 2019-09-06 | 山东大学 | Across social media method for identifying ID and system based on attention mechanism |
CN110580489A (en) * | 2018-06-11 | 2019-12-17 | 阿里巴巴集团控股有限公司 | Data object classification system, method and equipment |
WO2020001048A1 (en) * | 2018-06-29 | 2020-01-02 | 北京大学深圳研究生院 | Double semantic space-based adversarial cross-media retrieval method |
CN110706771A (en) * | 2019-10-10 | 2020-01-17 | 复旦大学附属中山医院 | Method and device for generating multi-mode education content, server and storage medium |
CN110851641A (en) * | 2018-08-01 | 2020-02-28 | 杭州海康威视数字技术股份有限公司 | Cross-modal retrieval method and device and readable storage medium |
WO2020063524A1 (en) * | 2018-09-30 | 2020-04-02 | 北京国双科技有限公司 | Method and system for determining legal instrument |
CN110990597A (en) * | 2019-12-19 | 2020-04-10 | 中国电子科技集团公司信息科学研究院 | Cross-modal data retrieval system based on text semantic mapping and retrieval method thereof |
CN111026894A (en) * | 2019-12-12 | 2020-04-17 | 清华大学 | Cross-modal image text retrieval method based on credibility self-adaptive matching network |
JP2020064568A (en) * | 2018-10-19 | 2020-04-23 | 株式会社日立製作所 | Video analysis system, learning device, and method thereof |
CN111091010A (en) * | 2019-11-22 | 2020-05-01 | 京东方科技集团股份有限公司 | Similarity determination method, similarity determination device, network training device, network searching device and storage medium |
CN111159472A (en) * | 2018-11-08 | 2020-05-15 | 微软技术许可有限责任公司 | Multi-modal chat techniques |
CN111199750A (en) * | 2019-12-18 | 2020-05-26 | 北京葡萄智学科技有限公司 | Pronunciation evaluation method and device, electronic equipment and storage medium |
CN111274445A (en) * | 2020-01-20 | 2020-06-12 | 山东建筑大学 | Similar video content retrieval method and system based on triple deep learning |
CN111339256A (en) * | 2020-02-28 | 2020-06-26 | 支付宝(杭州)信息技术有限公司 | Method and device for text processing |
CN111429913A (en) * | 2020-03-26 | 2020-07-17 | 厦门快商通科技股份有限公司 | Digit string voice recognition method, identity verification device and computer readable storage medium |
CN111428072A (en) * | 2020-03-31 | 2020-07-17 | 南方科技大学 | Ophthalmologic multimodal image retrieval method, apparatus, server and storage medium |
WO2020155418A1 (en) * | 2019-01-31 | 2020-08-06 | 深圳市商汤科技有限公司 | Cross-modal information retrieval method and device, and storage medium |
CN111639240A (en) * | 2020-05-14 | 2020-09-08 | 山东大学 | Cross-modal Hash retrieval method and system based on attention awareness mechanism |
CN111930992A (en) * | 2020-08-14 | 2020-11-13 | 腾讯科技(深圳)有限公司 | Neural network training method and device and electronic equipment |
CN112001279A (en) * | 2020-08-12 | 2020-11-27 | 山东省人工智能研究院 | Cross-modal pedestrian re-identification method based on dual attribute information |
CN112041856A (en) * | 2018-03-01 | 2020-12-04 | 皇家飞利浦有限公司 | Cross-modal neural network for prediction |
CN112581387A (en) * | 2020-12-03 | 2021-03-30 | 广州电力通信网络有限公司 | Intelligent operation and maintenance system, device and method for power distribution room |
CN112668671A (en) * | 2021-03-15 | 2021-04-16 | 北京百度网讯科技有限公司 | Method and device for acquiring pre-training model |
CN113094550A (en) * | 2020-01-08 | 2021-07-09 | 百度在线网络技术(北京)有限公司 | Video retrieval method, device, equipment and medium |
CN113159371A (en) * | 2021-01-27 | 2021-07-23 | 南京航空航天大学 | Unknown target feature modeling and demand prediction method based on cross-modal data fusion |
CN113204666A (en) * | 2021-05-26 | 2021-08-03 | 杭州联汇科技股份有限公司 | Method for searching matched pictures based on characters |
CN113392196A (en) * | 2021-06-04 | 2021-09-14 | 北京师范大学 | Topic retrieval method and system based on multi-mode cross comparison |
CN113435206A (en) * | 2021-05-26 | 2021-09-24 | 卓尔智联(武汉)研究院有限公司 | Image-text retrieval method and device and electronic equipment |
CN113434716A (en) * | 2021-07-02 | 2021-09-24 | 泰康保险集团股份有限公司 | Cross-modal information retrieval method and device |
CN113934887A (en) * | 2021-12-20 | 2022-01-14 | 成都考拉悠然科技有限公司 | No-proposal time sequence language positioning method based on semantic decoupling |
CN113971209A (en) * | 2021-12-22 | 2022-01-25 | 松立控股集团股份有限公司 | Non-supervision cross-modal retrieval method based on attention mechanism enhancement |
CN114417878A (en) * | 2021-12-29 | 2022-04-29 | 北京百度网讯科技有限公司 | Semantic recognition method and device, electronic equipment and storage medium |
CN114529757A (en) * | 2022-01-21 | 2022-05-24 | 四川大学 | Cross-modal single-sample three-dimensional point cloud segmentation method |
CN115858839A (en) * | 2023-02-16 | 2023-03-28 | 上海蜜度信息技术有限公司 | Cross-modal LOGO retrieval method, system, terminal and storage medium |
CN116484878A (en) * | 2023-06-21 | 2023-07-25 | 国网智能电网研究院有限公司 | Semantic association method, device, equipment and storage medium of power heterogeneous data |
CN116522168A (en) * | 2023-07-04 | 2023-08-01 | 北京墨丘科技有限公司 | Cross-modal text similarity comparison method and device and electronic equipment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103559191A (en) * | 2013-09-10 | 2014-02-05 | 浙江大学 | Cross-media sorting method based on hidden space learning and two-way sorting learning |
US9280562B1 (en) * | 2006-01-31 | 2016-03-08 | The Research Foundation For The State University Of New York | System and method for multimedia ranking and multi-modal image retrieval using probabilistic semantic models and expectation-maximization (EM) learning |
CN105718532A (en) * | 2016-01-15 | 2016-06-29 | 北京大学 | Cross-media sequencing method based on multi-depth network structure |
CN106095829A (en) * | 2016-06-01 | 2016-11-09 | 华侨大学 | Cross-media retrieval method based on degree of depth study with the study of concordance expression of space |
Non-Patent Citations (1)
Title |
---|
LI ZHIXIN et al.: "Multimodal Image Retrieval Based on Semantic Learning", Computer Engineering *
Cited By (78)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108256631A (en) * | 2018-01-26 | 2018-07-06 | 深圳市唯特视科技有限公司 | A kind of user behavior commending system based on attention model |
CN112041856A (en) * | 2018-03-01 | 2020-12-04 | 皇家飞利浦有限公司 | Cross-modal neural network for prediction |
CN108415819A (en) * | 2018-03-15 | 2018-08-17 | 中国人民解放军国防科技大学 | Hard disk fault tracking method and device |
CN108415819B (en) * | 2018-03-15 | 2021-05-25 | 中国人民解放军国防科技大学 | Hard disk fault tracking method and device |
CN108829719A (en) * | 2018-05-07 | 2018-11-16 | 中国科学院合肥物质科学研究院 | The non-true class quiz answers selection method of one kind and system |
CN108881950A (en) * | 2018-05-30 | 2018-11-23 | 北京奇艺世纪科技有限公司 | A kind of method and apparatus of video processing |
CN110580489A (en) * | 2018-06-11 | 2019-12-17 | 阿里巴巴集团控股有限公司 | Data object classification system, method and equipment |
WO2020001048A1 (en) * | 2018-06-29 | 2020-01-02 | 北京大学深圳研究生院 | Double semantic space-based adversarial cross-media retrieval method |
CN109255047A (en) * | 2018-07-18 | 2019-01-22 | 西安电子科技大学 | Based on the complementary semantic mutual search method of image-text being aligned and symmetrically retrieve |
CN110851641A (en) * | 2018-08-01 | 2020-02-28 | 杭州海康威视数字技术股份有限公司 | Cross-modal retrieval method and device and readable storage medium |
WO2020063524A1 (en) * | 2018-09-30 | 2020-04-02 | 北京国双科技有限公司 | Method and system for determining legal instrument |
CN109508400A (en) * | 2018-10-09 | 2019-03-22 | 中国科学院自动化研究所 | Picture and text abstraction generating method |
CN109543714A (en) * | 2018-10-16 | 2019-03-29 | 北京达佳互联信息技术有限公司 | Acquisition methods, device, electronic equipment and the storage medium of data characteristics |
CN109543009B (en) * | 2018-10-17 | 2019-10-25 | 龙马智芯(珠海横琴)科技有限公司 | Text similarity assessment system and text similarity appraisal procedure |
CN109543009A (en) * | 2018-10-17 | 2019-03-29 | 龙马智芯(珠海横琴)科技有限公司 | Text similarity assessment system and text similarity appraisal procedure |
JP7171361B2 (en) | 2018-10-19 | 2022-11-15 | 株式会社日立製作所 | Data analysis system, learning device, and method thereof |
JP2020064568A (en) * | 2018-10-19 | 2020-04-23 | 株式会社日立製作所 | Video analysis system, learning device, and method thereof |
CN109670071B (en) * | 2018-10-22 | 2021-10-08 | 北京大学 | Serialized multi-feature guided cross-media Hash retrieval method and system |
CN109670071A (en) * | 2018-10-22 | 2019-04-23 | 北京大学 | A kind of across the media Hash search methods and system of the guidance of serializing multiple features |
US11921782B2 (en) | 2018-11-08 | 2024-03-05 | Microsoft Technology Licensing, Llc | VideoChat |
CN111159472B (en) * | 2018-11-08 | 2024-03-12 | 微软技术许可有限责任公司 | Multimodal chat technique |
CN111159472A (en) * | 2018-11-08 | 2020-05-15 | 微软技术许可有限责任公司 | Multi-modal chat techniques |
CN109325240A (en) * | 2018-12-03 | 2019-02-12 | ***通信集团福建有限公司 | Method, apparatus, equipment and the medium of index inquiry |
CN109785409A (en) * | 2018-12-29 | 2019-05-21 | 武汉大学 | A kind of image based on attention mechanism-text data fusion method and system |
CN109785409B (en) * | 2018-12-29 | 2020-09-08 | 武汉大学 | Image-text data fusion method and system based on attention mechanism |
CN109902710A (en) * | 2019-01-07 | 2019-06-18 | 南京热信软件科技有限公司 | A kind of fast matching method and device of text image |
CN109902710B (en) * | 2019-01-07 | 2023-07-11 | 李晓妮 | Quick matching method and device for text images |
WO2020155423A1 (en) * | 2019-01-31 | 2020-08-06 | 深圳市商汤科技有限公司 | Cross-modal information retrieval method and apparatus, and storage medium |
CN109886326A (en) * | 2019-01-31 | 2019-06-14 | 深圳市商汤科技有限公司 | A kind of cross-module state information retrieval method, device and storage medium |
WO2020155418A1 (en) * | 2019-01-31 | 2020-08-06 | 深圳市商汤科技有限公司 | Cross-modal information retrieval method and device, and storage medium |
CN109886326B (en) * | 2019-01-31 | 2022-01-04 | 深圳市商汤科技有限公司 | Cross-modal information retrieval method and device and storage medium |
JP2022510704A (en) * | 2019-01-31 | 2022-01-27 | シェンチェン センスタイム テクノロジー カンパニー リミテッド | Cross-modal information retrieval methods, devices and storage media |
TWI785301B (en) * | 2019-01-31 | 2022-12-01 | 大陸商深圳市商湯科技有限公司 | A cross-modal information retrieval method, device and storage medium |
CN110210540B (en) * | 2019-05-22 | 2021-02-26 | 山东大学 | Cross-social media user identity recognition method and system based on attention mechanism |
CN110210540A (en) * | 2019-05-22 | 2019-09-06 | 山东大学 | Across social media method for identifying ID and system based on attention mechanism |
CN110706771A (en) * | 2019-10-10 | 2020-01-17 | 复旦大学附属中山医院 | Method and device for generating multi-mode education content, server and storage medium |
CN111091010A (en) * | 2019-11-22 | 2020-05-01 | 京东方科技集团股份有限公司 | Similarity determination method, similarity determination device, network training device, network searching device and storage medium |
CN111026894A (en) * | 2019-12-12 | 2020-04-17 | 清华大学 | Cross-modal image text retrieval method based on credibility self-adaptive matching network |
CN111026894B (en) * | 2019-12-12 | 2021-11-26 | 清华大学 | Cross-modal image text retrieval method based on credibility self-adaptive matching network |
CN111199750A (en) * | 2019-12-18 | 2020-05-26 | 北京葡萄智学科技有限公司 | Pronunciation evaluation method and device, electronic equipment and storage medium |
CN111199750B (en) * | 2019-12-18 | 2022-10-28 | 北京葡萄智学科技有限公司 | Pronunciation evaluation method and device, electronic equipment and storage medium |
CN110990597A (en) * | 2019-12-19 | 2020-04-10 | 中国电子科技集团公司信息科学研究院 | Cross-modal data retrieval system based on text semantic mapping and retrieval method thereof |
CN110990597B (en) * | 2019-12-19 | 2022-11-25 | 中国电子科技集团公司信息科学研究院 | Cross-modal data retrieval system based on text semantic mapping and retrieval method thereof |
CN113094550A (en) * | 2020-01-08 | 2021-07-09 | 百度在线网络技术(北京)有限公司 | Video retrieval method, device, equipment and medium |
CN113094550B (en) * | 2020-01-08 | 2023-10-24 | 百度在线网络技术(北京)有限公司 | Video retrieval method, device, equipment and medium |
CN111274445B (en) * | 2020-01-20 | 2021-04-23 | 山东建筑大学 | Similar video content retrieval method and system based on triple deep learning |
CN111274445A (en) * | 2020-01-20 | 2020-06-12 | 山东建筑大学 | Similar video content retrieval method and system based on triple deep learning |
CN111339256A (en) * | 2020-02-28 | 2020-06-26 | 支付宝(杭州)信息技术有限公司 | Method and device for text processing |
CN111429913A (en) * | 2020-03-26 | 2020-07-17 | 厦门快商通科技股份有限公司 | Digit string voice recognition method, identity verification device and computer readable storage medium |
CN111428072A (en) * | 2020-03-31 | 2020-07-17 | 南方科技大学 | Ophthalmologic multimodal image retrieval method, apparatus, server and storage medium |
CN111639240A (en) * | 2020-05-14 | 2020-09-08 | 山东大学 | Cross-modal Hash retrieval method and system based on attention awareness mechanism |
CN112001279A (en) * | 2020-08-12 | 2020-11-27 | 山东省人工智能研究院 | Cross-modal pedestrian re-identification method based on dual attribute information |
CN111930992A (en) * | 2020-08-14 | 2020-11-13 | 腾讯科技(深圳)有限公司 | Neural network training method and device and electronic equipment |
CN111930992B (en) * | 2020-08-14 | 2022-10-28 | 腾讯科技(深圳)有限公司 | Neural network training method and device and electronic equipment |
CN112581387A (en) * | 2020-12-03 | 2021-03-30 | 广州电力通信网络有限公司 | Intelligent operation and maintenance system, device and method for power distribution room |
CN113159371A (en) * | 2021-01-27 | 2021-07-23 | 南京航空航天大学 | Unknown target feature modeling and demand prediction method based on cross-modal data fusion |
CN113159371B (en) * | 2021-01-27 | 2022-05-20 | 南京航空航天大学 | Unknown target feature modeling and demand prediction method based on cross-modal data fusion |
CN112668671A (en) * | 2021-03-15 | 2021-04-16 | 北京百度网讯科技有限公司 | Method and device for acquiring pre-training model |
CN112668671B (en) * | 2021-03-15 | 2021-12-24 | 北京百度网讯科技有限公司 | Method and device for acquiring pre-training model |
CN113435206B (en) * | 2021-05-26 | 2023-08-01 | 卓尔智联(武汉)研究院有限公司 | Image-text retrieval method and device and electronic equipment |
CN113204666B (en) * | 2021-05-26 | 2022-04-05 | 杭州联汇科技股份有限公司 | Method for searching matched pictures based on characters |
CN113435206A (en) * | 2021-05-26 | 2021-09-24 | 卓尔智联(武汉)研究院有限公司 | Image-text retrieval method and device and electronic equipment |
CN113204666A (en) * | 2021-05-26 | 2021-08-03 | 杭州联汇科技股份有限公司 | Method for searching matched pictures based on characters |
CN113392196A (en) * | 2021-06-04 | 2021-09-14 | 北京师范大学 | Topic retrieval method and system based on multi-mode cross comparison |
CN113434716A (en) * | 2021-07-02 | 2021-09-24 | 泰康保险集团股份有限公司 | Cross-modal information retrieval method and device |
CN113434716B (en) * | 2021-07-02 | 2024-01-26 | 泰康保险集团股份有限公司 | Cross-modal information retrieval method and device |
CN113934887B (en) * | 2021-12-20 | 2022-03-15 | 成都考拉悠然科技有限公司 | No-proposal time sequence language positioning method based on semantic decoupling |
CN113934887A (en) * | 2021-12-20 | 2022-01-14 | 成都考拉悠然科技有限公司 | No-proposal time sequence language positioning method based on semantic decoupling |
CN113971209B (en) * | 2021-12-22 | 2022-04-19 | 松立控股集团股份有限公司 | Non-supervision cross-modal retrieval method based on attention mechanism enhancement |
CN113971209A (en) * | 2021-12-22 | 2022-01-25 | 松立控股集团股份有限公司 | Non-supervision cross-modal retrieval method based on attention mechanism enhancement |
CN114417878B (en) * | 2021-12-29 | 2023-04-18 | 北京百度网讯科技有限公司 | Semantic recognition method and device, electronic equipment and storage medium |
CN114417878A (en) * | 2021-12-29 | 2022-04-29 | 北京百度网讯科技有限公司 | Semantic recognition method and device, electronic equipment and storage medium |
CN114529757A (en) * | 2022-01-21 | 2022-05-24 | 四川大学 | Cross-modal single-sample three-dimensional point cloud segmentation method |
CN114529757B (en) * | 2022-01-21 | 2023-04-18 | 四川大学 | Cross-modal single-sample three-dimensional point cloud segmentation method |
CN115858839A (en) * | 2023-02-16 | 2023-03-28 | 上海蜜度信息技术有限公司 | Cross-modal LOGO retrieval method, system, terminal and storage medium |
CN116484878B (en) * | 2023-06-21 | 2023-09-08 | 国网智能电网研究院有限公司 | Semantic association method, device, equipment and storage medium of power heterogeneous data |
CN116484878A (en) * | 2023-06-21 | 2023-07-25 | 国网智能电网研究院有限公司 | Semantic association method, device, equipment and storage medium of power heterogeneous data |
CN116522168A (en) * | 2023-07-04 | 2023-08-01 | 北京墨丘科技有限公司 | Cross-modal text similarity comparison method and device and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN107562812B (en) | 2021-01-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107562812A (en) | A kind of cross-module state similarity-based learning method based on the modeling of modality-specific semantic space | |
CN106295796B (en) | entity link method based on deep learning | |
CN111061856B (en) | Knowledge perception-based news recommendation method | |
CN110363282B (en) | Network node label active learning method and system based on graph convolution network | |
CN111414461B (en) | Intelligent question-answering method and system fusing knowledge base and user modeling | |
CN109299341A (en) | One kind confrontation cross-module state search method dictionary-based learning and system | |
Wu et al. | Dynamic graph convolutional network for multi-video summarization | |
CN107346328A (en) | A kind of cross-module state association learning method based on more granularity hierarchical networks | |
CN109934261A (en) | A kind of Knowledge driving parameter transformation model and its few sample learning method | |
CN113140254B (en) | Meta-learning drug-target interaction prediction system and prediction method | |
CN113505204B (en) | Recall model training method, search recall device and computer equipment | |
CN112988917B (en) | Entity alignment method based on multiple entity contexts | |
CN111931505A (en) | Cross-language entity alignment method based on subgraph embedding | |
CN110287323A (en) | A kind of object-oriented sensibility classification method | |
CN111476261A (en) | Community-enhanced graph convolution neural network method | |
CN105701225B (en) | A kind of cross-media retrieval method based on unified association hypergraph specification | |
Chen et al. | Binarized neural architecture search for efficient object recognition | |
CN112733602B (en) | Relation-guided pedestrian attribute identification method | |
CN115952292B (en) | Multi-label classification method, apparatus and computer readable medium | |
Moyano | Learning network representations | |
CN108875034A (en) | A kind of Chinese Text Categorization based on stratification shot and long term memory network | |
CN111222847A (en) | Open-source community developer recommendation method based on deep learning and unsupervised clustering | |
Qi et al. | Patent analytic citation-based vsm: Challenges and applications | |
CN115858919A (en) | Learning resource recommendation method and system based on project field knowledge and user comments | |
CN115687760A (en) | User learning interest label prediction method based on graph neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||