CN109284414A - Cross-modal content retrieval method and system based on semantic preservation - Google Patents
- Publication number
- CN109284414A (CN201811156579.5A)
- Authority
- CN
- China
- Prior art keywords
- sample
- mode
- node
- mapping function
- mode sample
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to a cross-modal content retrieval method based on semantic preservation, comprising: constructing a first feature graph and a second feature graph whose nodes are the feature vectors of first-modality samples and second-modality samples, respectively; extracting the label vectors of all samples as nodes to construct a semantic graph; obtaining the neighbor nodes of each node; constructing a first mapping function and a second mapping function that map the first-modality samples and the second-modality samples, respectively, to latent representations; learning the mapping functions by approximately maximizing the likelihood that the neighbor nodes of each node occur, while requiring each latent representation to reconstruct the label information of its node; mapping a query sample to a query latent representation with the first mapping function, and mapping each second-modality sample to a target latent representation with the second mapping function; and obtaining the distance between the query latent representation and each target latent representation, and taking the second-modality samples whose distances are less than a retrieval threshold as the retrieval result.
Description
Technical field
The present invention relates to cross-modal retrieval technology in the multimedia field, and in particular to cross-modal content retrieval technology.
Background technique
With the development of multimedia technology, data of various modalities are widespread on the Internet. Cross-modal retrieval is one of the important research topics in the multimedia field. In a traditional single-modality retrieval system, the query sample and the retrieval results are confined to a single modality, which cannot meet the growing needs of users. A cross-modal retrieval system differs in that the query sample and the retrieval results belong to different modalities; for example, image, video, or audio samples are used as query samples to retrieve text content. Cross-modal retrieval gives users a more convenient way to search, helps them obtain the information they need in multiple modalities, and improves the user experience. Because the query sample and the retrieval results belong to different modalities, how to compare the similarity of samples from different modalities at the semantic level is a problem worth studying.
Because different modalities are heterogeneous, the key to cross-modal retrieval is how to associate them. At present, the vast majority of cross-modal retrieval algorithms map samples of different modalities into a low-dimensional latent space. According to the type of latent representation learned, these methods can be divided into real-valued-representation methods and binary-representation methods. According to the type of information used, they can be divided into unsupervised methods and supervised methods. Unsupervised methods use only the co-occurrence information of samples from different modalities, whereas supervised methods also use the label information of the samples. In general, the more information a cross-modal retrieval algorithm uses, the better it performs.
Label information can serve as high-level semantic information that guides the establishment of relationships between samples of different modalities: although samples of different modalities have different feature spaces, they share the same label space. In existing methods, label information is used as an additional modality, or to compute similarity for associating image-text pairs, or as the representation in the latent space. Existing methods use label information in a relatively simple way: they consider only inter-modality associations and ignore intra-modality associations, yet intra-modality association information is vital. Within the same modality, samples with similar semantics should have similar latent representations, and across modalities samples with similar semantics should likewise have similar latent representations; this consistent notion of similarity ensures that all semantically similar samples have similar latent representations. The present invention therefore creates one semantic graph containing all samples to provide a high-level semantic constraint, two feature graphs, each containing the samples of one modality, to provide manifold constraints, and a label-reconstruction term to provide a global semantic constraint. In addition, when the number of nodes is M, traditional graph-based methods need to build a graph of complexity $O(M^2)$, and solving them requires a costly eigenvalue decomposition, so a more efficient algorithm is needed for learning the graph structure.
Summary of the invention
In view of the above problems, the invention discloses a cross-modal content retrieval method and system based on semantic preservation. The method comprises: constructing a retrieval set from first-modality samples and a target set from second-modality samples; extracting the feature vectors of the first-modality samples as nodes to construct a first feature graph; extracting the feature vectors of the second-modality samples as nodes to construct a second feature graph; extracting the label vectors of the label information of all samples in the retrieval set and the target set as nodes to construct a semantic graph; obtaining the neighbor nodes of each node; constructing a first mapping function for mapping the first-modality samples to latent representations and a second mapping function for mapping the second-modality samples to latent representations; learning the first mapping function and the second mapping function by approximately maximizing the likelihood that the neighbor nodes of each node occur, while requiring each latent representation to reconstruct the label information of its node; taking a first-modality sample as the query sample, mapping the query sample to a query latent representation with the first mapping function, and mapping each second-modality sample to a target latent representation with the second mapping function; and obtaining the distance between the query latent representation and each target latent representation, and taking the second-modality samples whose distances are less than a retrieval threshold as the retrieval result for the query sample.
In the cross-modal content retrieval method of the present invention, the first mapping function and the second mapping function are learned using neighbor sampling and negative sampling: a multinomial distribution is established from the weights of the edges between a sampled node and its adjacent nodes, nodes connected to the sampled node are drawn from this multinomial distribution as neighbor nodes, and nodes not connected to the sampled node are selected from a uniform distribution as negative nodes.
In the cross-modal content retrieval method of the present invention, the distance is the Euclidean distance $d(x_i, x_j) = (x_i - x_j)^2$ or the cosine distance between the query latent representation and the target latent representation, where $x_i$ is the query latent representation and $x_j$ is the target latent representation.
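The distance-and-threshold step can be illustrated with a minimal sketch. It assumes latent representations are plain lists of floats, reads $(x_i - x_j)^2$ as the sum of squared coordinate differences, and takes cosine distance to be one minus cosine similarity; that last definition and all function names are assumptions for illustration, not the patent's own definitions.

```python
import math

def squared_euclidean(x, y):
    """Sum of squared coordinate differences between two latent vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def cosine_distance(x, y):
    """Assumed definition: 1 - cosine similarity (vectors must be non-zero)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / norm

def retrieve_below_threshold(query, targets, threshold, dist=squared_euclidean):
    """Indices of target latent vectors whose distance to the query is below threshold."""
    return [j for j, t in enumerate(targets) if dist(query, t) < threshold]
```

For example, `retrieve_below_threshold([0.0, 0.0], [[0.1, 0.0], [2.0, 2.0], [0.5, 0.5]], 1.0)` keeps the first and third targets, whose squared distances are 0.01 and 0.5.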
In the cross-modal content retrieval method of the present invention, the modality of the first-modality samples includes the visual, audio, and text modalities, and the modality of the second-modality samples includes the visual, audio, and text modalities.
In the cross-modal content retrieval method of the present invention, if the modality of the first-modality samples and/or the second-modality samples is the visual modality, their feature vectors are scale-invariant feature transform (SIFT) features, visual-modality convolutional neural network features, or histogram of oriented gradients (HOG) features; if the modality of the first-modality samples and/or the second-modality samples is the text modality, their feature vectors are term frequency-inverse document frequency (TF-IDF) features or text-modality deep convolutional/recurrent neural network features.
The invention also discloses a cross-modal content retrieval system based on semantic preservation, comprising:
a sample set construction module, for constructing a retrieval set from first-modality samples and a target set from second-modality samples;
a feature graph construction module, for constructing the first feature graph, the second feature graph, and the semantic graph, and for obtaining the neighbor nodes of each node in the first feature graph and the second feature graph; wherein the feature vectors of the first-modality samples are extracted as nodes to construct the first feature graph, the feature vectors of the second-modality samples are extracted as nodes to construct the second feature graph, the label vectors of the label information of all samples in the retrieval set and the target set are extracted as nodes to construct the semantic graph, and the neighbor nodes of each node are obtained;
a mapping function learning module, for constructing the mapping functions and learning them; wherein a first mapping function is constructed for mapping the first-modality samples to latent representations and a second mapping function is constructed for mapping the second-modality samples to latent representations, and the two mapping functions are learned by approximately maximizing the likelihood that the neighbor nodes of each node occur, while requiring each latent representation to reconstruct the label information of its node;
a sample retrieval module, for obtaining the retrieval result; wherein a first-modality sample is taken as the query sample, the query sample is mapped to a query latent representation by the first mapping function, each second-modality sample is mapped to a target latent representation by the second mapping function, the distance between the query latent representation and each target latent representation is obtained, and the second-modality samples whose distances are less than the retrieval threshold are taken as the retrieval result for the query sample.
In the cross-modal content retrieval system of the present invention, the mapping function learning module comprises:
a neighbor sampling module, for learning the first mapping function and the second mapping function using neighbor sampling; wherein a multinomial distribution is established from the weights of the edges between a sampled node and its adjacent nodes, and nodes connected to the sampled node are drawn from this multinomial distribution as neighbor nodes;
a negative sampling module, for learning the first mapping function and the second mapping function using negative sampling; wherein nodes not connected to the sampled node are drawn from the multinomial distribution as negative nodes.
In the cross-modal content retrieval system of the present invention, in the retrieval result acquisition module, the distance is the Euclidean distance $d(x_i, x_j) = (x_i - x_j)^2$ or the cosine distance between the query latent representation and the target latent representation, where $x_i$ is the query latent representation and $x_j$ is the target latent representation.
In the cross-modal content retrieval system of the present invention, the modality of the first-modality samples includes the visual, audio, and text modalities, and the modality of the second-modality samples includes the visual, audio, and text modalities.
In the cross-modal content retrieval system of the present invention, if the modality of the first-modality samples and/or the second-modality samples is the visual modality, their feature vectors are scale-invariant feature transform (SIFT) features, visual-modality convolutional neural network features, or histogram of oriented gradients (HOG) features; if the modality of the first-modality samples and/or the second-modality samples is the text modality, their feature vectors are term frequency-inverse document frequency (TF-IDF) features or text-modality deep convolutional/recurrent neural network features.
Brief description of the drawings
Fig. 1 is a flow chart of the cross-modal content retrieval method based on semantic preservation according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of the feature graphs and the semantic graph of the cross-modal content retrieval method based on semantic preservation according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of the mapping functions of the cross-modal content retrieval method based on semantic preservation according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of the cross-modal content retrieval system based on semantic preservation according to an embodiment of the present invention.
Specific embodiment
To make the objectives, technical solutions, and advantages of the present invention clearer, the cross-modal content retrieval method and system based on semantic preservation proposed by the present invention are further described below with reference to the accompanying drawings. It should be understood that the specific implementations described here serve only to explain the present invention and are not intended to limit it.
The invention proposes a cross-modal retrieval method based on semantic preservation that can involve multiple modalities. For ease of description, the embodiments of the present invention involve only the text and image modalities, but it should be understood that the cross-modal content retrieval method of the present invention is widely applicable to modalities such as text, vision, and hearing, and to multi-modalities such as video, and is not limited to the modalities mentioned. The method is roughly divided into three steps: first, original features are extracted from each sample by feature extraction; then, mapping functions are learned that map each sample from its original features to a latent representation; finally, the distances between the latent representation of the query sample and the latent representations of the samples in the target set are computed and sorted, and the target-set samples whose distance to the query sample is less than a threshold are selected as the retrieval result.
As shown in Fig. 1, in an embodiment of the present invention, the cross-modal retrieval method based on semantic preservation specifically comprises:
Step S1: constructing a retrieval set and a target set, where the samples of the retrieval set all have a first modality and are called first-modality samples, and the samples of the target set all have a second modality and are called second-modality samples. The modality of the first-modality samples and the second-modality samples may be a visual, audio, or text modality, or a multi-modality combining the visual and audio modalities, such as video; the present invention is not limited thereto. The first-modality samples and the second-modality samples have different modalities; in this embodiment, the first modality is the image modality and the second modality is the text modality;
Step S2: extracting the feature vectors of all first-modality samples as nodes to construct the first feature graph; extracting the feature vectors of all second-modality samples as nodes to construct the second feature graph; and extracting the label information of the semantic labels of all first-modality samples and second-modality samples as label vectors, each of which is a node of the semantic graph. In this embodiment, where the first-modality samples are image samples and the second-modality samples are text samples, the feature vectors of the image samples and the text samples are extracted first. The feature vector of an image sample can be, for example, a SIFT (scale-invariant feature transform) feature, a visual-modality CNN (convolutional neural network) feature, or a HOG (histogram of oriented gradients) feature. The feature vector of a text sample can be a TF-IDF (term frequency-inverse document frequency) feature, a text-modality CNN (convolutional neural network) feature, or a text-modality CNN/RNN (deep convolutional/recurrent neural network) feature; the present invention is not limited thereto;
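As an illustration of one of the text features named above, the following is a minimal TF-IDF sketch. It assumes documents are already tokenized and non-empty, and it uses one common variant (term frequency normalized by document length, IDF as the natural log of the document count over the document frequency); the patent does not specify a variant, so these choices and the function name are assumptions.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of non-empty token lists. Returns (vocabulary, TF-IDF vectors)."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log(n_docs / df[tok]) for tok in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        # Term frequency times inverse document frequency, one entry per vocabulary term.
        vectors.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, vectors
```

Each resulting vector could then serve as the feature vector of one text node in the second feature graph.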
As shown in Fig. 2, three graphs are established from the first-modality samples and the second-modality samples: the semantic graph $G_s$, the first feature graph (the image feature graph $G_t$), and the second feature graph (the text feature graph $G_i$). Each label vector extracted from the semantic labels of the text samples and the image samples is a node in the semantic graph $G_s$;
Step S3: obtaining the neighbor nodes of each node from the three graphs, where the three graphs are the semantic graph, the first feature graph, and the second feature graph. Because the semantic graph $G_s$ contains the semantic labels of both the text samples and the image samples, it carries semantic information both between modalities and within each modality. The label information of the image samples and the text samples is used to establish the connections between the label-vector nodes of the semantic graph, by one of two methods:
In the first method, an edge is established between two nodes of the semantic graph if and only if their label vectors have at least one dimension in which both values are non-zero; the similarity between the label vectors is then computed as the weight of the edge, using either cosine similarity or exponential similarity, where $z_i$ and $z_j$ are the label vectors of nodes i and j, respectively, and $\sigma$ is the spread factor;
In the second method, the connections between the nodes of the semantic graph are established using an existing knowledge graph: for example, the concepts corresponding to the labels of the image samples and the text samples are looked up in WordNet, and an entity similarity in the knowledge graph, such as one based on shortest paths, is computed as the weight of the edge between nodes in the semantic graph. In the multi-label case, the similarities between all labels are averaged to give the edge weight in the semantic graph $G_s$. In the first feature graph (the image feature graph), the distance between any two nodes is computed from the image feature vectors; if one node is a k-nearest neighbor of the other, the two nodes are connected by an edge of weight 1. In the second feature graph (the text feature graph), the distance between any two nodes is computed from the text feature vectors; if one node is a k-nearest neighbor of the other, the two nodes are connected by an edge of weight 1;
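The graph construction in step S3 can be sketched as follows, covering the first semantic-graph method (cosine-weighted edges between label vectors that share a non-zero dimension) and the k-nearest-neighbor feature graphs. The function names, the dictionary-of-edges representation, and the use of squared Euclidean distance for the neighbor search are assumptions for illustration; the brute-force pairwise loops are for clarity only.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def build_semantic_graph(labels):
    """Edge (i, j) exists iff the label vectors share a non-zero dimension.
    The edge weight is the cosine similarity of the two label vectors."""
    edges = {}
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if any(x != 0 and y != 0 for x, y in zip(labels[i], labels[j])):
                edges[(i, j)] = cosine_similarity(labels[i], labels[j])
    return edges

def build_knn_graph(features, k):
    """Connect i and j with weight 1 if either is among the other's k nearest neighbors."""
    edges = {}
    for i, fi in enumerate(features):
        # Distances from node i to every other node, nearest first.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(fi, fj)), j)
            for j, fj in enumerate(features) if j != i
        )
        for _, j in dists[:k]:
            edges[(min(i, j), max(i, j))] = 1
    return edges
```

With label vectors `[[1, 0, 1], [1, 1, 0], [0, 1, 0]]`, nodes 0 and 2 share no non-zero dimension, so the semantic graph has edges only between nodes 0 and 1 and between nodes 1 and 2.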
Step S4: constructing a first mapping function for mapping first-modality samples to latent representations and a second mapping function for mapping second-modality samples to latent representations, and learning the two mapping functions by approximately maximizing the likelihood that the neighbor nodes of each node occur, while requiring each latent representation to reconstruct the label information of its node. For an image sample $v_i$, the latent representation is $f_v(v_i)$; for a text sample $t_i$, the latent representation is $f_t(t_i)$. Both are written uniformly as $f(n_i)$, where $n_i$ is an image sample or a text sample. To preserve the local structure of the semantic graph, the first feature graph, and the second feature graph, the present invention maximizes, for each graph separately, the probability that the neighbor nodes of each node occur;
For a node $n_i$, a set of neighbor samples $P(n_i)$ is sampled and the probability $\Pr(P(n_i) \mid n_i)$ is maximized, where V is the set of all nodes in the semantic graph, the first feature graph, and the second feature graph, $P(n_i)$ denotes the samples corresponding to the neighbor nodes of $n_i$ (the neighbor samples), and T denotes vector transposition. When the number of nodes is large, negative samples are drawn and the maximization of $\Pr(P(n_i) \mid n_i)$ is relaxed to the minimization of a loss in which $N(n_i)$ denotes the negative samples of node $n_i$. The neighbor samples are obtained by neighbor sampling: a multinomial distribution is established from the weights of the edges between $n_i$ and each of its neighbor nodes, and neighbor nodes are drawn from this distribution. The negative samples are obtained by negative sampling: nodes not connected to $n_i$ are selected from a uniform distribution. In all three graphs, the local structure is preserved with the same kind of neighbor sampling and negative sampling, so the graph G in the loss can be the semantic graph $G_s$, the text feature graph $G_i$, or the image feature graph $G_t$; that is, one such loss is obtained for an image sample $v_i$ and another for a text sample $t_i$.
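Neighbor sampling and negative sampling as described above can be sketched as follows. The sketch assumes each graph is stored as a nested dictionary `adjacency[node][neighbor] = weight`, draws with replacement, and assumes at least one connected and one unconnected node exist for the sampled node; the names and the data layout are illustrative assumptions.

```python
import random

def sample_neighbors(node, adjacency, num_samples, rng=random):
    """Draw neighbors of `node` with probability proportional to edge weight
    (a multinomial distribution over the adjacent nodes)."""
    neighbors = list(adjacency[node].keys())
    weights = [adjacency[node][n] for n in neighbors]
    return rng.choices(neighbors, weights=weights, k=num_samples)

def sample_negatives(node, adjacency, all_nodes, num_samples, rng=random):
    """Draw uniformly from the nodes that are not connected to `node`."""
    candidates = [n for n in all_nodes if n != node and n not in adjacency[node]]
    return rng.choices(candidates, k=num_samples)
```

Passing a seeded `random.Random` instance as `rng` makes the draws reproducible, which is convenient when comparing training runs.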
In addition, a global semantic preservation condition is introduced: the mapped latent representation must be able to recover the semantic label information. Let $g(\cdot)$ be the function from the latent representation to the semantic label; the global semantic preservation loss is then measured against the semantic label of node $n_i$.
Overall, for an image sample $v_i$, the loss to be optimized combines these terms, with $\alpha$ and $\beta$ as balance coefficients; similarly, a corresponding loss is optimized for a text sample $t_i$.
To model the non-linear relationship between the original features and the latent representations, the present invention uses neural networks. As shown in Fig. 3, $f_v(\cdot)$ and $f_t(\cdot)$ map images and text into a unified latent representation space, which $g(\cdot)$ then maps to the semantic label space. The form of the network may differ between applications; for example, the number of layers of $f_v(\cdot)$, $f_t(\cdot)$, and $g(\cdot)$ can be increased or reduced. Finally, the loss function is optimized with stochastic gradient descent and error backpropagation to learn the mapping functions;
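A forward pass through such mapping functions can be sketched in plain Python. Everything specific here is an assumption for illustration: single-layer networks with tanh activations, a logistic negative-sampling loss of the kind used in skip-gram models, a squared-error label reconstruction loss, the feature dimensions, and the way the hypothetical coefficients `alpha` and `beta` weight the two terms. The gradient step (stochastic gradient descent with backpropagation) is omitted.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Random weights (n_out x n_in) and zero biases; the initialization is arbitrary."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

def dense(layer, x, activation=None):
    """Apply one fully connected layer, optionally followed by an activation."""
    w, b = layer
    out = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
    return [activation(o) for o in out] if activation else out

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def neighbor_loss(h, pos, neg):
    """Assumed negative-sampling loss: reward agreement with neighbors, penalize negatives."""
    loss = -sum(math.log(sigmoid(dot(h, p))) for p in pos)
    loss -= sum(math.log(sigmoid(-dot(h, n))) for n in neg)
    return loss

def label_loss(pred, label):
    """Assumed reconstruction loss: squared error against the label vector."""
    return sum((p - y) ** 2 for p, y in zip(pred, label))

rng = random.Random(0)
f_v = make_layer(8, 4, rng)  # image features (dim 8) -> latent space (dim 4)
f_t = make_layer(6, 4, rng)  # text features (dim 6) -> latent space (dim 4)
g = make_layer(4, 3, rng)    # latent space (dim 4) -> label space (3 labels)

image = [rng.random() for _ in range(8)]
text_pos = [rng.random() for _ in range(6)]  # stands in for a sampled neighbor
text_neg = [rng.random() for _ in range(6)]  # stands in for a sampled negative
label = [1.0, 0.0, 1.0]

h_img = dense(f_v, image, math.tanh)
h_pos = dense(f_t, text_pos, math.tanh)
h_neg = dense(f_t, text_neg, math.tanh)
alpha, beta = 1.0, 1.0  # hypothetical balance coefficients
total = (alpha * neighbor_loss(h_img, [h_pos], [h_neg])
         + beta * label_loss(dense(g, h_img, sigmoid), label))
```

Because both modalities map into the same four-dimensional space, the image latent vector can be compared directly with the text latent vectors, which is what makes cross-modal neighbor and negative terms possible.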
Step S5: computing the latent representation of each sample with the learned mapping functions. Given a first-modality sample in the retrieval set (the query sample), the distance between its latent representation and the latent representation of each second-modality sample in the target set is computed. The distance may be the Euclidean distance $d(x_i, x_j) = (x_i - x_j)^2$ or the cosine distance, and the present invention is not limited thereto; here $x_i$ and $x_j$ denote the latent representations of the first-modality sample and the second-modality sample, respectively. All the distances obtained are sorted in ascending order and, according to a preset retrieval threshold N, the first N second-modality samples in the sorted sequence are selected as the retrieval result for the query sample.
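The ranking in step S5 can be sketched as follows, assuming latent representations are lists of floats and using the squared Euclidean form of the distance; the function name is illustrative.

```python
def rank_top_n(query, targets, n):
    """Indices of the n target latent vectors closest to the query, nearest first."""
    def dist(x, y):
        return sum((a - b) ** 2 for a, b in zip(x, y))
    # Sort target indices by ascending distance to the query, then keep the first n.
    order = sorted(range(len(targets)), key=lambda j: dist(query, targets[j]))
    return order[:n]
```

Note that in this embodiment the retrieval threshold N is a cut-off on rank, not on the distance value itself.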
The invention also discloses a cross-modal content retrieval system based on semantic preservation. As shown in Fig. 4, the system comprises a sample set construction module, a feature graph construction module, a mapping function learning module, and a sample retrieval module. The sample set construction module constructs the retrieval set and the target set, where the samples in the retrieval set all have the first modality and are called first-modality samples, and the samples in the target set all have the second modality and are called second-modality samples. The feature graph construction module extracts the feature vectors of all first-modality samples as nodes to construct the first feature graph, extracts the feature vectors of all second-modality samples as nodes to construct the second feature graph, extracts the label vectors of the label information of all first-modality samples and second-modality samples as nodes to construct the semantic graph, and obtains the neighbor nodes of each node. The mapping function learning module constructs the first mapping function, which maps first-modality samples to latent representations, and the second mapping function, which maps second-modality samples to latent representations, and learns the two mapping functions by approximately maximizing the likelihood that the neighbor nodes of each node occur, while requiring each latent representation to reconstruct the label information of its node. The sample retrieval module obtains the retrieval result: a first-modality sample is taken as the query sample and mapped to a query latent representation by the first mapping function, each second-modality sample in the target set is mapped to a target latent representation by the second mapping function, the distance between the query latent representation and each target latent representation is obtained, all the distances obtained are sorted in ascending order, and, according to a preset retrieval threshold N, the first N second-modality samples in the sorted sequence are selected as the retrieval result for the query sample.
Claims (10)
1. A cross-modal content retrieval method based on semantic preservation, characterized by comprising:
constructing a retrieval set from first-modality samples and a target set from second-modality samples;
extracting the feature vectors of the first-modality samples as nodes to construct a first feature graph; extracting the feature vectors of the second-modality samples as nodes to construct a second feature graph; extracting the label vectors of the label information of all samples in the retrieval set and the target set as nodes to construct a semantic graph; obtaining the neighbor nodes of each node;
constructing a first mapping function for mapping the first-modality samples to latent representations and a second mapping function for mapping the second-modality samples to latent representations; learning the first mapping function and the second mapping function by approximately maximizing the likelihood that the neighbor nodes of each node occur, while requiring each latent representation to reconstruct the label information of its node;
taking a first-modality sample as the query sample, mapping the query sample to a query latent representation with the first mapping function, and mapping each second-modality sample to a target latent representation with the second mapping function; obtaining the distance between the query latent representation and each target latent representation, and taking the second-modality samples whose distances are less than a retrieval threshold as the retrieval result for the query sample.
2. The cross-modal content retrieval method of claim 1, characterized in that the first mapping function and the second mapping function are learned using neighbor sampling and negative sampling, wherein a multinomial distribution is established from the weights of the edges between a sampled node and its adjacent nodes, nodes connected to the sampled node are drawn from this multinomial distribution as neighbor nodes, and nodes not connected to the sampled node are selected from a uniform distribution as negative nodes.
3. The cross-modal content retrieval method of claim 1, characterized in that the distance is the Euclidean distance $d(x_i, x_j) = (x_i - x_j)^2$ or the cosine distance between the query latent representation and the target latent representation, where $x_i$ is the query latent representation and $x_j$ is the target latent representation.
4. The cross-modal content retrieval method of claim 1, characterized in that the modality of the first-modality samples includes the visual, audio, and text modalities, and the modality of the second-modality samples includes the visual, audio, and text modalities.
5. The cross-modal content retrieval method of claim 4, characterized in that if the modality of the first-modality samples and/or the second-modality samples is the visual modality, their feature vectors are scale-invariant feature transform features, visual-modality convolutional neural network features, or histogram of oriented gradients features; and if the modality of the first-modality samples and/or the second-modality samples is the text modality, their feature vectors are term frequency-inverse document frequency features or text-modality deep convolutional/recurrent neural network features.
6. A cross-modal content retrieval system based on semantic preservation, characterized by comprising:
a sample set construction module, configured to construct a retrieval set from first-modality samples and a target set from second-modality samples;
a feature graph construction module, configured to construct a first feature graph, a second feature graph, and a semantic graph, and to obtain the neighbor nodes of each node in the first feature graph and the second feature graph; wherein the feature vectors of the first-modality samples are extracted as nodes to construct the first feature graph, the feature vectors of the second-modality samples are extracted as nodes to construct the second feature graph, and the label vectors of the label information of all samples in the retrieval set and the target set are extracted as nodes to construct the semantic graph; and the neighbor nodes of each node are obtained;
a mapping function learning module, configured to construct mapping functions and to learn them; wherein a first mapping function for mapping the first-modality samples to latent representations and a second mapping function for mapping the second-modality samples to latent representations are constructed; the first mapping function and the second mapping function are learned so as to approximately maximize the likelihood that the neighbor nodes of each node occur, while enabling each latent representation to reconstruct the label information of its corresponding node;
a sample retrieval module, configured to obtain a retrieval result; wherein a given first-modality sample is taken as the query sample, the query sample is mapped to a query latent representation by the first mapping function, and each second-modality sample is mapped to a target latent representation by the second mapping function; the distance between the query latent representation and each target latent representation is obtained, and all second-modality samples whose distance is less than a retrieval threshold are taken as the retrieval result for the query sample.
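The retrieval step of the sample retrieval module can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the names `retrieve`, `map_first`, `map_second`, and `threshold` are hypothetical, the mapping functions are passed in as arbitrary callables standing in for the learned first and second mapping functions, and the default distance (plain Euclidean distance via `math.dist`) is an assumption.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def retrieve(
    query: Vector,
    targets: List[Vector],
    map_first: Callable[[Vector], Vector],
    map_second: Callable[[Vector], Vector],
    threshold: float,
    distance: Callable[[Vector, Vector], float] = math.dist,
) -> List[int]:
    """Return indices of target samples whose latent distance to the query
    is strictly below the retrieval threshold."""
    # Map the query sample with the (hypothetical) first mapping function.
    query_latent = map_first(query)
    hits = []
    for index, target in enumerate(targets):
        # Map each target sample with the (hypothetical) second mapping function.
        target_latent = map_second(target)
        if distance(query_latent, target_latent) < threshold:
            hits.append(index)
    return hits


# Toy stand-ins for learned mapping functions (assumptions for illustration).
def double(v: Vector) -> List[float]:
    return [2.0 * x for x in v]


def identity(v: Vector) -> List[float]:
    return list(v)
```

For example, `retrieve([1.0, 0.0], [[2.0, 0.1], [0.0, 0.0], [2.5, 0.0]], double, identity, 0.6)` keeps the first and third targets, whose latent distances to the mapped query are below the threshold.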
7. The cross-modal content retrieval system of claim 6, wherein the mapping function learning module comprises:
a neighbor sampling module, configured to learn the first mapping function and the second mapping function using neighbor sampling; wherein a multinomial distribution is established from the weights of the edges between a sampled node and its adjacent nodes, and a node connected to the sampled node is drawn from the multinomial distribution as a neighbor node;
a negative sampling module, configured to approximately learn the first mapping function and the second mapping function using negative sampling; wherein a node not connected to the sampled node is drawn from the multinomial distribution as a negative node.
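The two sampling steps in claim 7 can be sketched as below. This is an illustration under stated assumptions: the graph is stored as a hypothetical adjacency dictionary, the neighbor draw is proportional to edge weight, and the negative draw is uniform over unconnected nodes, since the claim does not specify the exact distribution used for negatives.

```python
import random
from typing import Dict, List

# Hypothetical graph layout: node -> {adjacent node: edge weight}.
Graph = Dict[str, Dict[str, float]]


def sample_neighbor(graph: Graph, node: str, rng: random.Random) -> str:
    """Draw one connected node with probability proportional to edge weight."""
    adjacent = list(graph[node])
    weights = [graph[node][other] for other in adjacent]
    return rng.choices(adjacent, weights=weights, k=1)[0]


def sample_negatives(graph: Graph, node: str, count: int, rng: random.Random) -> List[str]:
    """Draw `count` nodes (with replacement) that share no edge with `node`.

    Uniform sampling over unconnected nodes is an assumption of this sketch.
    """
    candidates = [
        other for other in graph
        if other != node and other not in graph[node]
    ]
    if not candidates:
        return []
    return rng.choices(candidates, k=count)


example_graph: Graph = {
    "a": {"b": 3.0, "c": 1.0},
    "b": {"a": 3.0},
    "c": {"a": 1.0},
    "d": {},
    "e": {},
}
```

With `example_graph`, sampling a neighbor of `"a"` returns `"b"` about three times as often as `"c"`, while negatives for `"a"` are always drawn from `"d"` and `"e"`.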
8. The cross-modal content retrieval system of claim 6, wherein, in the retrieval and result acquisition module, the distance between the query latent representation and the target latent representation is the Euclidean distance $d(x_i, x_j) = (x_i - x_j)^2$ or the cosine distance, where $x_i$ is the query latent representation and $x_j$ is the target latent representation.
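The two distances named in claim 8 can be sketched as follows. Two assumptions apply: the claimed expression $(x_i - x_j)^2$ is read as the sum of squared component differences for vector inputs, and the cosine distance is taken as one minus the cosine similarity, a common convention that the text here does not spell out.

```python
import math
from typing import Sequence


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared component differences between two latent vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """One minus the cosine similarity (an assumed convention)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)
```

The cosine variant ignores vector length, which can matter when the two mapping functions produce latent representations of different scales.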
9. The cross-modal content retrieval system of claim 6, wherein the modality of the first-modality sample includes a visual modality, an audio modality, or a text modality, and the modality of the second-modality sample includes a visual modality, an audio modality, or a text modality.
10. The cross-modal content retrieval system of claim 9, wherein, if the modality of the first-modality sample and/or the second-modality sample is the visual modality, the feature vector of the first-modality sample and/or the second-modality sample is a scale-invariant feature transform (SIFT) feature, a visual-modality convolutional neural network feature, or a histogram of oriented gradients (HOG) feature; and if the modality of the first-modality sample and/or the second-modality sample is the text modality, the feature vector of the first-modality sample and/or the second-modality sample is a term frequency-inverse document frequency (TF-IDF) feature or a text-modality deep convolutional/recurrent neural network feature.
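As an illustration of the text-modality feature named in the claims above, a TF-IDF feature vector could be computed roughly as follows. This is a sketch of one common TF-IDF variant (relative term frequency times the natural log of inverse document frequency); the function name `tfidf_vectors` and the exact weighting are assumptions, not details given in the claims.

```python
import math
from collections import Counter
from typing import List, Tuple


def tfidf_vectors(docs: List[List[str]]) -> Tuple[List[str], List[List[float]]]:
    """Return a sorted vocabulary and one TF-IDF vector per tokenized document."""
    vocabulary = sorted({token for doc in docs for token in doc})
    doc_count = len(docs)
    # Document frequency: number of documents containing each token.
    doc_freq = Counter(token for doc in docs for token in set(doc))
    idf = {token: math.log(doc_count / doc_freq[token]) for token in vocabulary}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append([counts[token] / length * idf[token] for token in vocabulary])
    return vocabulary, vectors
```

A token that appears in every document receives weight zero under this variant, so only distinguishing terms contribute to the feature vector.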
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811156579.5A CN109284414B (en) | 2018-09-30 | 2018-09-30 | Cross-modal content retrieval method and system based on semantic preservation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109284414A (en) | 2019-01-29 |
CN109284414B (en) | 2020-12-04 |
Family
ID=65182054
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811156579.5A Active CN109284414B (en) | 2018-09-30 | 2018-09-30 | Cross-modal content retrieval method and system based on semantic preservation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109284414B (en) |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7089543B2 (en) * | 2001-07-13 | 2006-08-08 | Sony Corporation | Use of formal logic specification in construction of semantic descriptions |
WO2010120941A3 (en) * | 2009-04-15 | 2011-01-20 | Evri Inc. | Automatic mapping of a location identifier pattern of an object to a semantic type using object metadata |
CN103049526A (en) * | 2012-12-20 | 2013-04-17 | 中国科学院自动化研究所 | Cross-media retrieval method based on double space learning |
CN105205096A (en) * | 2015-08-18 | 2015-12-30 | 天津中科智能识别产业技术研究院有限公司 | Text modal and image modal crossing type data retrieval method |
US20170116521A1 (en) * | 2015-10-27 | 2017-04-27 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Tag processing method and device |
CN107273517A (en) * | 2017-06-21 | 2017-10-20 | 复旦大学 | Image-text cross-modal retrieval method based on graph embedding learning |
CN107330100A (en) * | 2017-07-06 | 2017-11-07 | 北京大学深圳研究生院 | Bidirectional image-text retrieval method based on a multi-view joint embedding space |
CN107633263A (en) * | 2017-08-30 | 2018-01-26 | 清华大学 | Edge-based network embedding method |
Non-Patent Citations (2)
Title |
---|
YILING WU et al.: "Online Asymmetric Similarity Learning for Cross-Modal Retrieval", 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) * |
WANG Shuhui (王树徽) et al.: "Research progress in heterogeneous media analysis technology" (异质媒体分析技术研究进展), 《集成技术》 * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109886326A (en) * | 2019-01-31 | 2019-06-14 | 深圳市商汤科技有限公司 | Cross-modal information retrieval method and device and storage medium |
CN109886326B (en) * | 2019-01-31 | 2022-01-04 | 深圳市商汤科技有限公司 | Cross-modal information retrieval method and device and storage medium |
CN110222560A (en) * | 2019-04-25 | 2019-09-10 | 西北大学 | Text person searching method embedded with similarity loss function |
CN110222560B (en) * | 2019-04-25 | 2022-12-23 | 西北大学 | Text person searching method embedded with similarity loss function |
CN111813967A (en) * | 2020-07-14 | 2020-10-23 | 中国科学技术信息研究所 | Retrieval method, retrieval device, computer equipment and storage medium |
CN111813967B (en) * | 2020-07-14 | 2024-01-30 | 中国科学技术信息研究所 | Retrieval method, retrieval device, computer equipment and storage medium |
CN112100410A (en) * | 2020-08-13 | 2020-12-18 | 中国科学院计算技术研究所 | Cross-modal retrieval method and system based on semantic condition association learning |
Also Published As
Publication number | Publication date |
---|---|
CN109284414B (en) | 2020-12-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108132968B (en) | Weak supervision learning method for associated semantic elements in web texts and images | |
Cheng et al. | A survey and analysis on automatic image annotation | |
Wang et al. | Self-constraining and attention-based hashing network for bit-scalable cross-modal retrieval | |
CN110059198A (en) | Similarity-preserving discrete hashing retrieval method for cross-modal data | |
Jiao et al. | SAR images retrieval based on semantic classification and region-based similarity measure for earth observation | |
CN112819023B (en) | Sample set acquisition method, device, computer equipment and storage medium | |
CN109284414A (en) | Cross-modal content retrieval method and system based on semantic preservation | |
Qi et al. | Two-dimensional multilabel active learning with an efficient online adaptation model for image classification | |
CN110647904B (en) | Cross-modal retrieval method and system based on unmarked data migration | |
Wang et al. | Semantic gap in cbir: Automatic objects spatial relationships semantic extraction and representation | |
JP5626042B2 (en) | Retrieval system, method and program for representative image in image set | |
Qian et al. | Landmark summarization with diverse viewpoints | |
CN111985520A (en) | Multi-mode classification method based on graph convolution neural network | |
CN114461839B (en) | Multi-mode pre-training-based similar picture retrieval method and device and electronic equipment | |
CN113515669A (en) | Data processing method based on artificial intelligence and related equipment | |
CN115658934A (en) | Image-text cross-modal retrieval method based on multi-class attention mechanism | |
CN112182275A (en) | Trademark approximate retrieval system and method based on multi-dimensional feature fusion | |
CN112906517B (en) | Self-supervision power law distribution crowd counting method and device and electronic equipment | |
CN104346456B (en) | Multi-semantic annotation method for digital images based on spatial dependency measurement | |
Wang et al. | Multimodal Poisson gamma belief network | |
CN111506832B (en) | Heterogeneous object completion method based on block matrix completion | |
WO2022162427A1 (en) | Annotation-efficient image anomaly detection | |
Chiang | Interactive tool for image annotation using a semi-supervised and hierarchical approach | |
CN115994239A (en) | Prototype comparison learning-based semi-supervised remote sensing image retrieval method and system | |
TW202004519A (en) | Method for automatically classifying images |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||