CN101021849A - Transmedia searching method based on content correlation - Google Patents

Transmedia searching method based on content correlation

Info

Publication number
CN101021849A
Authority
CN
China
Prior art keywords
subspace
image
vector
data
isomorphism
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 200610053390
Other languages
Chinese (zh)
Other versions
CN100422999C (en
Inventor
***
庄越挺
吴飞
张鸿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CNB2006100533904A priority Critical patent/CN100422999C/en
Publication of CN101021849A publication Critical patent/CN101021849A/en
Application granted granted Critical
Publication of CN100422999C publication Critical patent/CN100422999C/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a cross-media retrieval method based on content correlation. Canonical correlation analysis is applied to the content features of media data of different modalities; a subspace mapping algorithm maps the visual feature vectors of image data and the auditory feature vectors of audio data simultaneously into a low-dimensional isomorphic subspace; a general distance function measures the correlation between data of different modalities; and the topology of the multi-modal data set in the subspace is corrected, which effectively improves cross-media retrieval efficiency.

Description

Cross-media retrieval method based on content correlation
Technical field
The present invention relates to multimedia retrieval, and in particular to a cross-media retrieval method based on content correlation.
Background art
Content-based multimedia retrieval is a research focus in computer vision and information retrieval; it performs similarity matching on low-level visual, auditory, or geometric features to achieve retrieval. As early as 1976, McGurk showed that human cognition of external information must span and integrate different sensory inputs to form a holistic understanding. Recent research in cognitive neuropsychology has further verified that human cognition is cross-media in nature: information from different senses, such as vision and hearing, stimulates each other and acts jointly to produce the cognitive result. There is therefore a pressing need for a cross-media retrieval method that supports different modalities and overcomes the restriction of traditional content-based multimedia retrieval to single-modality data.
Content-based cross-media retrieval analyzes the low-level features of multimedia objects so that retrieval can cross from one modality to another: the user submits a query example in one modality, and the system returns similar multimedia objects of other modalities. This overcomes the single-modality restriction of image retrieval, audio retrieval, 3D model retrieval, and the like. Cross-media retrieval is a new research area within content-based multimedia analysis and retrieval, and no comparatively mature cross-media retrieval algorithms or techniques exist internationally at present.
In the early 1990s, content-based image retrieval (CBIR) was proposed: low-level visual features such as color, texture, and shape are extracted from an image and used as its index. The technique was later applied to video and audio retrieval, with different low-level features for different media: video retrieval may use motion-vector features, while audio retrieval uses time-domain, frequency-domain, and compressed-domain features. Early content-based multimedia retrieval was represented by prototype systems such as QBIC and VideoQ, but, lacking high-level semantic support, these could not meet users' requirements for accuracy and efficiency. Later, methods such as learning from examples, fusion analysis, and manifold learning were used to understand multimedia semantics and bridge the gap between low-level features and high-level semantics. To compensate for insufficient training samples, relevance feedback is often used to incorporate the user's perceptual prior knowledge, for example by using feedback to move the query vector toward the distribution center of the relevant objects, or by adjusting the weights of the components in the distance metric; more recently, some machine learning methods have also been combined with relevance feedback. These methods have narrowed the semantic gap to some extent and improved single-modality retrieval performance.
However, existing multimedia retrieval systems either can only search multimedia databases that contain a single modality or, although able to handle multi-modal media data, do not support cross-media retrieval, that is, retrieving multimedia objects of other modalities from a multimedia object of one modality. The visual features of images and the auditory features of audio differ in dimensionality and express different attributes, so their similarity cannot be measured directly; the same heterogeneity and incomparability exist between multimedia data of other modalities. The single-modality retrieval methods above therefore cannot be applied directly to cross-media retrieval, whose objects of study are low-level feature spaces of different modalities that are heterogeneous with respect to one another.
Some researchers have proposed work in a similar cross-media spirit, for example indexing and retrieving video databases by mining multi-modal features, or analyzing the transcribed text of news video together with the text of web pages to match video objects to similar web pages on text features. This work, however, targets different low-level features within a media object of one specific modality, such as the transcribed text, color, and texture of a video clip, and cannot cross flexibly between media data of different modalities.
Canonical correlation analysis (CCA) is a statistical analysis method first applied to data analysis in fields such as economics, medicine, and meteorology. It has seldom been used in multimedia data analysis and retrieval, because it analyzes the correlation between two different sets of variables, whereas traditional single-modality retrieval studies the single feature space of one modality.
Summary of the invention
The object of the present invention is to overcome the single-modality restriction of the existing methods described above by providing a cross-media retrieval method based on content correlation.
The cross-media retrieval method based on content correlation comprises the following steps:
(1) collecting objects of different modalities, namely image data and audio data, from a multimedia database;
(2) extracting the visual features of the image data and the auditory features of the audio data, and using canonical correlation analysis to extract the canonical correlation between the visual and auditory features;
(3) using an isomorphic subspace mapping algorithm to map the visual feature vectors of the image data and the auditory feature vectors of the audio data simultaneously into a low-dimensional isomorphic subspace, giving media data of different modalities a unified representation;
(4) defining a general distance function in polar form to measure the correlation between media data of different modalities, and performing cross-media retrieval on that basis;
(5) using a relevance feedback mechanism based on incremental learning to extract prior knowledge from user interaction and correct the topology of the multimedia data set in the isomorphic subspace;
(6) accurately locating media objects outside the training set in the isomorphic subspace, either from the basis vectors obtained during subspace mapping or through the relevance feedback mechanism.
Step (2), extracting the visual and auditory features and their canonical correlation, is carried out as follows: the low-level visual features of an image form a p-dimensional image feature vector, and the low-level auditory features of an audio clip form a q-dimensional audio feature vector. Canonical correlation analysis learns the image visual features $X_{n \times p}$ and the audio auditory features $Y_{n \times q}$ simultaneously, and the correlation coefficient between the heterogeneous feature matrices $X_{n \times p}$ and $Y_{n \times q}$ is computed as follows:
$$\rho = r(L, M) = \frac{A^{T} C_{xy} B}{\sqrt{A^{T} C_{xx} A \cdot B^{T} C_{yy} B}}, \qquad C = \begin{pmatrix} C_{xx} & C_{xy} \\ C_{yx} & C_{yy} \end{pmatrix} \tag{1}$$
$$L = XA, \qquad M = YB \tag{2}$$
Here A and B are linear transformations. Formula 2 reduces the correlation between the feature matrices X and Y, which have many variables, to the correlation between the smaller set of combined variables L and M. The distribution of values in A and B determines the form of the spatial correlation between X and Y, and the magnitudes of the entries of A and B determine the importance of the corresponding variables.
Step (3), mapping both kinds of feature vector into a low-dimensional isomorphic subspace, is carried out as follows: building on canonical correlation analysis, the isomorphic subspace mapping algorithm learns an optimal low-dimensional subspace that preserves as much as possible of the correlation between the original feature matrices $X_{n \times p}$ and $Y_{n \times q}$. The algorithm steps are:
Input: image feature matrix $X_{n \times p}$ and audio feature matrix $Y_{n \times q}$.
Output: the vector representations $L_{n \times m}$ and $M_{n \times m}$ of all image data and audio data in the low-dimensional subspace.
Step 1: by semi-supervised learning with K-means clustering, divide all image data and audio data in the database into semantic classes;
Step 2: maximize the correlation coefficient $\rho = r(L, M)$ under the constraints of formula 3,
$$v(L) = L^{T} L = A^{T} X^{T} X A = 1; \qquad v(M) = M^{T} M = B^{T} Y^{T} Y B = 1 \tag{3}$$
The method of Lagrange multipliers yields an equation of the generalized eigenvalue form $Ax = \lambda Bx$ (generic notation), namely $C_{xy} C_{yy}^{-1} C_{yx} A = \lambda^{2} C_{xx} A$; solving for the eigenvalues and eigenvectors of this equation gives the matrices A and B;
Step 3: construct the isomorphic subspace linearly, that is, use the basis vectors A and B to map the image feature vectors and the audio feature vectors to the m-dimensional coordinates $L_{n \times m}$ and $M_{n \times m}$, respectively.
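Steps 2 and 3 can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the patented implementation: the function name `cca_subspace`, the ridge term `reg`, and the use of a Cholesky whitening followed by an SVD (which solves the same eigen-equation $C_{xy} C_{yy}^{-1} C_{yx} A = \lambda^{2} C_{xx} A$) are choices made here. The covariances are scaled by 1/n, so the bases differ from formula 3 by a constant factor, and this route returns real-valued coordinates.

```python
import numpy as np

def cca_subspace(X, Y, m, reg=1e-6):
    """Map image features X (n x p) and audio features Y (n x q) to m dimensions.

    Returns bases A (p x m), B (q x m), coordinates L = Xc A and M = Yc B
    (Xc, Yc are the centred inputs), and the m canonical correlations.
    Requires m <= min(p, q).
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Covariance blocks; `reg` (an assumption) keeps Cxx and Cyy invertible.
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n
    # Cholesky factors: Cxx = Rx Rx^T and Cyy = Ry Ry^T.
    Rx = np.linalg.cholesky(Cxx)
    Ry = np.linalg.cholesky(Cyy)
    # Whitened cross-covariance T = Rx^-1 Cxy Ry^-T.
    T = np.linalg.solve(Ry, np.linalg.solve(Rx, Cxy).T).T
    # The singular values of T are the correlations, sorted in descending order.
    U, rho, Vt = np.linalg.svd(T, full_matrices=False)
    A = np.linalg.solve(Rx.T, U[:, :m])   # a = Rx^-T u
    B = np.linalg.solve(Ry.T, Vt[:m].T)   # b = Ry^-T v
    return A, B, Xc @ A, Yc @ B, rho[:m]
```

With 70 samples and 500 image dimensions, as in Embodiment 1, $C_{xx}$ is rank-deficient, which is why some regularization or a prior dimension reduction is needed in practice.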
Step (4), defining the general distance function and retrieving with it, is carried out as follows: in the m-dimensional subspace, the feature vectors of image and audio data are expressed in polar form, $x_i' = (x_{i1}', \ldots, x_{ik}', \ldots, x_{im}')$ with $x_{ik}' = a + bi$ ($a, b \in \mathbb{R}$). The similarity between two images, between two audio clips, and between an image and an audio clip is computed with the general distance function:
$$\mathrm{CCAdis}(x_i', x_j') = \sqrt{\sum_{k=1}^{m} \left( |x_{ik}'|^{2} + |x_{jk}'|^{2} - 2 \times |x_{ik}'| \times |x_{jk}'| \times \cos\theta_k \right)} \tag{4}$$
$$\beta_{ik} = \operatorname{arctg}(b/a), \quad \theta_k = |\beta_{ik} - \beta_{jk}|, \quad |x_{ik}'| = \sqrt{a^{2} + b^{2}}, \quad k \in [1, m]$$
During retrieval the user supplies a query image through the user interface. If the example is in the training database, its m-dimensional coordinates in the subspace are looked up from the subspace mapping result, its distances to the other audio and image data are computed with the general distance function, and the k images and k audio clips nearest to the query image are returned to the user as the query result. Likewise, if the query example is an audio clip, similar audio and image objects are retrieved by the same steps.
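The distance function and the nearest-neighbour lookup can be sketched as follows. The names `cca_dis` and `k_nearest` and the dictionary-based database are hypothetical. One deliberate deviation: the angle is taken with `cmath.phase` (a two-argument arctangent) rather than arctg(b/a), an assumption made here so that a = 0 and all four quadrants are handled; with that choice the law-of-cosines sum equals the Euclidean distance between the complex vectors.

```python
import cmath
import math

def cca_dis(xi, xj):
    """General distance between two complex vectors of equal length m."""
    total = 0.0
    for a, b in zip(xi, xj):
        ra, rb = abs(a), abs(b)                       # moduli |x_ik'|, |x_jk'|
        theta = abs(cmath.phase(a) - cmath.phase(b))  # angle difference theta_k
        total += ra * ra + rb * rb - 2.0 * ra * rb * math.cos(theta)
    # Guard against tiny negative sums caused by rounding.
    return math.sqrt(max(total, 0.0))

def k_nearest(query, database, k):
    """Return the ids of the k objects closest to `query`.

    `database` maps an object id (image or audio) to its subspace vector.
    """
    ranked = sorted(database, key=lambda name: cca_dis(query, database[name]))
    return ranked[:k]
```

In a cross-media query, `k_nearest` would be called once on the image entries and once on the audio entries, so that k results of each modality are returned.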
Step (5), relevance feedback based on incremental learning, is carried out as follows: during relevance feedback the system learns the perceptual prior knowledge supplied by the user. Let Ω denote the image training set and A the audio training set. Define a "correction factor" $\gamma(i, j) = \mathrm{Pos}(a_i, b_j)$ ($a_i \in \Omega$, $b_j \in A$) that corrects the similarity between media objects of different modalities: $\mathrm{Crodis}(i, j) = \mathrm{CCAdis}(i, j) + \gamma(i, j)$. The correction factor is initialized to zero;
When the user submits an image query example R, $\mathrm{CCAdis}(i, j)$ is used to compute the set $C_1$ of k nearest images of R in the subspace, and $\mathrm{Crodis}(i, j)$ is used to compute the set $C_2$ of k nearest audio clips of R in the subspace; the cross-media retrieval returns $C_1$ and $C_2$;
During interaction the user marks positive examples P and negative examples N in the query result through relevance feedback. For every $p_i \in P$, set $\gamma(R, p_i) = -\tau$ ($\tau > 0$), find the k nearest neighbors $T = \{t_1, \ldots, t_j, \ldots, t_k\}$ of $p_i$ in the audio database A according to CCAdis, sorted by ascending distance, and correct the γ value of each element of T in turn as an arithmetic progression: $\gamma(R, t_j) = -\tau + j \times d_1$ ($d_1 = \tau / k$). For every $n_i \in N$, set $\gamma(R, n_i) = \tau$ ($\tau > 0$), find the k nearest neighbors $H = \{h_1, \ldots, h_j, \ldots, h_k\}$ of $n_i$ in the audio database A according to CCAdis, sorted by ascending distance, and correct the γ value of each element of H in turn as an arithmetic progression: $\gamma(R, h_j) = \tau - j \times d_2$ ($d_2 = \tau / k$);
Likewise, when the user submits an audio object, the correction factor $\gamma(i, j)$ is updated in the same way, and the next round of retrieval ranks the returned results by the new similarity.
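The update rule for the correction factor can be illustrated with a small sketch. The function names, the dictionary that stores γ, and the `audio_neighbours` callback (assumed to return the k nearest audio ids of an object, sorted by ascending CCAdis) are hypothetical; only the arithmetic-progression updates come from the text.

```python
def apply_feedback(gamma, query, positives, negatives, audio_neighbours, tau, k):
    """Update the correction factors gamma[(query, obj)] after one feedback round.

    gamma            -- dict mapping (query id, object id) to a correction value
    audio_neighbours -- hypothetical callback: (obj, k) -> k nearest audio ids,
                        sorted by ascending CCAdis
    """
    d = tau / k  # common difference d1 = d2 = tau / k
    for p in positives:
        gamma[(query, p)] = -tau                     # pull positive examples closer
        for j, t in enumerate(audio_neighbours(p, k), start=1):
            gamma[(query, t)] = -tau + j * d         # weaker pull for farther neighbours
    for n in negatives:
        gamma[(query, n)] = tau                      # push negative examples away
        for j, h in enumerate(audio_neighbours(n, k), start=1):
            gamma[(query, h)] = tau - j * d          # weaker push for farther neighbours
    return gamma

def cro_dis(cca_distance, gamma, i, j):
    """Corrected distance Crodis(i, j) = CCAdis(i, j) + gamma(i, j); gamma defaults to 0."""
    return cca_distance + gamma.get((i, j), 0.0)
```

Because the k-th neighbour receives a correction of exactly zero, the adjustment fades out smoothly at the edge of each neighbourhood.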
Step (6), locating media objects outside the training set, is carried out as follows: when the query example submitted by the user does not belong to the training data set, a feature extraction program extracts the visual feature vector V of the example image, and the new media object is mapped into the isomorphic subspace in one of two ways:
(1) if the semantics of the new media object are known, the subspace basis vectors obtained by training as described in claim 3 are used to map the vector V linearly into the m-dimensional isomorphic subspace, and the general distance to the other multimedia objects in the training set is computed;
(2) if the semantics of the new media object are unknown, content-based single-modality retrieval returns images similar to the query example, the user marks the positive feedback examples $Z = \{z_1, \ldots, z_j\}$, and the cross-media retrieval system computes the coordinates of the new media object in the m-dimensional isomorphic subspace as a weighted average: $\mathrm{Pos}(V) = \mathrm{Pos}(z_1)\beta_1 + \cdots + \mathrm{Pos}(z_j)\beta_j$ ($\beta_1 + \cdots + \beta_j = 1$).
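Case (2) amounts to a convex combination of the coordinates of the positive examples. A minimal sketch follows; the function name and the default of equal weights are assumptions, since the text only requires that the weights β sum to 1.

```python
def locate_new_object(positive_coords, weights=None):
    """Place a new object at the weighted average of its positive examples.

    positive_coords -- list of m-dimensional coordinates Pos(z_1), ..., Pos(z_j)
    weights         -- optional list of beta values; equal weights if omitted (assumption)
    """
    if not positive_coords:
        raise ValueError("at least one positive example is required")
    j = len(positive_coords)
    if weights is None:
        weights = [1.0 / j] * j
    # Normalise so that the betas sum to 1, as the formula requires.
    total = sum(weights)
    betas = [w / total for w in weights]
    m = len(positive_coords[0])
    return [sum(b * z[d] for b, z in zip(betas, positive_coords)) for d in range(m)]
```

The same function works unchanged when the coordinates are complex numbers.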
Beneficial effects of the present invention:
1) The method overcomes the restriction of content-based multimedia retrieval to a single modality and proposes a new cross-media retrieval method. It analyzes the content features of two different modalities simultaneously and mines the canonical correlation, in the statistical sense, between the features;
2) The subspace mapping method not only resolves the heterogeneity between modalities but also preserves in the subspace as much as possible of the correlation between the multi-modal features. This correlation is in effect semantic association information, so the method incorporates semantics while reducing feature dimensionality;
3) Media objects of different modalities can be represented by isomorphic vectors, and the similarity between vectors, that is, the distance within one modality and between modalities, is computed in polar coordinates.
Brief description of the drawings
Fig. 1 is the system framework of the cross-media retrieval method based on content correlation;
Fig. 2(a) is a schematic of the distribution of the multimedia data set in the isomorphic subspace of the invention before relevance feedback;
Fig. 2(b) is a schematic of the distribution of the multimedia data set in the isomorphic subspace of the invention after relevance feedback;
Fig. 3(a) shows the retrieval result obtained with the isomorphic subspace method using an "automobile" image as the query example;
Fig. 3(b) shows the retrieval result obtained directly from content features using an "automobile" image as the query example;
Fig. 4(a) shows the retrieval result obtained with the isomorphic subspace method using a "war" image as the query example;
Fig. 4(b) shows the retrieval result obtained directly from content features using a "war" image as the query example.
Detailed description of the embodiments
The low-level content features of media objects of different modalities, such as the visual features of images (color, texture, shape, etc.) and the auditory features of audio (time-domain, frequency-domain, time-frequency features, etc.), not only differ in dimensionality but also express different attributes, so similarity cannot be measured directly. The present invention analyzes the heterogeneous visual and auditory features simultaneously and performs subspace mapping on the basis of the canonical correlation between them. This resolves the heterogeneity and incomparability problems of cross-media retrieval, and the mapping preserves as much as possible of the correlation between the original features. The technical scheme and implementation steps of the cross-media retrieval method based on content correlation are as follows:
1. Selection and labeling of training data
Learning the canonical correlation between visual and auditory features rests on semantic relationships: statistical analysis is used to mine semantic-level associations from low-level features. The training data must therefore contain both image data and audio data that express similar semantics. For the semantic class "dog", for example, pictures showing the appearance of dogs and audio clips of dogs barking are chosen as training data.
When the number of semantic classes is known but the semantic labels of the image and audio data are not, semi-supervised learning combined with K-means clustering is used to label all images and audio data in the database and cluster them into semantic classes. The steps are:
Input: unlabeled image data set Ω and audio data set Γ; number of semantic classes Z;
Output: the semantic class number of each image and each audio clip;
Step 1: for each semantic class $Z_i$, randomly label 5 image examples $A_i$ and compute the cluster centroid $ICtr_i$ of $A_i$;
Step 2: with $ICtr_i$ as the initial input of the K-means clustering algorithm, cluster the whole image data set Ω; image examples in the same cluster are given the same semantic class number;
Step 3: label the audio data Γ in the same way, using steps 1 and 2.
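The seeded clustering of steps 1 and 2 can be sketched in plain Python. The function names, the fixed iteration count, and the rule of keeping an old centre when a cluster becomes empty are assumptions made for the illustration; the idea taken from the text is that the centroids of a few hand-labelled examples per class initialize K-means, and each cluster then inherits the class of its seed.

```python
import math

def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def seeded_kmeans(data, seeds, iterations=20):
    """Label every vector in `data` with a semantic class.

    data  -- list of feature vectors (all images, or all audio clips)
    seeds -- dict mapping class id -> list of hand-labelled example vectors
    """
    # Step 1: initial cluster centres are the centroids of the labelled seeds.
    centers = {c: centroid(examples) for c, examples in seeds.items()}
    labels = []
    for _ in range(iterations):
        # Step 2a: assign each vector to the nearest centre.
        labels = [min(centers, key=lambda c: math.dist(x, centers[c])) for x in data]
        # Step 2b: move each centre to the mean of its members.
        for c in centers:
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:  # keep the old centre if the cluster is empty
                centers[c] = centroid(members)
    return labels
```

Running the same function on the audio feature vectors, with audio seeds, corresponds to step 3.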
2. Extraction of visual and auditory features
For the image data in each semantic class, low-level visual features are extracted: the HSV color histogram, the color coherence vector (CCV), and Tamura directionality. A p-dimensional image feature vector $x_p$ is constructed for each image, and the image data set of the whole semantic class forms the image feature matrix $X_{n \times p}$. For the audio data in each semantic class, low-level auditory features are extracted: four MPEG compressed-domain features, namely centroid, rolloff frequency, spectral flux, and root mean square (RMS). A q-dimensional audio feature vector $y_q$ is constructed for each audio example, and the audio data set of the whole semantic class forms the audio feature matrix $Y_{n \times q}$. Because audio clips of different duration yield feature vectors of different dimension, the invention uses fuzzy clustering to extract the same number of cluster centroids from the original audio features as the audio index.
3. Isomorphic subspace mapping of multi-semantic media data of different modalities
Building on canonical correlation analysis, an optimal low-dimensional subspace is learned that preserves as much as possible of the correlation between the original feature matrices $X_{n \times p}$ and $Y_{n \times q}$. The algorithm steps are:
Input: image feature matrix $X_{n \times p}$ and audio feature matrix $Y_{n \times q}$.
Output: the vector representations $L_{n \times m}$ and $M_{n \times m}$ of all image data and audio data in the low-dimensional subspace.
Step 1: by semi-supervised learning with K-means clustering, divide all image data and audio data in the database into semantic classes;
Step 2: under the constraints $v(L) = L^{T} L = A^{T} X^{T} X A = 1$ and $v(M) = M^{T} M = B^{T} Y^{T} Y B = 1$, maximize the correlation coefficient $\rho = r(L, M)$. The method of Lagrange multipliers yields an equation of the generalized eigenvalue form $Ax = \lambda Bx$ (generic notation), namely $C_{xy} C_{yy}^{-1} C_{yx} A = \lambda^{2} C_{xx} A$; solving for the eigenvalues and eigenvectors of this equation gives the matrices A and B;
Step 3: construct the isomorphic subspace linearly, that is, use the basis vectors A and B to map the image feature vectors and the audio feature vectors to the m-dimensional coordinates $L_{n \times m}$ and $M_{n \times m}$, respectively.
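Step 2 compresses the Lagrange-multiplier argument into a single sentence. A sketch of the standard derivation is given below for one pair of basis vectors a and b (columns of A and B). It assumes centred features, so that $C_{xx}$, $C_{yy}$, and $C_{xy}$ are proportional to $X^{T}X$, $Y^{T}Y$, and $X^{T}Y$, and it assumes that $C_{yy}$ is invertible; the multipliers $\lambda_1$, $\lambda_2$ are notation introduced here.

```latex
% Maximize a^T C_xy b subject to a^T C_xx a = 1 and b^T C_yy b = 1.
\begin{align*}
J(a, b) &= a^{T} C_{xy} b
  - \tfrac{\lambda_1}{2}\left(a^{T} C_{xx} a - 1\right)
  - \tfrac{\lambda_2}{2}\left(b^{T} C_{yy} b - 1\right) \\
\frac{\partial J}{\partial a} = 0 &\;\Rightarrow\; C_{xy} b = \lambda_1 C_{xx} a \\
\frac{\partial J}{\partial b} = 0 &\;\Rightarrow\; C_{yx} a = \lambda_2 C_{yy} b \\
\text{left-multiply by } a^{T},\, b^{T} &\;\Rightarrow\;
  \lambda_1 = \lambda_2 = \lambda = a^{T} C_{xy} b = \rho \\
b = \tfrac{1}{\lambda} C_{yy}^{-1} C_{yx} a &\;\Rightarrow\;
  C_{xy} C_{yy}^{-1} C_{yx}\, a = \lambda^{2} C_{xx}\, a
\end{align*}
```

The eigenvalue $\lambda^{2}$ is therefore the squared canonical correlation, and keeping the m largest eigenvalues selects the m directions in which the two modalities are most strongly correlated.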
4. Computing similarity with the general distance function
After the feature vectors of all image and audio data are converted into m-dimensional vectors in the low-dimensional subspace, many complex numbers appear. To compute the similarity between media data of the various modalities, the reduced feature vectors are expressed in polar form: $x_i' = (x_{i1}', \ldots, x_{ik}', \ldots, x_{im}')$ with $x_{ik}' = a + bi$ ($a, b \in \mathbb{R}$). The similarity between two images, between two audio clips, and between an image and an audio clip is therefore computed with the general distance function:
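$$\mathrm{CCAdis}(x_i', x_j') = \sqrt{\sum_{k=1}^{m} \left( |x_{ik}'|^{2} + |x_{jk}'|^{2} - 2 \times |x_{ik}'| \times |x_{jk}'| \times \cos\theta_k \right)}$$
$$\beta_{ik} = \operatorname{arctg}(b/a), \quad \theta_k = |\beta_{ik} - \beta_{jk}|, \quad |x_{ik}'| = \sqrt{a^{2} + b^{2}}, \quad k \in [1, m]$$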
During retrieval the user supplies a query image through the user interface. If the example is in the training database, its m-dimensional coordinates in the subspace are looked up from the subspace mapping result, its distances to the other audio and image data are computed with the general distance function, and the k images and k audio clips nearest to the query image are returned to the user as the query result. Likewise, if the query example is an audio clip, similar audio and image objects are retrieved by the same steps.
The invention supports both single-modality retrieval and cross-media retrieval: the user submits a media object of one modality as the query, the retrieval result may contain media objects of other modalities, and a new query can be issued from an object of another modality.
5. Relevance feedback
The content-based method learns the canonical correlation between visual and auditory features and performs subspace mapping while preserving that correlation as far as possible, which resolves the feature heterogeneity problem. Because of the gap between low-level content and high-level semantics, however, the learned result differs from the true semantics. Through relevance feedback the user marks positive and negative examples in the returned query result; the system learns semantic information from these marks and corrects the topology of the multimedia data set in the learned subspace.
Let Ω denote the image training set and A the audio training set. Define a "correction factor" $\gamma(i, j) = \mathrm{Pos}(a_i, b_j)$ ($a_i \in \Omega$, $b_j \in A$) that corrects the similarity between media objects of different modalities: $\mathrm{Crodis}(i, j) = \mathrm{CCAdis}(i, j) + \gamma(i, j)$. The correction factor is initialized to zero. When the user submits an image query example R, $\mathrm{CCAdis}(i, j)$ is used to compute the set $C_1$ of k nearest images of R in the subspace, and $\mathrm{Crodis}(i, j)$ is used to compute the set $C_2$ of k nearest audio clips of R in the subspace; the cross-media retrieval returns $C_1$ and $C_2$. During interaction the user marks positive examples P and negative examples N in the query result through relevance feedback. For every $p_i \in P$, set $\gamma(R, p_i) = -\tau$ ($\tau > 0$), find the k nearest neighbors $T = \{t_1, \ldots, t_j, \ldots, t_k\}$ of $p_i$ in the audio database A according to CCAdis, sorted by ascending distance, and correct the γ value of each element of T in turn as an arithmetic progression: $\gamma(R, t_j) = -\tau + j \times d_1$ ($d_1 = \tau / k$). For every $n_i \in N$, set $\gamma(R, n_i) = \tau$ ($\tau > 0$), find the k nearest neighbors $H = \{h_1, \ldots, h_j, \ldots, h_k\}$ of $n_i$ in the audio database A according to CCAdis, sorted by ascending distance, and correct the γ value of each element of H in turn as an arithmetic progression: $\gamma(R, h_j) = \tau - j \times d_2$ ($d_2 = \tau / k$). Likewise, when the user submits an audio object, the correction factor $\gamma(i, j)$ is updated in the same way. The next round of retrieval ranks the returned results by the new similarity.
6. Locating new media objects
A single multimedia object submitted by the user is defined as a new media object. If the new media object is not in the training database, it can still be located in the trained subspace, either directly and linearly through the subspace basis vectors or accurately through simple user interaction, while remaining semantically similar to the surrounding multimedia objects in the subspace. First a feature extraction program extracts the visual feature vector V of the example image; the new media object is then mapped into the isomorphic subspace in one of two ways:
First, if the semantics of the new media object are known, the subspace basis vectors obtained by training are used to map the vector V linearly into the m-dimensional isomorphic subspace, and the general distance to the other multimedia objects in the training set is computed.
Second, if the semantics of the new media object are unknown, content-based single-modality retrieval returns images similar to the query example, the user marks the positive feedback examples $Z = \{z_1, \ldots, z_j\}$, and the cross-media retrieval system computes the coordinates of the new media object in the m-dimensional isomorphic subspace as a weighted average: $\mathrm{Pos}(V) = \mathrm{Pos}(z_1)\beta_1 + \cdots + \mathrm{Pos}(z_j)\beta_j$ ($\beta_1 + \cdots + \beta_j = 1$).
Embodiment 1
Fig. 2 gives an example of the topology of some of the training data in the low-dimensional isomorphic subspace. The steps of this example are described in detail below with reference to the method of the invention:
(1) collect image data and audio data for 7 semantic classes (birds, dog, automobile, war, tiger, squirrel, monkey) as the training data set;
(2) use a feature extraction program to extract the HSV color histogram, color coherence vector (CCV), and Tamura directionality features of each image, construct a 500-dimensional visual feature vector for each image, and construct a 70 × 500 visual feature matrix for each of the 7 semantic classes;
(3) use a feature extraction program to extract four MPEG compressed-domain features of the audio: centroid, rolloff frequency, spectral flux, and root mean square (RMS);
(4) because audio examples differ in duration, the extracted feature vectors differ in length; fuzzy clustering is used to normalize the audio feature vectors of different dimension to 40-dimensional vectors, which serve as the index of each audio example, and a 70 × 40 auditory feature matrix is constructed for each of the 7 semantic classes;
(5) in the Matlab 7.0 environment, use the canonical correlation analysis function to learn, for each of the 7 semantic classes, the correlation between the visual and auditory feature matrices of its training data, and perform subspace mapping linearly, transforming the 70 × 500 and 70 × 40 feature matrices into new 70 × 40 and 70 × 40 feature matrices, respectively;
(6) according to
$$\mathrm{CCAdis}(x_i', x_j') = \sqrt{\sum_{k=1}^{m} \left( |x_{ik}'|^{2} + |x_{jk}'|^{2} - 2 \times |x_{ik}'| \times |x_{jk}'| \times \cos\theta_k \right)}$$
compute the distances between the 40-dimensional image feature vectors and audio feature vectors in the subspace, and return the 20 images and 20 audio clips nearest to the query example;
(7) during cross-media retrieval the user can interact through the user interface and mark the cross-media retrieval results. The system automatically learns the positive and negative feedback examples that the user submits, and the extracted semantic information is used to correct the topology of the multimedia data set in the isomorphic subspace: $\gamma(R, t_j) = -\tau + j \times d_1$ ($d_1 = \tau / k$) corrects the topology of the multimedia objects around the positive examples, and $\gamma(R, h_j) = \tau - j \times d_2$ ($d_2 = \tau / k$) corrects the topology of the multimedia objects around the negative examples.
Fig. 2 takes squirrels, birds and automobiles as examples. It shows the theoretical distribution of the media object sets in the isomorphic subspace obtained by the dimension-reducing mapping, as measured by CCAdis, and the corresponding distribution measured by Crodis after the distances have been corrected by relevance feedback. In Fig. 2(a), the image data set with the smallest CCAdis to the squirrel audio data set is the bird image set. After relevance feedback, the Crodis distance between the squirrel audio and the squirrel images is pulled closer and the Crodis distance between the squirrel audio and the bird images is pushed farther, while the topological relations within the squirrel images and within the squirrel audio remain essentially unchanged, as shown in Fig. 2(b).
As can be seen, the method of the present invention learns the correlation between image and audio data well, resolves the heterogeneity between media data of different modalities, and effectively realizes a cross-media distance metric; moreover, relevance feedback learns the semantic information in the user interaction, so that the distribution of the multimedia data set in the subspace better matches the relations among high-level semantics.
Embodiment 2
Figure 4 gives a retrieval example for the semantic concept "war". The concrete steps for implementing this example with the method of the present invention are described in detail below:
(1) the input is a colour picture with the semantics "war", used as the query example; the system finds the vector representation of this picture in the isomorphic subspace;
(2) an existing data format conversion method is used to express the subspace vector of the query example in polar-coordinate form;
(3) the general distance function is used to compute the distances between the query example and the other images and audio in the database, and the 10 nearest images and the 10 nearest audio examples are returned;
(4) in addition, the low-level content features of the query example are used directly, without subspace mapping, and matched against the content features of the other images in the database, i.e. a content-based single-modality retrieval method is used to return the 10 most similar images, which are compared with the retrieval results obtained by the method described in the present invention.
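Step (3) above returns the nearest items separately for each modality. A minimal sketch of that ranking is given below, assuming the database items are already available as coordinate vectors in the common subspace. The function name `top_k_by_modality` is hypothetical, and plain Euclidean distance is used only as a stand-in for the general distance function of the invention.

```python
import numpy as np

def top_k_by_modality(query, vectors, modalities, k=10):
    """Return {modality: indices of the k nearest database items}, nearest first.

    Assumption: Euclidean distance replaces the patent's general distance function.
    """
    vectors = np.asarray(vectors, dtype=float)
    dists = np.linalg.norm(vectors - np.asarray(query, dtype=float), axis=1)
    result = {}
    for modality in sorted(set(modalities)):
        # Indices of all database items of this modality.
        idx = np.array([i for i, m in enumerate(modalities) if m == modality])
        # Sort those items by distance to the query and keep the first k.
        order = idx[np.argsort(dists[idx], kind="stable")]
        result[modality] = order[:k].tolist()
    return result
```

For the embodiment, `k=10` with the labels `"image"` and `"audio"` would yield the two result lists that are shown to the user.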
The results of this example are shown in Figure 4. The query example is a colour picture of an explosion reflecting the semantics "war". Panel (a) shows the results returned by matching in the isomorphic subspace with the method described in the present invention; for comparison, panel (b) shows the similar images returned by matching directly on low-level visual features. Even though a colour image is used as the query example, black-and-white pictures that express the same semantics as the query example are returned among the top 10 retrieval results.
As can be seen, the method of the present invention captures the semantics shared by colour and black-and-white images, realizes mutual retrieval between them, and effectively solves the problem of accurately measuring the similarity of multimedia data that differ greatly in content features; the content-based single-modality retrieval method, by contrast, can only return pictures that are visually similar to the query example.

Claims (6)

  1. A cross-media retrieval method based on content correlation, characterized by comprising the following steps:
    (1) collect objects of different modalities, i.e. image and audio data, from a multimedia database;
    (2) extract the visual features of the image data and the auditory features of the audio data, and use canonical correlation analysis to extract the canonical correlation between the visual and auditory features obtained;
    (3) use an isomorphic subspace mapping algorithm to map the visual feature vectors of the image data and the auditory feature vectors of the audio data simultaneously into a low-dimensional isomorphic subspace, realizing a unified representation of media data of different modalities;
    (4) define a general distance function in polar-coordinate form to measure the degree of correlation between media data of different modalities, and perform cross-media retrieval on this basis;
    (5) use a relevance feedback mechanism based on incremental learning to extract prior knowledge from user interaction and to correct the topology of the multimedia data set in the isomorphic subspace;
    (6) according to the basis vectors obtained during subspace mapping, or by the relevance feedback mechanism, accurately locate media objects outside the training set in the isomorphic subspace.
  2. The cross-media retrieval method based on content correlation according to claim 1, characterized in that said extracting the visual features of the image data and the auditory features of the audio data, and using canonical correlation analysis to extract the canonical correlation between the visual and auditory features obtained, comprises: the low-level visual features of an image constitute a p-dimensional image feature vector, and the low-level auditory features of an audio clip constitute a q-dimensional audio feature vector; canonical correlation analysis is used to learn simultaneously from the image visual features $X_{(n \times p)}$ and the audio auditory features $Y_{(n \times q)}$, and the correlation coefficient between the heterogeneous feature matrices $X_{(n \times p)}$ and $Y_{(n \times q)}$ is computed as follows:
    $\rho = r(L, M) = \frac{A^T C_{xy} B}{\sqrt{A^T C_{xx} A \cdot B^T C_{yy} B}}, \quad \left( C(x, y) = \begin{pmatrix} C_{xx} & C_{xy} \\ C_{yx} & C_{yy} \end{pmatrix} = C \right) \qquad (1)$
    $L = XA, \quad M = YB \qquad (2)$
    where A and B are linear transformations; by formula 2, the correlation between the feature matrices X and Y, which have many variables, is reduced to the correlation between the fewer composite variables L and M; the numerical distribution of A and B determines the form of the spatial correlation distribution of X and Y, and the magnitudes of the entries of A and B determine the importance of the corresponding variables.
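To make formula 1 concrete, the following sketch evaluates the correlation coefficient for one pair of projection vectors with NumPy. The function name `canonical_correlation` and the use of sample covariances of mean-centred data are assumptions of this illustration, not details stated in the claim.

```python
import numpy as np

def canonical_correlation(X, Y, a, b):
    """Evaluate rho = a^T Cxy b / sqrt(a^T Cxx a * b^T Cyy b) for vectors a, b.

    Assumption: Cxx, Cyy, Cxy are sample covariances of mean-centred X and Y.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1)
    Cyy = Yc.T @ Yc / (n - 1)
    Cxy = Xc.T @ Yc / (n - 1)
    return float(a @ Cxy @ b / np.sqrt((a @ Cxx @ a) * (b @ Cyy @ b)))
```

Under these assumptions the value equals the ordinary Pearson correlation between the projected variables `X @ a` and `Y @ b`, which is one way to sanity-check an implementation.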
  3. The cross-media retrieval method based on content correlation according to claim 1, characterized in that said using an isomorphic subspace mapping algorithm to map the visual feature vectors of the image data and the auditory feature vectors of the audio data simultaneously into a low-dimensional isomorphic subspace, realizing a unified representation of media data of different modalities, comprises: on the basis of canonical correlation analysis, the isomorphic subspace mapping algorithm learns an optimal low-dimensional subspace that preserves to the greatest extent the correlation between the original feature vectors $X_{(n \times p)}$ and $Y_{(n \times q)}$; the steps of the algorithm are as follows:
    Input: image feature matrix $X_{(n \times p)}$, audio feature matrix $Y_{(n \times q)}$
    Output: the vector representations $L_{(n \times m)}$ and $M_{(n \times m)}$ of all image data and audio data in the low-dimensional subspace
    Step 1: in a semi-supervised manner, use K-means clustering to divide all image data and audio data in the database into different semantic classes;
    Step 2: maximize the correlation coefficient $\rho = r(L, M)$ under the constraint of formula 3,
    $v(L) = L^T L = A^T X^T X A = 1; \quad v(M) = M^T M = B^T Y^T Y B = 1 \qquad (3)$
    The method of Lagrange multipliers yields an equation of the form $Ax = \lambda Bx$, namely $C_{xy} C_{yy}^{-1} C_{yx} A = \lambda^2 C_{xx} A$; solving for the characteristic roots (eigenvalues) of this equation gives the solutions for the matrices A and B;
    Step 3: construct the isomorphic subspace with a linear method, i.e. use the basis vectors A and B to map the image feature vectors and the audio feature vectors to the m-dimensional coordinates $L_{(n \times m)}$ and $M_{(n \times m)}$, respectively.
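Steps 2 and 3 of this claim can be sketched with NumPy as below. This is an illustrative reading, not the Matlab routine used in the embodiment: the function name `cca_subspace`, the mean-centring, the scatter-matrix scaling and the small ridge term `reg` (added so the matrices can be inverted) are assumptions. Step 1 (K-means grouping into semantic classes) is omitted, so the function would be applied to the data of one class at a time.

```python
import numpy as np

def cca_subspace(X, Y, m, reg=1e-8):
    """Return (A, B, L, M, rho): basis vectors, m-dim coordinates, correlations.

    Hypothetical sketch; `reg` is an assumed ridge term, not part of the claim.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Scatter matrices, so that the constraint reads A^T X^T X A = 1 (formula 3).
    Cxx = Xc.T @ Xc + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc

    # Eigenproblem Cxx^{-1} Cxy Cyy^{-1} Cyx A = lambda^2 A.
    K = np.linalg.solve(Cxx, Cxy @ np.linalg.solve(Cyy, Cxy.T))
    eigvals, eigvecs = np.linalg.eig(K)
    order = np.argsort(-eigvals.real)[:m]          # m largest eigenvalues
    rho = np.sqrt(np.clip(eigvals.real[order], 0.0, 1.0))
    A = eigvecs.real[:, order]

    # Scale each column a so that a^T Cxx a = 1.
    A = A / np.sqrt(np.sum(A * (Cxx @ A), axis=0))
    # B follows from A (b proportional to Cyy^{-1} Cyx a), scaled so b^T Cyy b = 1.
    B = np.linalg.solve(Cyy, Cxy.T @ A)
    B = B / np.sqrt(np.sum(B * (Cyy @ B), axis=0))

    # Step 3: linear mapping into the common m-dimensional subspace.
    return A, B, Xc @ A, Yc @ B, rho
```

The sketch assumes `m` does not exceed the number of non-zero canonical correlations. When there are fewer samples than feature dimensions, as with the 70 × 500 matrix of Embodiment 1, the scatter matrix is singular and a larger `reg` or a prior dimension reduction would be needed.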
  4. The cross-media retrieval method based on content correlation according to claim 1, characterized in that said defining a general distance function in polar-coordinate form to measure the degree of correlation between media data of different modalities, and performing cross-media retrieval on this basis, comprises: in the m-dimensional subspace, the feature vectors of image and audio data are defined in polar-coordinate form as $x_i' = (x_{i1}', \ldots, x_{ik}', \ldots, x_{im}')$, $(x_{ik}' = a + bi, \ (a, b \in R))$; the similarity between image and image, between audio and audio, and between image and audio data is computed with the general distance function as follows:
    $CCAdis(x_i', x_j') = \sqrt{\sum_{k=1}^{m} \left( |x_{ik}'|^2 + |x_{jk}'|^2 - 2 \times |x_{ik}'| \times |x_{jk}'| \times \cos\theta_k \right)} \qquad (4)$
    $\beta_{ik} = \operatorname{arctg}(b / a), \quad \theta_k = |\beta_{ik} - \beta_{jk}|, \quad |x_{ik}'| = \sqrt{a^2 + b^2}, \quad k \in [1, m]$
    During retrieval, the user provides a query example image through the human-machine interface. If this example is in the training database, the m-dimensional coordinates of the query example in the subspace are found from the subspace mapping result, the distances to the other audio and image data are computed with the general distance function, and the k images and k audio clips nearest to the query image example are returned to the user as the query result; likewise, if the query example is an audio clip, similar audio and image objects are retrieved by the same steps.
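Formula 4 can be evaluated directly on complex-valued coordinate vectors, as in the sketch below. The function name `cca_dis` is hypothetical, and `np.angle` (a four-quadrant arctangent) is used for the phase, which is an assumption: it agrees with arctg(b/a) when a > 0 and avoids division by zero when a = 0.

```python
import numpy as np

def cca_dis(xi, xj):
    """General distance of formula 4 between two complex m-dimensional vectors.

    Assumption: phases are taken with np.angle rather than arctg(b/a).
    """
    xi = np.asarray(xi, dtype=complex)
    xj = np.asarray(xj, dtype=complex)
    ri, rj = np.abs(xi), np.abs(xj)                # moduli |x_ik'| and |x_jk'|
    theta = np.abs(np.angle(xi) - np.angle(xj))    # theta_k = |beta_ik - beta_jk|
    terms = ri ** 2 + rj ** 2 - 2.0 * ri * rj * np.cos(theta)
    # Clip tiny negative values caused by rounding before taking the root.
    return float(np.sqrt(np.sum(np.clip(terms, 0.0, None))))
```

With this choice of phase, each summand is the law-of-cosines form of $|x_{ik}' - x_{jk}'|^2$, so the result coincides with the Euclidean norm of the difference of the two complex vectors; that identity may not hold exactly under a two-quadrant arctg(b/a).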
  5. The cross-media retrieval method based on content correlation according to claim 1, characterized in that said using a relevance feedback mechanism based on incremental learning to extract prior knowledge from user interaction and to correct the topology of the multimedia data set in the isomorphic subspace, comprises: the system learns, during the relevance feedback process, the perceptual prior knowledge provided by the user; let Ω denote the image training set and A the audio training set, and define a "modifying factor" $\gamma(i, j) = Pos(a_i, b_j)$ $(a_i \in \Omega, \ b_j \in A)$, used to correct the similarity between media objects of different modalities: $Crodis(i, j) = CCAdis(i, j) + \gamma(i, j)$; the modifying factor is initialized to zero;
    When the user submits an image query example R, $CCAdis(i, j)$ is used to compute the set $C_1$ of the k nearest-neighbour images of R in the subspace, and $Crodis(i, j)$ is used to compute the set $C_2$ of the k nearest-neighbour audio clips of R in the subspace; the result returned by cross-media retrieval is $C_1$ and $C_2$;
    During user interaction, the user labels positive examples P and negative examples N in the query result through relevance feedback. $\forall p_i \in P$, set $\gamma(R, p_i) = -\tau$ $(\tau > 0)$, find by CCAdis the k nearest neighbours $T = \{t_1, \ldots, t_j, \ldots, t_k\}$ of $p_i$ in the audio database A, sorted in ascending order of distance, and then correct the γ value of each element of T in turn as an arithmetic progression: $\gamma(R, t_j) = -\tau + j \times d_1$, $(d_1 = \tau / k)$. $\forall n_i \in N$, set $\gamma(R, n_i) = \tau$ $(\tau > 0)$, find by CCAdis the k nearest neighbours $H = \{h_1, \ldots, h_j, \ldots, h_k\}$ of $n_i$ in the audio database A, sorted in ascending order of distance, and then correct the γ value of each element of H in turn as an arithmetic progression:
    $\gamma(R, h_j) = \tau - j \times d_2, \ (d_2 = \tau / k)$;
    Likewise, when the user submits an audio object, the same method is used to update the modifying factor $\gamma(i, j)$; the next round of retrieval ranks the returned results by the new similarity.
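The update of the modifying factor in this claim can be sketched as follows. The helper names (`feedback_offsets`, `apply_feedback`, `crodis`) and the dictionary used to store γ are assumptions of this illustration; the sketch also assigns the new γ values rather than accumulating them across feedback rounds, which is one possible reading of the claim.

```python
import numpy as np

def feedback_offsets(tau, k, positive=True):
    """Arithmetic-progression corrections for the k nearest neighbours (j = 1..k)."""
    j = np.arange(1, k + 1)
    d = tau / k
    # Positive example: -tau + j*d (rises to 0); negative: tau - j*d (falls to 0).
    return (-tau + j * d) if positive else (tau - j * d)

def apply_feedback(gamma, query, example, neighbours, tau, positive=True):
    """Update gamma[(query, obj)] for one labelled example and its neighbours.

    `neighbours` must already be sorted by ascending CCAdis to `example`.
    """
    gamma[(query, example)] = -tau if positive else tau
    offsets = feedback_offsets(tau, len(neighbours), positive)
    for obj, off in zip(neighbours, offsets):
        gamma[(query, obj)] = float(off)
    return gamma

def crodis(ccadis_value, gamma, i, j):
    """Crodis(i, j) = CCAdis(i, j) + gamma(i, j); gamma defaults to zero."""
    return ccadis_value + gamma.get((i, j), 0.0)
```

With τ = 1 and k = 4, the neighbours of a positive example receive the corrections −0.75, −0.5, −0.25 and 0, so the nearest neighbour is pulled in the most and the effect fades to zero at the k-th neighbour.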
  6. The cross-media retrieval method based on content correlation according to claim 1, characterized in that said accurately locating media objects outside the training set in the isomorphic subspace, according to the basis vectors obtained during subspace mapping or by the relevance feedback mechanism, comprises: when the query example submitted by the user does not belong to the training data set, the feature extraction program is used to extract the visual feature vector V of the example image, and the new media object is mapped to the isomorphic subspace in one of the following two cases:
    (1) if the semantic information expressed by the new media object is known, then, using the subspace basis vectors obtained by the training described in claim 3, the vector V is mapped by linear transformation into the m-dimensional isomorphic subspace, and the general distance to the other multimedia objects in the training set is computed;
    (2) if the semantics expressed by the new media object are unknown, content-based single-modality retrieval is used to return images similar to the query example, the user labels the positive feedback examples $Z = \{z_1, \ldots, z_j\}$, and the cross-media retrieval system computes the coordinates of the new media object in the m-dimensional isomorphic subspace by weighted averaging: $Pos(V) = Pos(z_1)\beta_1 + \ldots + Pos(z_j)\beta_j$, $(\beta_1 + \ldots + \beta_j = 1)$.
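The two cases of this claim can be illustrated with the short sketch below. The function names `locate_by_basis` and `locate_by_feedback` are hypothetical; subtracting the training mean before projecting and defaulting to uniform weights β are assumptions, since the claim does not say how the weights are chosen.

```python
import numpy as np

def locate_by_basis(v, A, x_mean):
    """Case (1): map a raw feature vector v linearly with the trained basis A.

    Assumption: v is centred with the training mean x_mean before projection.
    """
    return (np.asarray(v, dtype=float) - x_mean) @ A

def locate_by_feedback(positive_coords, weights=None):
    """Case (2): Pos(V) = sum_j beta_j * Pos(z_j), with the beta_j summing to 1."""
    P = np.asarray(positive_coords, dtype=float)    # shape (j, m)
    if weights is None:
        weights = np.full(len(P), 1.0 / len(P))     # uniform weights: an assumption
    beta = np.asarray(weights, dtype=float)
    beta = beta / beta.sum()                        # enforce beta_1 + ... + beta_j = 1
    return beta @ P
```

For example, two positive feedback examples at subspace coordinates (0, 0) and (2, 4) with equal weights place the new object at (1, 2).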
CNB2006100533904A 2006-09-14 2006-09-14 Transmedia searching method based on content correlation Expired - Fee Related CN100422999C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2006100533904A CN100422999C (en) 2006-09-14 2006-09-14 Transmedia searching method based on content correlation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2006100533904A CN100422999C (en) 2006-09-14 2006-09-14 Transmedia searching method based on content correlation

Publications (2)

Publication Number Publication Date
CN101021849A true CN101021849A (en) 2007-08-22
CN100422999C CN100422999C (en) 2008-10-01

Family

ID=38709618

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2006100533904A Expired - Fee Related CN100422999C (en) 2006-09-14 2006-09-14 Transmedia searching method based on content correlation

Country Status (1)

Country Link
CN (1) CN100422999C (en)

Cited By (43)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101833565A (en) * 2010-03-31 2010-09-15 南京大学 Method for actively selecting related feedbacks of representative image
CN101984424A (en) * 2010-10-26 2011-03-09 浙江工商大学 Mass inter-media index method
CN101546556B (en) * 2008-03-28 2011-03-23 展讯通信(上海)有限公司 Classification system for identifying audio content
CN102262670A (en) * 2011-07-29 2011-11-30 中山大学 Cross-media information retrieval system and method based on mobile visual equipment
CN102521368A (en) * 2011-12-16 2012-06-27 武汉科技大学 Similarity matrix iteration based cross-media semantic digesting and optimizing method
CN102640153A (en) * 2009-12-04 2012-08-15 诺基亚公司 Method and apparatus for providing media content searching capabilities
CN102663447A (en) * 2012-04-28 2012-09-12 中国科学院自动化研究所 Cross-media searching method based on discrimination correlation analysis
CN102693321A (en) * 2012-06-04 2012-09-26 常州南京大学高新技术研究院 Cross-media information analysis and retrieval method
CN102693316A (en) * 2012-05-29 2012-09-26 中国科学院自动化研究所 Linear generalization regression model based cross-media retrieval method
CN102713900A (en) * 2009-11-03 2012-10-03 高通股份有限公司 Data searching using spatial auditory cues
CN102932321A (en) * 2011-08-08 2013-02-13 索尼公司 Information processing apparatus, information processing method, program, and information processing system
CN103049526A (en) * 2012-12-20 2013-04-17 中国科学院自动化研究所 Cross-media retrieval method based on double space learning
CN103279579A (en) * 2013-06-24 2013-09-04 魏骁勇 Video retrieval method based on visual space
WO2013159356A1 (en) * 2012-04-28 2013-10-31 中国科学院自动化研究所 Cross-media searching method based on discrimination correlation analysis
WO2013177751A1 (en) * 2012-05-29 2013-12-05 中国科学院自动化研究所 Cross-media retrieval method based on generalized linear regression model
CN103793447A (en) * 2012-10-26 2014-05-14 汤晓鸥 Method and system for estimating semantic similarity among music and images
CN103995903A (en) * 2014-06-12 2014-08-20 武汉科技大学 Cross-media search method based on isomorphic subspace mapping and optimization
CN103995804A (en) * 2013-05-20 2014-08-20 中国科学院计算技术研究所 Cross-media topic detection method and device based on multimodal information fusion and graph clustering
CN104077408A (en) * 2014-07-11 2014-10-01 浙江大学 Distributed semi-supervised content identification and classification method and device for large-scale cross-media data
CN104166982A (en) * 2014-06-30 2014-11-26 复旦大学 Image optimization clustering method based on typical correlation analysis
CN104679902A (en) * 2015-03-20 2015-06-03 湘潭大学 Information abstract extraction method in conjunction with cross-media fuse
CN105574133A (en) * 2015-12-15 2016-05-11 苏州贝多环保技术有限公司 Multi-mode intelligent question answering system and method
CN105930873A (en) * 2016-04-27 2016-09-07 天津中科智能识别产业技术研究院有限公司 Self-paced cross-modal matching method based on subspace
CN105938561A (en) * 2016-04-13 2016-09-14 南京大学 Canonical-correlation-analysis-based computer data attribute reduction method
CN106095893A (en) * 2016-06-06 2016-11-09 北京大学深圳研究生院 A kind of cross-media retrieval method
CN106127305A (en) * 2016-06-17 2016-11-16 中国科学院信息工程研究所 A kind of for method for measuring similarity between the allos of multi-source heterogeneous data
CN106663429A (en) * 2014-03-10 2017-05-10 韦利通公司 Engine, system and method of providing audio transcriptions for use in content resources
CN107209760A (en) * 2014-12-10 2017-09-26 凯恩迪股份有限公司 The sub-symbol data coding of weighting
CN107480158A (en) * 2016-06-07 2017-12-15 百度(美国)有限责任公司 The method and system of the matching of content item and image is assessed based on similarity score
CN107766571A (en) * 2017-11-08 2018-03-06 北京大学 The search method and device of a kind of multimedia resource
CN108228757A (en) * 2017-12-21 2018-06-29 北京市商汤科技开发有限公司 Image search method and device, electronic equipment, storage medium, program
CN108885639A (en) * 2016-03-29 2018-11-23 斯纳普公司 Properties collection navigation and automatic forwarding
CN109074363A (en) * 2016-05-09 2018-12-21 华为技术有限公司 Data query method, data query system determine method and apparatus
CN109408648A (en) * 2018-10-26 2019-03-01 京东方科技集团股份有限公司 It is associated with and determines method, works recommended method
US10275685B2 (en) 2014-12-22 2019-04-30 Dolby Laboratories Licensing Corporation Projection-based audio object extraction from audio content
CN109784405A (en) * 2019-01-16 2019-05-21 山东建筑大学 Cross-module state search method and system based on pseudo label study and semantic consistency
CN109784287A (en) * 2019-01-22 2019-05-21 中国科学院自动化研究所 Information processing method, system, device based on scene class signal forehead leaf network
CN109992676A (en) * 2019-04-01 2019-07-09 中国传媒大学 Across the media resource search method of one kind and searching system
CN110019898A (en) * 2017-08-08 2019-07-16 航天信息股份有限公司 A kind of animation image processing system
CN110879863A (en) * 2018-08-31 2020-03-13 阿里巴巴集团控股有限公司 Cross-domain search method and cross-domain search device
CN111046166A (en) * 2019-12-10 2020-04-21 中山大学 Semi-implicit multi-modal recommendation method based on similarity correction
CN111291204A (en) * 2019-12-10 2020-06-16 河北金融学院 Multimedia data fusion method and device
CN111931866A (en) * 2020-09-21 2020-11-13 平安科技(深圳)有限公司 Medical data processing method, device, equipment and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7185049B1 (en) * 1999-02-01 2007-02-27 At&T Corp. Multimedia integration description scheme, method and system for MPEG-7
JP2001282813A (en) * 2000-03-29 2001-10-12 Toshiba Corp Multimedia data retrieval method, index information providing method, multimedia data retrieval device, index server and multimedia data retrieval server
CN1267838C (en) * 2002-12-31 2006-08-02 程松林 Sound searching method and video and audio information searching system using said method
CN100336061C (en) * 2003-08-08 2007-09-05 富士通株式会社 Multimedia object searching device and methoed
CN1529264A (en) * 2003-10-06 2004-09-15 李少峰 Method for searching associated multimedia content through text block position coding

Cited By (67)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101546556B (en) * 2008-03-28 2011-03-23 展讯通信(上海)有限公司 Classification system for identifying audio content
CN102713900A (en) * 2009-11-03 2012-10-03 高通股份有限公司 Data searching using spatial auditory cues
CN102713900B (en) * 2009-11-03 2014-12-10 高通股份有限公司 Data searching using spatial auditory cues
CN102640153A (en) * 2009-12-04 2012-08-15 诺基亚公司 Method and apparatus for providing media content searching capabilities
CN101833565B (en) * 2010-03-31 2011-10-19 南京大学 Method for actively selecting related feedbacks of representative image
CN101833565A (en) * 2010-03-31 2010-09-15 南京大学 Method for actively selecting related feedbacks of representative image
CN101984424A (en) * 2010-10-26 2011-03-09 浙江工商大学 Mass inter-media index method
CN102262670A (en) * 2011-07-29 2011-11-30 中山大学 Cross-media information retrieval system and method based on mobile visual equipment
CN102932321A (en) * 2011-08-08 2013-02-13 索尼公司 Information processing apparatus, information processing method, program, and information processing system
CN102521368A (en) * 2011-12-16 2012-06-27 武汉科技大学 Similarity matrix iteration based cross-media semantic digesting and optimizing method
CN102521368B (en) * 2011-12-16 2013-08-21 武汉科技大学 Similarity matrix iteration based cross-media semantic digesting and optimizing method
CN102663447A (en) * 2012-04-28 2012-09-12 中国科学院自动化研究所 Cross-media searching method based on discrimination correlation analysis
WO2013159356A1 (en) * 2012-04-28 2013-10-31 中国科学院自动化研究所 Cross-media searching method based on discrimination correlation analysis
CN102693316A (en) * 2012-05-29 2012-09-26 中国科学院自动化研究所 Linear generalization regression model based cross-media retrieval method
CN102693316B (en) * 2012-05-29 2014-03-26 中国科学院自动化研究所 Linear generalization regression model based cross-media retrieval method
WO2013177751A1 (en) * 2012-05-29 2013-12-05 中国科学院自动化研究所 Cross-media retrieval method based on generalized linear regression model
CN102693321A (en) * 2012-06-04 2012-09-26 常州南京大学高新技术研究院 Cross-media information analysis and retrieval method
CN103793447A (en) * 2012-10-26 2014-05-14 汤晓鸥 Method and system for estimating semantic similarity among music and images
CN103793447B (en) * 2012-10-26 2019-05-14 汤晓鸥 The estimation method and estimating system of semantic similarity between music and image
CN103049526A (en) * 2012-12-20 2013-04-17 中国科学院自动化研究所 Cross-media retrieval method based on double space learning
CN103049526B (en) * 2012-12-20 2015-08-05 中国科学院自动化研究所 Based on the cross-media retrieval method of double space study
CN103995804A (en) * 2013-05-20 2014-08-20 中国科学院计算技术研究所 Cross-media topic detection method and device based on multimodal information fusion and graph clustering
CN103995804B (en) * 2013-05-20 2017-02-01 中国科学院计算技术研究所 Cross-media topic detection method and device based on multimodal information fusion and graph clustering
CN103279579B (en) * 2013-06-24 2016-07-06 魏骁勇 The video retrieval method in view-based access control model space
CN103279579A (en) * 2013-06-24 2013-09-04 魏骁勇 Video retrieval method based on visual space
CN106663429A (en) * 2014-03-10 2017-05-10 韦利通公司 Engine, system and method of providing audio transcriptions for use in content resources
CN103995903B (en) * 2014-06-12 2017-04-12 武汉科技大学 Cross-media search method based on isomorphic subspace mapping and optimization
CN103995903A (en) * 2014-06-12 2014-08-20 武汉科技大学 Cross-media search method based on isomorphic subspace mapping and optimization
CN104166982A (en) * 2014-06-30 2014-11-26 复旦大学 Image optimization clustering method based on typical correlation analysis
CN104077408A (en) * 2014-07-11 2014-10-01 浙江大学 Distributed semi-supervised content identification and classification method and device for large-scale cross-media data
CN104077408B (en) * 2014-07-11 2017-09-29 浙江大学 Extensive across media data distributed semi content of supervision method for identifying and classifying and device
US11061952B2 (en) 2014-12-10 2021-07-13 Kyndi, Inc. Weighted subsymbolic data encoding
CN107209760A (en) * 2014-12-10 2017-09-26 凯恩迪股份有限公司 The sub-symbol data coding of weighting
US10275685B2 (en) 2014-12-22 2019-04-30 Dolby Laboratories Licensing Corporation Projection-based audio object extraction from audio content
CN104679902B (en) * 2015-03-20 2017-11-28 湘潭大学 A kind of informative abstract extracting method of combination across Media Convergence
CN104679902A (en) * 2015-03-20 2015-06-03 湘潭大学 Information abstract extraction method in conjunction with cross-media fuse
CN105574133A (en) * 2015-12-15 2016-05-11 苏州贝多环保技术有限公司 Multi-mode intelligent question answering system and method
US11729252B2 (en) 2016-03-29 2023-08-15 Snap Inc. Content collection navigation and autoforwarding
CN108885639A (en) * 2016-03-29 2018-11-23 斯纳普公司 Properties collection navigation and automatic forwarding
CN105938561A (en) * 2016-04-13 2016-09-14 南京大学 Canonical-correlation-analysis-based computer data attribute reduction method
CN105930873A (en) * 2016-04-27 2016-09-07 天津中科智能识别产业技术研究院有限公司 Self-paced cross-modal matching method based on subspace
CN105930873B (en) * 2016-04-27 2019-02-12 天津中科智能识别产业技术研究院有限公司 A kind of walking across mode matching method certainly based on subspace
CN109074363A (en) * 2016-05-09 2018-12-21 华为技术有限公司 Data query method, data query system determine method and apparatus
CN106095893A (en) * 2016-06-06 2016-11-09 北京大学深圳研究生院 A kind of cross-media retrieval method
CN106095893B (en) * 2016-06-06 2018-11-20 北京大学深圳研究生院 A kind of cross-media retrieval method
CN107480158B (en) * 2016-06-07 2021-01-12 百度(美国)有限责任公司 Method and system for evaluating matching of content item and image based on similarity score
CN107480158A (en) * 2016-06-07 2017-12-15 百度(美国)有限责任公司 The method and system of the matching of content item and image is assessed based on similarity score
CN106127305A (en) * 2016-06-17 2016-11-16 中国科学院信息工程研究所 A kind of for method for measuring similarity between the allos of multi-source heterogeneous data
CN106127305B (en) * 2016-06-17 2019-07-16 中国科学院信息工程研究所 A kind of heterologous method for measuring similarity for multi-source heterogeneous data
CN110019898A (en) * 2017-08-08 2019-07-16 航天信息股份有限公司 A kind of animation image processing system
CN107766571A (en) * 2017-11-08 2018-03-06 北京大学 The search method and device of a kind of multimedia resource
CN107766571B (en) * 2017-11-08 2021-02-09 北京大学 Multimedia resource retrieval method and device
CN108228757A (en) * 2017-12-21 2018-06-29 北京市商汤科技开发有限公司 Image search method and device, electronic equipment, storage medium, program
CN110879863B (en) * 2018-08-31 2023-04-18 阿里巴巴集团控股有限公司 Cross-domain search method and cross-domain search device
CN110879863A (en) * 2018-08-31 2020-03-13 阿里巴巴集团控股有限公司 Cross-domain search method and cross-domain search device
CN109408648B (en) * 2018-10-26 2021-01-22 京东方科技集团股份有限公司 Association determination method and work recommendation method
CN109408648A (en) * 2018-10-26 2019-03-01 京东方科技集团股份有限公司 It is associated with and determines method, works recommended method
CN109784405B (en) * 2019-01-16 2020-09-08 山东建筑大学 Cross-modal retrieval method and system based on pseudo-tag learning and semantic consistency
CN109784405A (en) * 2019-01-16 2019-05-21 山东建筑大学 Cross-module state search method and system based on pseudo label study and semantic consistency
US10915815B1 (en) 2019-01-22 2021-02-09 Institute Of Automation, Chinese Academy Of Sciences Information processing method, system and device based on contextual signals and prefrontal cortex-like network
CN109784287A (en) * 2019-01-22 2019-05-21 中国科学院自动化研究所 Information processing method, system, device based on scene class signal forehead leaf network
CN109992676B (en) * 2019-04-01 2020-12-25 中国传媒大学 Cross-media resource retrieval method and retrieval system
CN109992676A (en) * 2019-04-01 2019-07-09 中国传媒大学 Across the media resource search method of one kind and searching system
CN111291204A (en) * 2019-12-10 2020-06-16 河北金融学院 Multimedia data fusion method and device
CN111046166A (en) * 2019-12-10 2020-04-21 中山大学 Semi-implicit multi-modal recommendation method based on similarity correction
CN111291204B (en) * 2019-12-10 2023-08-29 河北金融学院 Multimedia data fusion method and device
CN111931866A (en) * 2020-09-21 2020-11-13 平安科技(深圳)有限公司 Medical data processing method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN100422999C (en) 2008-10-01

Similar Documents

Publication Publication Date Title
CN100422999C (en) Transmedia searching method based on content correlation
Torralba et al. 80 million tiny images: A large data set for nonparametric object and scene recognition
CN102521368B (en) Similarity matrix iteration based cross-media semantic digesting and optimizing method
Chen et al. CLUE: cluster-based retrieval of images by unsupervised learning
Krishnapuram et al. Content-based image retrieval based on a fuzzy approach
Chadha et al. Comparative study and optimization of feature-extraction techniques for content based image retrieval
CN102902826B (en) A kind of image method for quickly retrieving based on reference picture index
JP2006510114A (en) Representation of content in conceptual model space and method and apparatus for retrieving it
CN102663447B (en) Cross-media searching method based on discrimination correlation analysis
CN104156433B (en) Image retrieval method based on semantic mapping space construction
CN111324765A (en) Fine-grained sketch image retrieval method based on depth cascade cross-modal correlation
CN112905822A (en) Deep supervision cross-modal counterwork learning method based on attention mechanism
CN103995903B (en) Cross-media search method based on isomorphic subspace mapping and optimization
CN105849720A (en) Visual semantic complex network and method for forming network
CN103336835B (en) Image retrieval method based on weight color-sift characteristic dictionary
Qian et al. HWVP: hierarchical wavelet packet descriptors and their applications in scene categorization and semantic concept retrieval
CN105389326A (en) Image annotation method based on weak matching probability canonical correlation model
CN106250925B (en) A kind of zero Sample video classification method based on improved canonical correlation analysis
Barz et al. Content-based image retrieval and the semantic gap in the deep learning era
CN105701225A (en) Cross-media search method based on unification association supergraph protocol
JP2007080061A5 (en)
Sasikala et al. Efficient content based image retrieval system with metadata processing
Yen et al. Ranked centroid projection: A data visualization approach with self-organizing maps
CN105069136A (en) Image recognition method in big data environment
Belattar et al. CBIR using relevance feedback: comparative analysis and major challenges

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20081001

Termination date: 20120914