CN108647350A - Image-text associated retrieval method based on two-channel network - Google Patents
Image-text associated retrieval method based on two-channel network
- Publication number
- CN108647350A (application CN201810465884.6A)
- Authority
- CN
- China
- Prior art keywords
- text
- image
- data
- network model
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Image Analysis (AREA)
Abstract
The invention provides an image-text associated retrieval method based on a two-channel network, comprising the following steps: constructing a training data set; constructing an image depth network model, constructing a text depth network model, constructing an associated target loss function of the image features and the text features, and training the image depth network model and the text depth network model; performing feature extraction on the image data and text data in the search library through the image depth network model and the text depth network model respectively, extracting image features and text features of corresponding depth, and storing them in association to form an index database; extracting the features of the query data, matching the extracted features of the query data with the corresponding text features or image features in the index database, and sorting and returning the query results according to the matching results.
Description
Technical field
The invention belongs to the technical field of information retrieval, and more particularly relates to an image-text associated retrieval method based on a two-channel network.
Background technology
With the rapid development of technologies such as "Internet+" and big data, the volume of image and text data available to people is growing at an astonishing rate, and how to find the content a user needs in large-scale data has become an important research topic in information retrieval. Traditional techniques that query text with text or query images with images can no longer meet practical demands. Users increasingly wish to retrieve related images or text with a text query, or to retrieve related images or text by shooting or submitting a photo. Querying text with text or images with images belongs to single-modality retrieval; how to search information across multiple modalities, that is, cross-modal retrieval, is attracting more and more attention. Cross-modal retrieval — querying text with an image, or images with text — must solve the problem of a unified feature representation for data of different modalities. Since text data and image data differ greatly in data format, how to represent different modalities uniformly has become the key problem of multi-modal and cross-modal retrieval.
To solve this key problem, conventional multi-modal or cross-modal representation methods typically derive semantic text from image data through semantic understanding and then query with that semantic text, but such methods are limited by the accuracy of semantic understanding and have not developed well. In recent years, with the development of deep learning, especially convolutional neural networks, markedly more effective feature representations can be extracted through operations such as convolution and pooling, and the semantic understanding and feature representation of images have shown very good properties. Meanwhile, neural network language models for text data have also developed rapidly; recurrent neural networks, thanks to gated units such as long short-term memory, exhibit powerful long-term memory in modeling sequence data and can be used to model text data. The feature representations produced by convolutional neural networks for images and by recurrent neural networks for sequence data can reach comparable representational power, but how to combine convolutional and recurrent networks to jointly learn a consistent representation of different-modality data remains an important problem limiting multi-modal and cross-modal information retrieval.
Invention content
The object of the invention is to overcome the drawbacks and problems of the prior art by providing an image-text associated retrieval method based on a two-channel network.
The technical scheme of the invention is as follows. An image-text associated retrieval method based on a two-channel network includes the following steps: constructing a training data set, the training data set containing multiple pairs of image data and text data; constructing an image depth network model that performs image feature extraction on the image data, constructing a text depth network model that performs text feature extraction on the text data, constructing an associated target loss function of the image features and the text features, and training the image depth network model and the text depth network model according to the associated target loss function; performing feature extraction on the image data and text data in the search library through the image depth network model and the text depth network model respectively, extracting image features and text features of corresponding depth, and saving the two in association to form an index database; extracting the features of the query data, matching the extracted features of the query data against the corresponding text features or image features in the index database, and sorting and returning the query results according to the matching results.
Preferably, constructing the training data set, in which the training data set contains multiple pairs of image data and text data, specifically comprises the following steps:
obtaining a preset quantity of image data and normalizing its size to 224 × 224 pixels;
manually writing a text description for each image, the description usually being a sentence of tens to over a hundred words;
preprocessing the text description, for example by word segmentation, to obtain a text word sequence;
representing each word after segmentation as a vector, so that a passage of text is represented as a vector sequence containing N words, N being a positive integer.
Preferably, constructing the image depth network model that performs image feature extraction on the image data specifically includes:
constructing a neural network model comprising several convolution units and pooling layers, each convolution unit containing a batch normalization layer, a convolutional layer and a nonlinear activation layer, the neural network model finally outputting features through a global pooling layer.
Preferably, constructing the text depth network model that performs text feature extraction on the text data specifically includes:
constructing a recurrent neural network model comprising a gated unit, where the gated unit cyclically receives the current input vector and the previous moment's output, and after processing by the gated unit outputs one vector as the text feature.
Preferably, constructing the associated target loss function of the image features and the text features specifically includes:
letting the feature vector output by the network for each data sample in the training data set be f; then, given an image and a passage of text whose network outputs are f_i and f_t respectively, defining the target loss between the two features as L(f_i, f_t);
adding a regularization term, defined as L(W) where W denotes the parameters, to prevent over-fitting;
thus obtaining the associated target loss function L = L(f_i, f_t) + λL(W), where λ is the regularization parameter.
Preferably, training the image depth network model and the text depth network model according to the associated target loss function specifically includes:
given a batch of training data, computing the associated target loss by forward propagation;
computing the gradient of the objective with respect to the input data from the associated target loss function;
computing the gradients layer by layer through the back-propagation algorithm and updating the parameters;
repeating the above steps iteratively, and stopping training once the number of iterations reaches a preset number;
saving the trained network parameters to computer disk for retrieval.
Preferably, the step of performing feature extraction on the image data and text data in the search library through the image depth network model and the text depth network model respectively, extracting image features and text features of corresponding depth, and saving the two in association to form an index database specifically comprises:
given the search library data, extracting image features with the image depth network model for image data, and extracting text features with the text depth network model for text data;
saving the extracted image features and text features into the index library using a hash index, forming the index database.
Preferably, the step of matching the extracted features of the query data against the corresponding text features or image features in the index database and sorting and returning the query results according to the matching results specifically comprises:
given a query image, extracting its image features with the image depth network model;
given a query sentence, extracting its text features with the text depth network model;
using the extracted image features or text features to find, in the index database, the image data or text data whose similarity is higher than a preset value;
sorting the returned results and finally returning them to the user.
The technical solution provided by the invention has the following beneficial effects:
in the image-text associated retrieval method based on the two-channel network, the given retrieval query data can be either an image or text; moreover, after the features are extracted, candidate results are quickly returned from the index library by hashing, the similarity between the query features and the library features is then computed, results whose similarity exceeds a preset threshold are taken as the returned data, and the results are returned sorted by similarity from high to low, so that the most similar results are returned first, improving the user experience of retrieval.
Description of the drawings
Fig. 1 is a flow diagram of the image-text associated retrieval method based on a two-channel network provided by an embodiment of the present invention.
Specific implementation mode
In order to make the purpose, technical scheme and advantages of the present invention clearer, the present invention is further elaborated below with reference to the accompanying drawings and embodiments. It should be appreciated that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit it.
Unless the context clearly indicates otherwise, the elements and components described in the present invention may exist in single or multiple form, and the present invention is not limited in this respect. Although the steps in the present invention are arranged with labels, the labels are not intended to limit the order of the steps; unless the order is expressly stated or the execution of a step requires other steps as its basis, the relative order of the steps is adjustable. It will be appreciated that the term "and/or" as used herein covers any and all possible combinations of one or more of the associated listed items.
As shown in Fig. 1, the image-text associated retrieval method based on a two-channel network provided by the invention includes the following steps.
S1. Constructing a training data set, the training data set containing multiple pairs of image data and text data.
Specifically, in step S1, the first step is to acquire the training image data. On the Internet, images are automatically downloaded from domain-specific website nodes by a web crawler to build the image data set. The Internet can be modeled as a graph of nodes and directed edges, where a directed edge is a URL link in a web page and a node is a web page file or a media file. Nodes are divided into leaf nodes and non-leaf nodes: a leaf node can be a web page file without hyperlinks or a media file such as an image, video or audio file, while a non-leaf node is a web page file containing hyperlinks. When crawling pages, the web crawler can traverse the nodes of the network using the depth-first or breadth-first algorithm for directed graphs and download the images in the leaf nodes to construct the training image set. The second step is to generate the paired text data. Each image in the training image set is manually described with a sentence of tens to over a hundred words, generating the corresponding text data. Owing to human subjectivity and differences in language, the same image may admit several possible descriptions, so one image can be described by several people, each description and its corresponding image forming one associated pair. In practice, since the number of images is large and the description process slow, Internet "crowdsourcing" can be used to speed up image description.
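The crawl described above can be sketched as a breadth-first traversal of the page graph, downloading the media leaf nodes. The toy link graph, the `.jpg` naming convention, and the function names below are illustrative assumptions, not part of the patent:

```python
from collections import deque

def crawl_images(start, links, is_image):
    """Breadth-first traversal of a page graph, collecting image leaf nodes."""
    seen, images = {start}, []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if is_image(node):                 # leaf node holding a media file
            images.append(node)
            continue
        for nxt in links.get(node, []):    # hyperlinks = directed edges
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return images

# Toy site: pages "a" and "b" link onward; *.jpg files are leaf nodes.
site = {"a": ["b", "cat.jpg"], "b": ["dog.jpg", "a"]}
found = crawl_images("a", site, lambda n: n.endswith(".jpg"))
```

A depth-first variant would simply replace `popleft()` with `pop()`; the `seen` set prevents cycles such as the back-link from "b" to "a".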
Specifically, step S1 includes the following steps:
obtaining a preset quantity of image data and normalizing its size to 224 × 224 pixels;
manually writing a text description for each image, usually a sentence of tens to over a hundred words;
preprocessing the text description, for example by word segmentation, to obtain a text word sequence;
representing each word after segmentation as a vector, so that a passage of text is represented as a vector sequence containing N words, N being a positive integer.
For example, on the basis of the paired image and text training data, the images and text also need to be preprocessed. For image data, the images are uniformly scaled to 224 × 224 pixels as the input of the following model. For text data, each description passage is first segmented into words and stop words are removed; each remaining word is represented by a word2vec vector of fixed dimension (e.g. 300), so every text description is finally represented as a vector sequence of variable length and fixed dimension.
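A minimal sketch of this preprocessing, assuming whitespace tokenization and random stand-in vectors in place of a trained word2vec model (the 300-dimension figure follows the example in the text; the stop-word list and all names are hypothetical):

```python
import numpy as np

def text_to_vectors(sentence, embeddings, stopwords, dim=300):
    """Tokenize, drop stop words, and map each word to a fixed-dim vector."""
    words = [w for w in sentence.lower().split() if w not in stopwords]
    rng = np.random.default_rng(0)
    out = []
    for w in words:
        if w not in embeddings:            # stand-in for a word2vec lookup
            embeddings[w] = rng.standard_normal(dim)
        out.append(embeddings[w])
    return np.stack(out)                   # shape: (N words, dim)

emb, stops = {}, {"a", "of", "the"}
seq = text_to_vectors("a photo of a dog", emb, stops)
```

The result is exactly the representation described: a variable-length sequence (here N = 2 after stop-word removal) of fixed-dimension vectors.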
S2. Constructing the image depth network model that performs image feature extraction on the image data.
In the present embodiment, the image feature extraction depth network model is constructed according to the basic principle of deep convolutional neural networks. The model consists of several convolution units, each comprising a batch normalization layer, a convolutional layer and a ReLU layer. The whole image passes through several (e.g. five) convolution units, each followed by a max-pooling layer of size 2 and stride 2 that halves the size of the feature maps. For a given image, after the forward propagation of the network, a feature map set of d feature maps is obtained; a global pooling layer turns each feature map into a single value, so the d feature maps are finally turned into a d-dimensional vector. The number of feature maps input to the global pooling layer therefore determines the dimension of the final image feature vector, and this feature vector is the image feature extracted by the image feature extraction depth network.
Specifically, step S2 includes the following:
constructing a neural network model comprising several convolution units and pooling layers, each convolution unit containing a batch normalization layer, a convolutional layer and a nonlinear activation layer, the neural network model finally outputting features through a global pooling layer.
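The pooling behavior described above — 2×2 stride-2 max pooling that halves each feature map, then a global pooling layer that collapses d maps into one d-dimensional vector — can be sketched in NumPy. The convolution and batch-normalization layers are omitted, and mean pooling is assumed for the unspecified global pooling operation:

```python
import numpy as np

def max_pool2(fm):
    """2x2 max pooling with stride 2: halves each spatial side."""
    h, w = fm.shape[0] // 2 * 2, fm.shape[1] // 2 * 2
    fm = fm[:h, :w]
    return fm.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def global_pool(maps):
    """Turn d feature maps into one d-dimensional vector (one value per map)."""
    return np.array([m.mean() for m in maps])

maps = [np.ones((8, 8)) * k for k in range(5)]   # d = 5 toy feature maps
pooled = [max_pool2(m) for m in maps]            # each map shrinks to 4x4
feat = global_pool(pooled)                       # the 5-dim image feature
```

As the text notes, the dimension of `feat` is fixed by the number of feature maps entering the global pooling layer, not by the input image size.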
S3. Constructing the text depth network model that performs text feature extraction on the text data.
In the present embodiment, the text sequence information is modeled with a gated unit according to the basic principle of recurrent neural networks. The gated unit can be a long short-term memory unit (LSTM), a gated recurrent unit (GRU) or the structurally simpler m-reluGRU unit; these gated units differ little in modeling effect but decrease successively in computational complexity, so the m-reluGRU unit is suggested. For an input vector sequence, the gated memory unit cyclically accepts and processes the vectors and finally outputs one d-dimensional vector, which is the feature extracted by the text feature extraction depth network.
Specifically, step S3 includes the following:
constructing a recurrent neural network model comprising a gated unit, where the gated unit cyclically receives the current input vector and the previous moment's output, and after processing by the gated unit outputs one vector as the text feature.
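A minimal NumPy sketch of one gated-unit step cycling over a vector sequence, as described above. A standard GRU cell is used because the patent does not give the equations of its m-reluGRU unit; the dimensions and random weights are purely illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, W):
    """One gated-unit step: mixes the current input with the previous output."""
    z = sigmoid(W["z"] @ np.concatenate([x, h]))            # update gate
    r = sigmoid(W["r"] @ np.concatenate([x, h]))            # reset gate
    h_tilde = np.tanh(W["h"] @ np.concatenate([x, r * h]))  # candidate state
    return (1 - z) * h + z * h_tilde

d_in, d_h = 4, 3
rng = np.random.default_rng(1)
W = {k: rng.standard_normal((d_h, d_in + d_h)) * 0.1 for k in "zrh"}
h = np.zeros(d_h)
for x in rng.standard_normal((6, d_in)):   # a 6-word vector sequence
    h = gru_step(x, h, W)                  # the final h is the text feature
```

The loop is exactly the "cyclically receives the current input vector and the previous moment's output" behavior in the text; an m-reluGRU would presumably replace the `tanh` with a ReLU.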
S4. Constructing the associated target loss function of the image features and the text features.
In the present embodiment, the associated target loss function of the image and text features is mainly used to measure the relevance of an image feature and a text feature: if the two are related the loss is 0, otherwise the loss is nonzero. The target loss function is defined so that training the feature extraction network parameters according to this principle makes the loss as small as possible. Specifically, suppose an image I, a passage of text T and their association relation s(I, T) ∈ {0, 1} are given, where a value of 0 indicates the two are unrelated and a value of 1 indicates they are related. Let the feature vectors extracted from the image and the text by the corresponding feature extraction networks be f_i and f_t respectively; the loss between f_i and f_t is defined as L(f_i, f_t), whose concrete functional form is determined by the similarity measure used in retrieval. For example, if cosine similarity is used, L(f_i, f_t) is defined from cos(f_i, f_t), so that the network parameters guided by this objective learn representations better suited to that similarity measure.
To prevent over-fitting, a regularization term is added to the loss objective: all parameters are constrained by 2-norm regularization, defined as L(W) = Σ_k ||W_k||², where k denotes the k-th layer parameters of the network. The final objective function is the sum of the loss and the regularization term, L = L(f_i, f_t) + λL(W), where λ is the regularization parameter.
Specifically, step S4 includes the following steps:
letting the feature vector output by the network for each data sample in the training data set be f; then, given an image and a passage of text whose network outputs are f_i and f_t respectively, defining the target loss between the two features as L(f_i, f_t);
adding a regularization term, defined as L(W) where W denotes the parameters, to prevent over-fitting;
thus obtaining the associated target loss function L = L(f_i, f_t) + λL(W), where λ is the regularization parameter.
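The combined objective L = L(f_i, f_t) + λL(W) can be sketched as follows. The pairwise term is assumed here to be 1 − cos(f_i, f_t), so that it is zero for fully correlated features as the text requires; the patent itself only states that the term is built from the cosine similarity:

```python
import numpy as np

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def associated_loss(f_i, f_t, params, lam=1e-4):
    """Pair loss (assumed 1 - cosine similarity) plus an L2 regularizer."""
    pair = 1.0 - cos(f_i, f_t)                 # 0 when fully correlated
    reg = sum(np.sum(W * W) for W in params)   # sum_k ||W_k||^2
    return pair + lam * reg

f = np.array([1.0, 0.0, 1.0])
loss_same = associated_loss(f, f, params=[])   # identical features -> loss 0
```

With an empty parameter list the regularizer vanishes and identical features give a loss of exactly zero, matching the "loss is 0 if the two are related" requirement.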
S5. Training the image depth network model and the text depth network model according to the associated target loss function.
In the present embodiment, a batch of training data is given, comprising a set of paired image and text data and the association relations between the images and text; the features of the image and text data are obtained through the corresponding networks, after which the loss is calculated. After the loss is obtained, the partial derivatives of the loss with respect to the inputs f_i and f_t, ∂L/∂f_i and ∂L/∂f_t, are computed; then, by the chain rule of derivatives, the partial derivatives of the loss with respect to each layer's input and each layer's parameters are computed backwards, and the parameters are finally updated according to the stochastic gradient descent rule W ← W − η·∂L/∂W, where η is the learning rate of the parameter update, usually a small value that can be adjusted according to the data set. Finally, the above forward and backward computation and parameter update are executed repeatedly; learning is terminated when the objective function no longer decreases or the number of iterations reaches a preset number, and each layer's learned parameters and the basic network structure are stored to local disk.
Specifically, step S5 includes the following steps:
given a batch of training data, computing the associated target loss by forward propagation;
computing the gradient of the objective with respect to the input data from the associated target loss function;
computing the gradients layer by layer through the back-propagation algorithm and updating the parameters;
repeating the above steps iteratively, and stopping training once the number of iterations reaches a preset number;
saving the trained network parameters to computer disk for retrieval.
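The update rule W ← W − η·∂L/∂W from the training procedure above, sketched on a toy quadratic loss standing in for the associated objective (in the real method the gradients would come from back-propagation through both networks):

```python
import numpy as np

def sgd_step(W, grad, lr=0.01):
    """W <- W - eta * dL/dW, the update applied after back-propagation."""
    return W - lr * grad

# Toy loss L = ||W||^2 standing in for the associated-objective loss.
W = np.array([2.0, -3.0])
for _ in range(200):          # fixed iteration budget, then stop training
    grad = 2.0 * W            # dL/dW for L = ||W||^2
    W = sgd_step(W, grad, lr=0.1)
# np.save("params.npy", W)    # saving the learned parameters to disk
```

The loop mirrors the procedure: forward loss, gradient, parameter update, repeated for a preset number of iterations before the parameters are written out.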
S6. Performing feature extraction on the image data and text data in the search library through the image depth network model and the text depth network model respectively, extracting image features and text features of corresponding depth, and saving the two in association to form an index database.
In the present embodiment, features are extracted and an index is built for the search library data in advance, to improve search efficiency at retrieval time. With the network learned in step S5, the image data in the search library are scaled to 224 × 224 pixels and fed into the image feature extraction subnet to extract features, while the text data, after word segmentation, preprocessing and vectorization, are fed into the text feature extraction network to extract features; unlike in step S5, the two feature extraction subnets run independently. The feature dimension obtained by forward computation is generally hundreds to thousands; to improve the efficiency of feature matching and search, the features are hash-indexed and stored in the index library.
Specifically, step S6 includes the following steps:
given the search library data, extracting image features with the image depth network model for image data, and extracting text features with the text depth network model for text data;
saving the extracted image features and text features into the index library using a hash index, forming the index database.
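A sketch of hash-indexing the extracted features. The patent does not name the hashing scheme; sign random projection, a common locality-sensitive hash for real-valued vectors, is assumed here, with illustrative code length and feature dimension:

```python
import numpy as np

def hash_code(feat, planes):
    """Sign random projection: a d-dim feature -> a short binary code."""
    bits = (planes @ feat > 0).astype(int)
    return "".join(map(str, bits))

rng = np.random.default_rng(2)
planes = rng.standard_normal((8, 16))       # 8-bit codes for 16-dim features

index = {}                                  # code -> list of (id, feature)
for i in range(50):
    f = rng.standard_normal(16)             # stand-in for an extracted feature
    index.setdefault(hash_code(f, planes), []).append((i, f))
```

Similar features tend to share a bucket, so at query time only the features in the query's bucket need to be scored, which is what makes the retrieval-time lookup fast.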
S7. Extracting the features of the query data, matching the extracted features of the query data against the corresponding text features or image features in the index database, and sorting and returning the query results according to the matching results.
In the present embodiment, the given retrieval query data can be either an image or text. After the corresponding network extracts the features, candidate results are quickly returned from the index library by hashing; the similarity between the query features and the library features is then computed, results whose similarity exceeds a preset threshold are taken as the returned data, and the results are returned sorted by similarity from high to low, so that the most similar results are returned first, improving the user experience of retrieval.
Specifically, step S7 includes the following steps:
given a query image, extracting its image features with the image depth network model;
given a query sentence, extracting its text features with the text depth network model;
using the extracted image features or text features to find, in the index database, the image data or text data whose similarity is higher than a preset value;
sorting the returned results and finally returning them to the user.
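The matching step above — keep the library items whose similarity to the query exceeds a preset threshold, then return them sorted from high to low — can be sketched with cosine similarity, the measure suggested in the loss construction; the item ids and threshold are illustrative:

```python
import numpy as np

def search(query, library, threshold=0.5):
    """Score library features against the query by cosine similarity, keep
    those above the threshold, and sort from most to least similar."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    scored = [(item_id, cos(query, f)) for item_id, f in library]
    hits = [(i, s) for i, s in scored if s > threshold]
    return sorted(hits, key=lambda t: t[1], reverse=True)

lib = [("img1", np.array([1.0, 0.0])),
       ("img2", np.array([0.0, 1.0])),
       ("img3", np.array([1.0, 0.2]))]
results = search(np.array([1.0, 0.0]), lib)   # img2 falls below threshold
```

Because both channels map into the same feature space, the same routine serves image-to-text and text-to-image queries; only the extraction network differs.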
Based on the foregoing description, the present invention can segment an image into image blocks with independent semantics and determine a corresponding visual word for each image block; the determined visual words can then be encoded, so as to determine the feature vector corresponding to each image. These feature vectors can form the image index library, and when a target image to be retrieved is input, the target feature vector of the target image can be matched against the feature vectors in the image index library, so as to feed back retrieval results relevant to the target image. The present invention utilizes deep features and a clustering algorithm to build an accurate image index library, thereby improving the precision of image retrieval.
It is obvious to a person skilled in the art that the invention is not limited to the details of the above exemplary embodiments, and that the present invention can be realized in other specific forms without departing from its spirit or essential attributes. Therefore, from whatever point of view, the present embodiments are to be considered as illustrative and not restrictive, and the scope of the present invention is defined by the appended claims rather than by the above description; it is intended that all changes falling within the meaning and scope of equivalents of the claims be included within the present invention. Any reference signs in the claims should not be construed as limiting the claims involved.
In addition, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution; this manner of description is merely for clarity, and those skilled in the art should consider the specification as a whole, as the technical solutions in the various embodiments may also be suitably combined to form other embodiments that can be appreciated by those skilled in the art.
Claims (8)
1. An image-text associated retrieval method based on a two-channel network, characterized by comprising the following steps:
constructing a training data set, the training data set containing multiple pairs of image data and text data;
constructing an image depth network model that performs image feature extraction on the image data;
constructing a text depth network model that performs text feature extraction on the text data;
constructing an associated target loss function of the image features and the text features;
training the image depth network model and the text depth network model according to the associated target loss function;
performing feature extraction on the image data and text data in the search library through the image depth network model and the text depth network model respectively, extracting image features and text features of corresponding depth, and saving the two in association to form an index database;
extracting the features of the query data, matching the extracted features of the query data against the corresponding text features or image features in the index database, and sorting and returning the query results according to the matching results.
2. The image-text associated retrieval method based on a two-channel network according to claim 1, wherein constructing the training dataset comprising a plurality of paired image data and text data specifically comprises the following steps:
obtaining a preset number of image data items, and normalizing their size to 224 × 224 pixels;
manually writing a text description for each image, a description typically being a passage of some tens of words or a sentence of up to a hundred words;
preprocessing the text description, including word segmentation, to obtain a word sequence;
performing vector quantization on each segmented word, so that a text is represented as a vector sequence containing N words, where N is a positive integer.
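Claim 2 leaves the segmentation and vectorization methods open. A minimal sketch, assuming whitespace tokenization and a randomly initialised embedding table as a stand-in for a trained word-embedding model (the vocabulary, embedding dimension, and seed are all illustrative):

```python
import numpy as np

def vectorize_text(description, vocab, dim=8, seed=0):
    """Map a segmented text description to a sequence of N word vectors."""
    rng = np.random.default_rng(seed)
    table = rng.normal(size=(len(vocab), dim))          # one vector per vocabulary word
    words = [w for w in description.split() if w in vocab]
    return np.stack([table[vocab[w]] for w in words])   # shape: (N, dim)

vocab = {w: i for i, w in enumerate("a dog runs on the beach".split())}
seq = vectorize_text("a dog runs on the beach", vocab)
print(seq.shape)  # (6, 8): N = 6 words, each an 8-dimensional vector
```

In a real system the table would come from a trained embedding model; only the sequence-of-vectors representation is fixed by the claim.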
3. The image-text associated retrieval method based on a two-channel network according to claim 1, wherein constructing the image deep network model for performing image feature extraction on the image data specifically comprises:
constructing a neural network model comprising several convolution units and pooling layers, each convolution unit comprising a batch normalization layer, a convolutional layer and a nonlinear activation layer, the neural network model finally outputting the feature through a global pooling layer.
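The claim fixes only the unit structure (batch normalization, convolution, nonlinear activation) and the final global pooling. A toy single-unit sketch in NumPy, with whole-tensor normalization statistics, a 3×3 valid convolution, and ReLU all assumed for illustration:

```python
import numpy as np

def conv_unit(x, kernels):
    """One convolution unit: normalise, 3x3 valid convolution, ReLU."""
    x = (x - x.mean()) / (x.std() + 1e-5)               # batch normalisation (toy, whole-tensor stats)
    h, w = x.shape[0] - 2, x.shape[1] - 2
    out = np.zeros((h, w, kernels.shape[0]))
    for k in range(kernels.shape[0]):                   # one 3x3 kernel per output channel
        for i in range(h):
            for j in range(w):
                out[i, j, k] = np.sum(x[i:i+3, j:j+3] * kernels[k])
    return np.maximum(out, 0.0)                         # nonlinear activation (ReLU)

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))                         # toy single-channel input
feat = conv_unit(image, rng.normal(size=(4, 3, 3)))     # 4 output channels
vec = feat.mean(axis=(0, 1))                            # global average pooling -> feature vector
print(vec.shape)  # (4,)
```

A real model would stack several such units with intermediate pooling layers before the global pool.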
4. The image-text associated retrieval method based on a two-channel network according to claim 1, wherein constructing the text deep network model for performing text feature extraction on the text data specifically comprises:
constructing a recurrent neural network model comprising a gated unit, the gated unit cyclically receiving the current input vector and the output of the previous time step, and, after the information is processed by the gated unit, outputting a vector as the text feature.
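Claim 4 does not name a specific gated unit; a GRU-style step is one common choice. A minimal NumPy sketch, with the hidden size, weight shapes, and initialisation scale chosen purely for illustration:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def gru_step(x, h, W):
    """One gated-unit step: combine the current input x with the previous output h."""
    z = sigmoid(W["z"] @ np.concatenate([x, h]))        # update gate
    r = sigmoid(W["r"] @ np.concatenate([x, h]))        # reset gate
    n = np.tanh(W["n"] @ np.concatenate([x, r * h]))    # candidate state
    return (1 - z) * n + z * h

rng = np.random.default_rng(0)
d, hdim = 8, 16
W = {k: rng.normal(scale=0.1, size=(hdim, d + hdim)) for k in "zrn"}
h = np.zeros(hdim)
for x in rng.normal(size=(6, d)):    # a sequence of 6 word vectors
    h = gru_step(x, h, W)            # the final h serves as the text feature
print(h.shape)  # (16,)
```

The unit is applied cyclically over the word-vector sequence from claim 2; the last output vector is the text feature.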
5. The image-text associated retrieval method based on a two-channel network according to claim 1, wherein constructing the associated target loss function of the image features and the text features specifically comprises:
letting the feature vector output by the network for each data sample in the training dataset be f; then, given an image and a piece of text whose output feature vectors are f_i and f_t respectively, defining the target loss between the two features as L(f_i, f_t);
adding a regularization term to prevent overfitting, defined as L(W), where W denotes the parameters;
thereby obtaining the associated target loss function L = L(f_i, f_t) + λ·L(W), where λ is the regularization parameter.
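The claim does not fix the concrete form of L(f_i, f_t) or L(W); a sketch assuming squared Euclidean distance for the matching term and an L2 weight penalty for the regulariser:

```python
import numpy as np

def associated_loss(f_i, f_t, params, lam=1e-3):
    """L = L(f_i, f_t) + lambda * L(W): matching term plus regulariser."""
    match_loss = np.sum((f_i - f_t) ** 2)        # squared Euclidean distance (assumed form)
    reg = sum(np.sum(W ** 2) for W in params)    # L(W): sum of squared weights
    return match_loss + lam * reg

f_i = np.array([1.0, 0.0, 0.0])                  # toy image feature
f_t = np.array([0.8, 0.1, 0.0])                  # toy text feature of the paired caption
params = [np.ones((2, 2))]                       # toy network parameters W
loss = associated_loss(f_i, f_t, params, lam=0.1)
print(round(loss, 6))  # 0.45
```

Contrastive or ranking losses over mismatched pairs are other common choices for L(f_i, f_t) in cross-modal retrieval.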
6. The image-text associated retrieval method based on a two-channel network according to claim 1, wherein training the image deep network model and the text deep network model according to the associated target loss function specifically comprises:
given a batch of training data, computing the associated target loss by forward propagation;
computing the gradient of the target with respect to the input data from the associated target loss function;
computing the gradients layer by layer with the back-propagation algorithm, and updating the parameters accordingly;
repeating the above steps to train iteratively, and stopping training once the number of iterations reaches a preset count;
saving the trained network parameters to computer disk for use in retrieval.
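The training loop of claim 6 can be sketched with a linear toy model standing in for the image branch (the learning rate, iteration count, and loss form are all illustrative assumptions, and a real system would save the final parameters to disk):

```python
import numpy as np

rng = np.random.default_rng(0)
# toy batch: 5 paired samples of (image input, target text feature)
pairs = [(rng.normal(size=4), rng.normal(size=3)) for _ in range(5)]

def total_loss(W, lam=1e-3):
    """Associated target loss over the batch: matching term plus regulariser."""
    match = sum(np.sum((W @ x - f_t) ** 2) for x, f_t in pairs)
    return match + lam * np.sum(W ** 2)

W = rng.normal(size=(3, 4))                     # toy model parameters
loss_before = total_loss(W)
for _ in range(500):                            # stop once the preset iteration count is reached
    grad = 2e-3 * W                             # gradient of the regulariser (lam = 1e-3)
    for x, f_t in pairs:
        grad += 2 * np.outer(W @ x - f_t, x)    # gradient of the matching term
    W -= 0.01 * grad                            # parameter update (gradient descent step)
loss_after = total_loss(W)
print(loss_after < loss_before)  # True: training reduced the associated loss
```

In the patented method the gradients would instead flow through both deep branches via back-propagation, but the forward-pass/loss/gradient/update cycle is the same.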
7. The image-text associated retrieval method based on a two-channel network according to claim 1, wherein performing feature extraction on the image data and the text data in the search library with the image deep network model and the text deep network model respectively, extracting the corresponding deep image features and text features, and saving the two in association to form the index database specifically comprises the following steps:
given search library data, extracting image features with the image deep network model for the image data, and extracting text features with the text deep network model for the text data;
saving the extracted image features and text features into an index repository using hash indexing, thereby forming the index database.
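Claim 7 only says "hash indexing" without naming a scheme; a random-hyperplane (LSH-style) hash is one common choice for dense features. A minimal sketch with assumed code length and feature dimension:

```python
import numpy as np

def hash_code(f, planes):
    """Binary hash key from the signs of random hyperplane projections (LSH-style)."""
    return tuple(int(b) for b in (planes @ f > 0))

rng = np.random.default_rng(0)
planes = rng.normal(size=(8, 16))        # 8-bit codes for 16-dimensional deep features
index = {}                               # the index database: hash key -> stored item ids
for item_id in range(100):
    f = rng.normal(size=16)              # stand-in for an extracted image or text feature
    index.setdefault(hash_code(f, planes), []).append(item_id)
print(sum(len(v) for v in index.values()))  # 100: every item is stored in some bucket
```

Nearby features tend to share a bucket, so a query only needs to compare against items under its own hash key rather than the whole library.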
8. The image-text associated retrieval method based on a two-channel network according to claim 1, wherein extracting the features of the query data, matching the extracted features of the query data against the corresponding text features or image features in the index database, and sorting according to the matching results and returning the query results specifically comprises the following steps:
given a query image, extracting its image features using the image deep network model;
given a query sentence, extracting its text features using the text deep network model;
searching the index database with the extracted image features or text features for the image data or text data whose similarity exceeds a preset value;
sorting the returned results, and finally returning them to the user.
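The claim does not fix the similarity measure; a sketch assuming cosine similarity, a threshold filter, and best-first ranking (database contents and the threshold are illustrative):

```python
import numpy as np

def query(f_q, db, threshold=0.5):
    """Match a query feature against stored features; keep items above the
    similarity threshold, ranked best-first."""
    sims = []
    for item_id, f in db.items():
        s = float(f_q @ f / (np.linalg.norm(f_q) * np.linalg.norm(f)))  # cosine similarity
        if s > threshold:                    # keep only items above the preset value
            sims.append((s, item_id))
    return [item_id for s, item_id in sorted(sims, reverse=True)]       # ranked result list

db = {"img1": np.array([1.0, 0.0]),
      "img2": np.array([0.9, 0.1]),
      "img3": np.array([0.0, 1.0])}
print(query(np.array([1.0, 0.05]), db))  # ['img1', 'img2'] -- img3 falls below the threshold
```

With the hash index of claim 7, `db` would hold only the candidates from the query's bucket instead of the whole library.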
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810465884.6A CN108647350A (en) | 2018-05-16 | 2018-05-16 | Image-text associated retrieval method based on two-channel network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108647350A true CN108647350A (en) | 2018-10-12 |
Family
ID=63755876
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810465884.6A Pending CN108647350A (en) | 2018-05-16 | 2018-05-16 | Image-text associated retrieval method based on two-channel network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108647350A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106202413A (en) * | 2016-07-11 | 2016-12-07 | 北京大学深圳研究生院 | A kind of cross-media retrieval method |
US20170024461A1 (en) * | 2015-07-23 | 2017-01-26 | International Business Machines Corporation | Context sensitive query expansion |
CN106547826A (en) * | 2016-09-30 | 2017-03-29 | 西安电子科技大学 | A kind of cross-module state search method, device and computer-readable medium |
CN107563407A (en) * | 2017-08-01 | 2018-01-09 | 同济大学 | A kind of character representation learning system of the multi-modal big data in network-oriented space |
CN107832351A (en) * | 2017-10-21 | 2018-03-23 | 桂林电子科技大学 | Cross-module state search method based on depth related network |
Non-Patent Citations (1)
Title |
---|
Zhao Jinfeng: "Research on the Fusion of Text and Image Information in Cross-Media Retrieval", China Masters' Theses Full-text Database * |
Cited By (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109756778A (en) * | 2018-12-06 | 2019-05-14 | 中国人民解放军陆军工程大学 | frame rate conversion method based on self-adaptive motion compensation |
CN109756778B (en) * | 2018-12-06 | 2021-09-14 | 中国人民解放军陆军工程大学 | Frame rate conversion method based on self-adaptive motion compensation |
CN109492839A (en) * | 2019-01-17 | 2019-03-19 | 东华大学 | A kind of mineral hot furnace operating condition prediction technique based on RNN-LSTM network |
CN111612025A (en) * | 2019-02-25 | 2020-09-01 | 北京嘀嘀无限科技发展有限公司 | Description model training method, text description device and electronic equipment |
CN111612025B (en) * | 2019-02-25 | 2023-12-12 | 北京嘀嘀无限科技发展有限公司 | Description model training method, text description device and electronic equipment |
CN110222560A (en) * | 2019-04-25 | 2019-09-10 | 西北大学 | A kind of text people search's method being embedded in similitude loss function |
CN110222560B (en) * | 2019-04-25 | 2022-12-23 | 西北大学 | Text person searching method embedded with similarity loss function |
CN112182281A (en) * | 2019-07-05 | 2021-01-05 | 腾讯科技(深圳)有限公司 | Audio recommendation method and device and storage medium |
CN112182281B (en) * | 2019-07-05 | 2023-09-19 | 腾讯科技(深圳)有限公司 | Audio recommendation method, device and storage medium |
CN112883218A (en) * | 2019-11-29 | 2021-06-01 | 智慧芽信息科技(苏州)有限公司 | Image-text combined representation searching method, system, server and storage medium |
CN111143400B (en) * | 2019-12-26 | 2024-05-14 | 新长城科技有限公司 | Full stack type retrieval method, system, engine and electronic equipment |
CN111143400A (en) * | 2019-12-26 | 2020-05-12 | 长城计算机软件与***有限公司 | Full-stack type retrieval method, system, engine and electronic equipment |
CN111241310A (en) * | 2020-01-10 | 2020-06-05 | 济南浪潮高新科技投资发展有限公司 | Deep cross-modal Hash retrieval method, equipment and medium |
CN111353076B (en) * | 2020-02-21 | 2023-10-10 | 华为云计算技术有限公司 | Method for training cross-modal retrieval model, cross-modal retrieval method and related device |
CN111353076A (en) * | 2020-02-21 | 2020-06-30 | 华为技术有限公司 | Method for training cross-modal retrieval model, cross-modal retrieval method and related device |
WO2021164772A1 (en) * | 2020-02-21 | 2021-08-26 | 华为技术有限公司 | Method for training cross-modal retrieval model, cross-modal retrieval method, and related device |
CN111626058B (en) * | 2020-04-15 | 2023-05-30 | 井冈山大学 | Based on CR 2 Image-text double-coding realization method and system of neural network |
CN111626058A (en) * | 2020-04-15 | 2020-09-04 | 井冈山大学 | Based on CR2Method and system for realizing image-text double coding of neural network |
CN111897950A (en) * | 2020-07-29 | 2020-11-06 | 北京字节跳动网络技术有限公司 | Method and apparatus for generating information |
CN112364197A (en) * | 2020-11-12 | 2021-02-12 | 四川省人工智能研究院(宜宾) | Pedestrian image retrieval method based on text description |
CN112861882B (en) * | 2021-03-10 | 2023-05-09 | 齐鲁工业大学 | Image-text matching method and system based on frequency self-adaption |
CN112861882A (en) * | 2021-03-10 | 2021-05-28 | 齐鲁工业大学 | Image-text matching method and system based on frequency self-adaption |
CN113094534A (en) * | 2021-04-09 | 2021-07-09 | 陕西师范大学 | Multi-mode image-text recommendation method and device based on deep learning |
CN113127672A (en) * | 2021-04-21 | 2021-07-16 | 鹏城实验室 | Generation method, retrieval method, medium and terminal of quantized image retrieval model |
WO2023273572A1 (en) * | 2021-06-28 | 2023-01-05 | 北京有竹居网络技术有限公司 | Feature extraction model construction method and target detection method, and device therefor |
CN114120074A (en) * | 2021-11-05 | 2022-03-01 | 北京百度网讯科技有限公司 | Training method and training device of image recognition model based on semantic enhancement |
CN114120074B (en) * | 2021-11-05 | 2023-12-12 | 北京百度网讯科技有限公司 | Training method and training device for image recognition model based on semantic enhancement |
CN114896438A (en) * | 2022-05-10 | 2022-08-12 | 西安电子科技大学 | Image-text retrieval method based on hierarchical alignment and generalized pooling graph attention machine mechanism |
CN114896373A (en) * | 2022-07-15 | 2022-08-12 | 苏州浪潮智能科技有限公司 | Image-text mutual inspection model training method and device, image-text mutual inspection method and equipment |
CN115455228A (en) * | 2022-11-08 | 2022-12-09 | 苏州浪潮智能科技有限公司 | Multi-mode data mutual detection method, device, equipment and readable storage medium |
CN117932161A (en) * | 2024-03-22 | 2024-04-26 | 成都数据集团股份有限公司 | Visual search method and system for multi-source multi-mode data |
CN117932161B (en) * | 2024-03-22 | 2024-05-28 | 成都数据集团股份有限公司 | Visual search method and system for multi-source multi-mode data |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108647350A (en) | Image-text associated retrieval method based on two-channel network | |
CN106909924B (en) | Remote sensing image rapid retrieval method based on depth significance | |
CN105912611B (en) | A kind of fast image retrieval method based on CNN | |
Kang et al. | Learning consistent feature representation for cross-modal multimedia retrieval | |
CN103778227B (en) | The method screening useful image from retrieval image | |
CN109960763B (en) | Photography community personalized friend recommendation method based on user fine-grained photography preference | |
CN107562812A (en) | A kind of cross-module state similarity-based learning method based on the modeling of modality-specific semantic space | |
CN112819023B (en) | Sample set acquisition method, device, computer equipment and storage medium | |
CN107992531A (en) | News personalization intelligent recommendation method and system based on deep learning | |
CN107220277A (en) | Image retrieval algorithm based on cartographical sketching | |
CN103942571B (en) | Graphic image sorting method based on genetic programming algorithm | |
CN113297369B (en) | Intelligent question-answering system based on knowledge graph subgraph retrieval | |
Liang et al. | Self-paced cross-modal subspace matching | |
CN107291825A (en) | With the search method and system of money commodity in a kind of video | |
CN105528437A (en) | Question-answering system construction method based on structured text knowledge extraction | |
CN107291895B (en) | Quick hierarchical document query method | |
CN114358188A (en) | Feature extraction model processing method, feature extraction model processing device, sample retrieval method, sample retrieval device and computer equipment | |
CN108388639B (en) | Cross-media retrieval method based on subspace learning and semi-supervised regularization | |
CN112988917A (en) | Entity alignment method based on multiple entity contexts | |
Wang et al. | Deep enhanced weakly-supervised hashing with iterative tag refinement | |
CN115687760A (en) | User learning interest label prediction method based on graph neural network | |
CN113380360B (en) | Similar medical record retrieval method and system based on multi-mode medical record map | |
CN108647295B (en) | Image labeling method based on depth collaborative hash | |
Le Huy et al. | Keyphrase extraction model: a new design and application on tourism information | |
CN117435685A (en) | Document retrieval method, document retrieval device, computer equipment, storage medium and product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20181012 |