CN108647350A - Image-text associated retrieval method based on two-channel network - Google Patents


Info

Publication number
CN108647350A
CN201810465884.6A (application) · CN108647350A (publication)
Authority
CN
China
Prior art keywords
text
image
data
network model
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810465884.6A
Other languages
Chinese (zh)
Inventor
王家宝
苗壮
李阳
李航
张洋硕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Army Engineering University of PLA
Original Assignee
Army Engineering University of PLA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Army Engineering University of PLA
Priority: CN201810465884.6A
Publication: CN108647350A
Legal status: Pending


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an image-text associated retrieval method based on a two-channel network. The method comprises the following steps: constructing a training data set; constructing an image deep network model and a text deep network model, constructing an associated target loss function over the image features and text features, and training the two models with it; extracting features from the image data and text data in the search library with the image and text deep network models respectively, and storing the resulting image features and text features in association to form an index database; and extracting the features of the query data, matching them against the corresponding text features or image features in the index database, and ranking and returning the query results according to the matching results.

Description

Image-text associated retrieval method based on a two-channel network
Technical field
The invention belongs to the technical field of information retrieval, and more particularly to an image-text associated retrieval method based on a two-channel network.
Background technology
With the rapid development of technologies such as "Internet+" and big data, the volume of image and text data available to people is growing at a startling rate, and how to find the content a user needs in large-scale data has become an important research topic in information retrieval. Traditional information retrieval, which queries text by text or images by images, can no longer meet actual demand. People increasingly wish to retrieve images or text related to a piece of query text, or to shoot or submit a photo and retrieve images or text related to its content. Querying text by text or images by images belongs to single-modality retrieval; how to search across multiple modalities, i.e. cross-modal retrieval, is attracting growing attention. Cross-modal retrieval, such as querying text by image or images by text, requires a unified feature representation of data from different modalities. Because text data and image data differ greatly in format, how to represent data of different modalities uniformly is the key problem of multi-modal and cross-modal retrieval.
To address this key problem, traditional multi-modal or cross-modal representation methods typically convert image data into semantic text through semantic understanding and then query over that text; such methods, however, are limited by the precision of semantic understanding and have not developed well. In recent years, with the development of deep learning and in particular of convolutional neural networks, significantly more effective feature representations can be extracted through operations such as convolution and pooling, and the semantic understanding and feature representation of images have shown very good properties. Meanwhile, neural network language models for text data have also developed rapidly: recurrent neural networks, thanks to gated units such as long short-term memory, exhibit powerful long-term memory when modeling sequence data and can be used to model text. The feature representations produced by convolutional neural networks for images and by recurrent neural networks for sequence data have comparable representational power, but how to combine convolutional and recurrent networks to jointly learn a consistent representation of different modalities remains the key problem limiting multi-modal and cross-modal information retrieval.
Summary of the invention
The object of the invention is to provide, in view of the above drawbacks and problems of the prior art, an image-text associated retrieval method based on a two-channel network.
The technical scheme of the invention is as follows. An image-text associated retrieval method based on a two-channel network comprises the following steps: constructing a training data set containing multiple pairs of image data and text data; constructing an image deep network model that extracts image features from the image data and a text deep network model that extracts text features from the text data, constructing an associated target loss function of the image features and the text features, and training the image deep network model and the text deep network model according to that loss function; extracting features from the image data and text data of the search library with the image deep network model and the text deep network model respectively, extracting the image features and text features of corresponding depth, and storing the two in association to form an index database; and extracting the features of the query data, matching them against the corresponding text features or image features in the index database, and ranking and returning the query results according to the matching results.
Preferably, constructing the training data set containing multiple pairs of image data and text data specifically comprises the following steps:
obtaining a preset quantity of image data and normalizing each image to 224 × 224 pixels;
manually writing a text description for each image, typically a passage of a few dozen to a few hundred words;
preprocessing each text description, e.g. by word segmentation, to obtain a word sequence;
vectorizing each segmented word, so that a passage of text is represented as a vector sequence of N words, N being a positive integer.
Preferably, constructing the image deep network model that extracts image features from the image data specifically comprises:
constructing a neural network model comprising several convolution units and pooling layers, each convolution unit consisting of a batch normalization layer, a convolutional layer and a nonlinear activation layer, the model finally outputting the feature through a global pooling layer.
Preferably, constructing the text deep network model that extracts text features from the text data specifically comprises:
constructing a recurrent neural network model comprising a gated unit, the gated unit cyclically receiving the current input vector and the previous output, processing them, and finally outputting a vector that serves as the text feature.
Preferably, constructing the associated target loss function of the image features and the text features specifically comprises:
letting f denote the feature vector output by the network for each training sample; given an image and a passage of text whose network outputs are f_i and f_t respectively, defining the target loss between the two features as L(f_i, f_t);
adding a regularization term, defined as L(W) where W denotes the parameters, to prevent overfitting;
thereby obtaining the associated target loss function L = L(f_i, f_t) + λL(W), where λ is the regularization parameter.
Preferably, training the image deep network model and the text deep network model according to the associated target loss function specifically comprises:
given a batch of training data, computing the associated target loss by forward propagation;
computing the gradient of the objective with respect to the inputs from the associated target loss function;
computing the gradients layer by layer through the back-propagation algorithm and updating the parameters;
repeating the above steps iteratively and stopping training once the number of iterations reaches a preset count;
saving the trained network parameters to disk for retrieval.
Preferably, extracting features from the image data and text data of the search library with the image deep network model and the text deep network model respectively, extracting the image features and text features of corresponding depth, and storing the two in association to form the index database specifically comprises the following steps:
given the search library data, extracting image features from the image data with the image deep network model and text features from the text data with the text deep network model;
storing the extracted image features and text features in the index through hash indexing, forming the index database.
Preferably, matching the extracted features of the query data against the corresponding text features or image features in the index database and ranking and returning the query results according to the matching results specifically comprises the following steps:
given a query image, extracting its image feature with the image deep network model;
given a query sentence, extracting its text feature with the text deep network model;
searching the index database with the extracted image feature or text feature for image data or text data whose similarity exceeds a preset value;
ranking the returned results and finally returning them to the user.
The technical solution provided by the invention has the following beneficial effects:
in the image-text associated retrieval method based on a two-channel network, the query data may be an image or a passage of text; after its feature is extracted, candidate results are quickly returned from the index database through hashing, the similarity between the query feature and each library feature is computed, results whose similarity exceeds a preset threshold are kept, and the results are returned sorted by similarity from high to low, so that the most similar results come first, improving the retrieval user experience.
Description of the drawings
Fig. 1 is a flow diagram of the image-text associated retrieval method based on a two-channel network provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the purpose, technical scheme and advantages of the invention clearer, the invention is further elaborated below with reference to the accompanying drawings and embodiments. It should be appreciated that the specific embodiments described here merely illustrate the invention and do not limit it.
Unless the context clearly dictates otherwise, the elements and components of the invention may exist in either singular or plural form, and the invention is not limited in this respect. Although the steps of the invention are labelled, the labels do not limit their order; unless the order is expressly stated or the execution of one step depends on another, the relative order of the steps is adjustable. The term "and/or" used herein covers any and all possible combinations of one or more of the associated listed items.
As shown in Fig. 1, the image-text associated retrieval method based on a two-channel network provided by the invention comprises the following steps:
S1, constructing the training data set, which contains multiple pairs of image data and text data.
Specifically, the first task in step S1 is acquiring the training image data. For website nodes of a specific domain on the internet, images are downloaded automatically by a web crawler to build the image data set. The internet is a graph of nodes and directed edges, where a directed edge is a URL link in a web page and a node is a web page file or a media file. Nodes divide into leaf and non-leaf nodes: a leaf node may be a web page containing no hyperlinks, or a media file such as an image, video or audio file, whereas a non-leaf node is a web page containing hyperlinks. When crawling, the crawler traverses the nodes of the network using depth-first or breadth-first traversal of the directed graph and downloads the images found in leaf nodes to construct the training image set. The second task is generating the paired text data. Each image in the training set is manually described with a passage of a few dozen to a few hundred words, producing the corresponding text data. Because descriptions are subjective and language use varies, the same image may admit several possible descriptions, so one image may be described by several people; each description forms a pair with its image, and the two are associated. In practice, because there are many images and describing them is slow, internet crowdsourcing can be used to speed up the description process.
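The graph traversal described above can be sketched on a toy in-memory "website" graph; the breadth-first variant is shown here. The node names and the adjacency-map representation are illustrative assumptions, not part of the patent:

```python
from collections import deque

def collect_leaf_nodes(graph, start):
    """Breadth-first traversal of a directed web graph; returns the
    leaf nodes (pages or media files with no outgoing links).
    `graph` maps each node to the list of nodes it links to."""
    seen = {start}
    leaves = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        children = graph.get(node, [])
        if not children:            # no hyperlinks -> leaf (e.g. an image)
            leaves.append(node)
        for child in children:
            if child not in seen:   # visit each node once
                seen.add(child)
                queue.append(child)
    return leaves

# Toy site: index links to a gallery and an about page; the gallery
# links to two image files, which a real crawler would download.
site = {
    "index.html": ["gallery.html", "about.html"],
    "gallery.html": ["cat.jpg", "dog.jpg"],
    "about.html": [],
}
print(collect_leaf_nodes(site, "index.html"))
# ['about.html', 'cat.jpg', 'dog.jpg']
```

A depth-first variant would simply replace the queue with a stack; the set of leaves found is the same, only the visit order changes.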
Specifically, step S1 comprises the following steps:
obtaining a preset quantity of image data and normalizing each image to 224 × 224 pixels;
manually writing a text description for each image, typically a passage of a few dozen to a few hundred words;
preprocessing each text description, e.g. by word segmentation, to obtain a word sequence;
vectorizing each segmented word, so that a passage of text is represented as a vector sequence of N words, N being a positive integer.
For example, once the paired image and text training data are built, the images and texts still need preprocessing. Image data are uniformly scaled to 224 × 224 pixels as the input of the following model. Text data are first segmented into words and stop words are removed; each remaining word is represented by a word2vec vector of fixed dimension (e.g. 300), so every text description is finally represented as a vector sequence of variable length but fixed dimension.
S2, constructing the image deep network model that extracts image features from the image data.
In this embodiment, an image feature extraction deep network model is constructed according to the basic principles of deep convolutional neural networks. The model consists of several convolution units, each composed of a batch normalization layer, a convolutional layer and a ReLU layer. The whole image passes through several (e.g. five) convolution units, each followed by a max pooling layer of size 2 and stride 2 that halves the feature map size. For a given image, forward propagation through the network yields a set of d feature maps; a global pooling layer turns each feature map into a single value, so the d feature maps become a d-dimensional vector. The number of feature maps entering the global pooling layer thus determines the dimension of the final image feature vector, which is the image feature extracted by the image feature extraction deep network.
Specifically, step S2 comprises the following:
constructing a neural network model comprising several convolution units and pooling layers, each convolution unit consisting of a batch normalization layer, a convolutional layer and a nonlinear activation layer, the model finally outputting the feature through a global pooling layer.
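One convolution unit and the final global pooling can be sketched in plain numpy. This is an illustrative forward pass only, not the patent's implementation: the kernel shapes and the naive triple loop are for clarity, and a real model would use a deep-learning framework and learned parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_unit(x, kernels, eps=1e-5):
    """One convolution unit: batch normalization, 3x3 convolution
    (padding 1), ReLU, then 2x2 max pooling with stride 2.
    x: (C, H, W) feature maps; kernels: (K, C, 3, 3)."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    xn = (x - mean) / np.sqrt(var + eps)      # batch norm, per channel
    C, H, W = xn.shape
    K = kernels.shape[0]
    xp = np.pad(xn, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((K, H, W))
    for k in range(K):                         # naive 3x3 convolution
        for i in range(H):
            for j in range(W):
                out[k, i, j] = np.sum(xp[:, i:i + 3, j:j + 3] * kernels[k])
    out = np.maximum(out, 0.0)                 # ReLU nonlinearity
    H2, W2 = H // 2, W // 2                    # 2x2 max pool, stride 2
    return out[:, :H2 * 2, :W2 * 2].reshape(K, H2, 2, W2, 2).max(axis=(2, 4))

def global_avg_pool(maps):
    # d feature maps -> one d-dimensional image feature vector
    return maps.mean(axis=(1, 2))

x = rng.standard_normal((3, 8, 8))             # a tiny 3-channel "image"
feat = global_avg_pool(conv_unit(x, rng.standard_normal((5, 3, 3, 3))))
print(feat.shape)   # (5,)
```

Note how the feature dimension (5 here, d in the text) is exactly the number of feature maps entering the global pooling layer, as the embodiment states; stacking five such units on a 224 × 224 input would reduce the spatial size to 7 × 7 before pooling.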
S3, constructing the text deep network model that extracts text features from the text data.
In this embodiment, the text sequence information is modeled with a gated unit according to the basic principles of recurrent neural networks. The gated unit may be a long short-term memory unit (LSTM), a gated recurrent unit (GRU) or the structurally simpler M-reluGRU unit; these gated units differ little in modeling quality, but their computational complexity decreases in that order, so the M-reluGRU unit is recommended. Given an input vector sequence, the gated memory unit cyclically accepts and processes each vector, finally outputting a d-dimensional vector, which is the feature extracted by the text feature extraction deep network.
Specifically, step S3 comprises the following:
constructing a recurrent neural network model comprising a gated unit, the gated unit cyclically receiving the current input vector and the previous output, processing them, and finally outputting a vector that serves as the text feature.
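The cyclic gated update can be sketched with a standard GRU cell (shown here as a stand-in for the M-reluGRU unit the embodiment recommends, whose exact equations the text does not give). The weight initialization and dimensions are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUEncoder:
    """Minimal gated recurrent unit: consumes an (N, d_in) vector
    sequence and returns the final d_h-dimensional hidden state,
    used here as the text feature."""
    def __init__(self, d_in, d_h, seed=0):
        rng = np.random.default_rng(seed)
        s = 0.1
        self.Wz, self.Uz = s * rng.standard_normal((d_h, d_in)), s * rng.standard_normal((d_h, d_h))
        self.Wr, self.Ur = s * rng.standard_normal((d_h, d_in)), s * rng.standard_normal((d_h, d_h))
        self.Wh, self.Uh = s * rng.standard_normal((d_h, d_in)), s * rng.standard_normal((d_h, d_h))
        self.d_h = d_h

    def __call__(self, seq):
        h = np.zeros(self.d_h)
        for x in seq:                                # one step per word vector
            z = sigmoid(self.Wz @ x + self.Uz @ h)   # update gate
            r = sigmoid(self.Wr @ x + self.Ur @ h)   # reset gate
            h_tilde = np.tanh(self.Wh @ x + self.Uh @ (r * h))
            h = (1 - z) * h + z * h_tilde            # gated state update
        return h

enc = GRUEncoder(d_in=300, d_h=128)
feature = enc(np.random.default_rng(1).standard_normal((7, 300)))
print(feature.shape)   # (128,)
```

The loop makes the "cyclically receives the current input vector and the previous output" behaviour concrete: whatever the sequence length N, the output is always one fixed d_h-dimensional vector.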
S4, constructing the associated target loss function of the image features and the text features.
In this embodiment, the associated target loss function of the image and text features measures the relevance of an image feature and a text feature: if the two are related, the loss is 0; otherwise the loss is nonzero. The target loss function guides the learning of the feature extraction network parameters so as to make the loss as small as possible. Specifically, suppose an image I, a passage of text T and their association s(I, T) ∈ {0, 1} are given, a value of 0 indicating that the pair is unrelated and 1 that it is related. Let f_i and f_t be the feature vectors extracted from the image and text by the corresponding feature extraction networks; the loss between f_i and f_t is defined as L(f_i, f_t), whose concrete form is determined by the similarity measure used at retrieval time. For example, if cosine similarity is the measure, L(f_i, f_t) may be taken as 1 − cos(f_i, f_t) for a related pair, so that the loss vanishes when the features align; an objective of this form guides the network to learn parameters suited to that similarity measure.
To prevent overfitting, a regularization term is added to the loss: all parameters are constrained by 2-norm regularization, defined as L(W) = Σ_k ||W_k||², where k indexes the parameters of the k-th network layer. The final objective is the sum of the loss and the regularization term, L = L(f_i, f_t) + λL(W), where λ is the regularization parameter.
Specifically, step S4 comprises the following steps:
letting f denote the feature vector output by the network for each training sample; given an image and a passage of text whose network outputs are f_i and f_t respectively, defining the target loss between the two features as L(f_i, f_t);
adding a regularization term, defined as L(W) where W denotes the parameters, to prevent overfitting;
thereby obtaining the associated target loss function L = L(f_i, f_t) + λL(W), where λ is the regularization parameter.
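The full objective L = L(f_i, f_t) + λL(W) can be written out directly. The pair-loss form below (1 − cos for related pairs, a hinge on cos for unrelated ones) is one plausible reading of the cosine-based loss, not the patent's exact formula:

```python
import numpy as np

def cosine(fi, ft, eps=1e-9):
    return float(fi @ ft / (np.linalg.norm(fi) * np.linalg.norm(ft) + eps))

def associated_loss(fi, ft, weights, lam=1e-3, related=True):
    """Associated target loss L = L(f_i, f_t) + lambda * L(W).
    Pair term: 1 - cos for a related pair (zero when features align),
    max(0, cos) for an unrelated pair.  L(W) = sum_k ||W_k||^2."""
    c = cosine(fi, ft)
    pair = (1.0 - c) if related else max(0.0, c)
    reg = sum(float(np.sum(W * W)) for W in weights)   # 2-norm regularization
    return pair + lam * reg

fi = np.array([1.0, 0.0])
loss = associated_loss(fi, fi, weights=[np.eye(2)], lam=0.01)
print(round(loss, 4))   # 0.02  (pair term ~0, regularization 0.01 * 2)
```

With identical features the pair term vanishes, so only the regularization λΣ||W_k||² remains, matching the "loss is 0 if related" behaviour up to the penalty on the weights.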
S5, training the image deep network model and the text deep network model according to the associated target loss function.
In this embodiment, a batch of training data is given, comprising a group of paired image and text data together with the image-text associations; the image and text data pass through their respective networks to produce features, after which the loss is computed. Once the loss is obtained, the partial derivatives of the loss with respect to the inputs, ∂L/∂f_i and ∂L/∂f_t, are computed; then, by the chain rule of derivatives, the partial derivatives of the loss with respect to each layer's inputs and parameters are computed backwards, and finally the parameters are updated by the stochastic gradient descent rule W ← W − η ∂L/∂W, where η is the learning rate of the parameter update, usually a small value that can be adjusted according to the data set. Finally, the forward computation, backward computation and parameter update are repeated; learning terminates when the objective no longer decreases or the number of iterations reaches a preset count, and the learned parameters of every network layer and the basic network structure are stored to local disk.
Specifically, step S5 comprises the following steps:
given a batch of training data, computing the associated target loss by forward propagation;
computing the gradient of the objective with respect to the inputs from the associated target loss function;
computing the gradients layer by layer through the back-propagation algorithm and updating the parameters;
repeating the above steps iteratively and stopping training once the number of iterations reaches a preset count;
saving the trained network parameters to disk for retrieval.
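The forward-backward-update cycle can be demonstrated on a deliberately tiny stand-in: two linear projections (one per channel) trained to align a single related pair, with finite differences replacing backpropagation. All dimensions, the learning rate and the loss form are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_fn(Wi, Wt, x_img, x_txt, lam=1e-3):
    # L = (1 - cos(f_i, f_t)) + lam * (||Wi||^2 + ||Wt||^2)
    fi, ft = Wi @ x_img, Wt @ x_txt
    cos = fi @ ft / (np.linalg.norm(fi) * np.linalg.norm(ft) + 1e-9)
    return (1 - cos) + lam * (np.sum(Wi**2) + np.sum(Wt**2))

def num_grad(f, W, eps=1e-5):
    # finite-difference stand-in for backpropagation
    g = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        Wp, Wm = W.copy(), W.copy()
        Wp[idx] += eps
        Wm[idx] -= eps
        g[idx] = (f(Wp) - f(Wm)) / (2 * eps)
    return g

x_img = rng.standard_normal(6)            # toy "image" feature input
x_txt = rng.standard_normal(5)            # toy "text" feature input
Wi = rng.standard_normal((4, 6)) * 0.5    # image-channel projection
Wt = rng.standard_normal((4, 5)) * 0.5    # text-channel projection

eta = 0.05                                 # learning rate
history = [loss_fn(Wi, Wt, x_img, x_txt)]
for _ in range(50):                        # forward + backward + update
    gWi = num_grad(lambda W: loss_fn(W, Wt, x_img, x_txt), Wi)
    gWt = num_grad(lambda W: loss_fn(Wi, W, x_img, x_txt), Wt)
    Wi, Wt = Wi - eta * gWi, Wt - eta * gWt   # W <- W - eta * dL/dW
    history.append(loss_fn(Wi, Wt, x_img, x_txt))
```

In a real implementation the gradients come from analytic backpropagation through both subnets and the updates run over mini-batches, but the structure of each iteration is exactly this.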
S6, extracting features from the image data and text data of the search library with the image deep network model and the text deep network model respectively, extracting the image features and text features of corresponding depth, and storing the two in association to form the index database.
In this embodiment, features must be extracted from the search library data and an index built over them to improve search efficiency at retrieval time. With the network learned in step S5, the image data of the search library are scaled to 224 × 224 pixels and fed to the image feature extraction subnet, while the text data, after segmentation, preprocessing and vectorization, are fed to the text feature extraction subnet; unlike in step S5, the two feature extraction subnets run independently. The features obtained by forward computation generally have hundreds to thousands of dimensions, so to improve feature matching efficiency they are hash-indexed and stored in the index database.
Specifically, step S6 comprises the following steps:
given the search library data, extracting image features from the image data with the image deep network model and text features from the text data with the text deep network model;
storing the extracted image features and text features in the index through hash indexing, forming the index database.
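The patent does not specify the hashing scheme; random-hyperplane (sign) hashing is one common choice and is sketched below under that assumption. Features that hash to the same binary code land in the same bucket, so a lookup touches only one bucket instead of the whole library:

```python
import numpy as np

class HashIndex:
    """Random-hyperplane hashing: a real feature vector is reduced to
    a short binary code by the signs of its projections onto random
    hyperplanes; vectors sharing a code share a bucket."""
    def __init__(self, dim, n_bits=8, seed=0):
        self.planes = np.random.default_rng(seed).standard_normal((n_bits, dim))
        self.buckets = {}

    def code(self, feature):
        # one bit per hyperplane: which side the feature falls on
        return tuple((self.planes @ feature >= 0).tolist())

    def add(self, key, feature):
        self.buckets.setdefault(self.code(feature), []).append((key, feature))

    def candidates(self, feature):
        # only the matching bucket is scanned at query time
        return self.buckets.get(self.code(feature), [])

idx = HashIndex(dim=4)
v = np.array([1.0, -2.0, 0.5, 3.0])
idx.add("img_001", v)           # image features and text features are
idx.add("txt_001", v * 1.1)     # stored in the same index, in association
```

Sign hashing is scale-invariant (only the direction of the feature matters), which pairs naturally with the cosine similarity used at retrieval time.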
S7, extracting the features of the query data, matching them against the corresponding text features or image features in the index database, and ranking and returning the query results according to the matching results.
In this embodiment, the query data may be an image or a passage of text. After its feature is extracted by the corresponding subnet, candidate results are quickly returned from the index database through hashing; the similarity between the query feature and each library feature is then computed, results whose similarity exceeds a preset threshold are kept as returned data, and the results are returned sorted by similarity from high to low, so that the most similar results come first, improving the retrieval user experience.
Specifically, step S7 comprises the following steps:
given a query image, extracting its image feature with the image deep network model;
given a query sentence, extracting its text feature with the text deep network model;
searching the index database with the extracted image feature or text feature for image data or text data whose similarity exceeds a preset value;
ranking the returned results and finally returning them to the user.
Based on the foregoing description, the invention can segment an image into image blocks with independent semantics and determine a corresponding visual word for each image block; the determined visual words can then be encoded to determine the feature vector of each image. These feature vectors constitute the image index library; when a target image to be retrieved is input, its target feature vector can be matched against the feature vectors of the image index library so as to return retrieval results relevant to the target image. The invention uses deep features and clustering algorithms to build an accurate image index library, thereby improving the precision of image retrieval.
It will be obvious to those skilled in the art that the invention is not limited to the details of the above exemplary embodiments and can be realized in other specific forms without departing from its spirit or essential attributes. The embodiments should therefore be considered in all respects as illustrative and not restrictive, the scope of the invention being defined by the appended claims rather than by the above description; all changes falling within the meaning and range of equivalency of the claims are therefore intended to be embraced in the invention. Any reference signs in the claims shall not be construed as limiting the claims involved.
Furthermore, although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution; this manner of description is adopted merely for clarity, and those skilled in the art should take the specification as a whole, the technical solutions of the embodiments being combinable as appropriate to form other embodiments understandable to those skilled in the art.

Claims (8)

1. An image-text associated retrieval method based on a two-channel network, characterized by comprising the following steps:
constructing a training data set containing multiple pairs of image data and text data;
constructing an image deep network model that extracts image features from the image data;
constructing a text deep network model that extracts text features from the text data;
constructing an associated target loss function of the image features and the text features;
training the image deep network model and the text deep network model according to the associated target loss function;
extracting features from the image data and text data of the search library with the image deep network model and the text deep network model respectively, extracting the image features and text features of corresponding depth, and storing the two in association to form an index database;
extracting the features of the query data, matching them against the corresponding text features or image features in the index database, and ranking and returning the query results according to the matching results.
2. The image-text associated retrieval method based on a two-channel network according to claim 1, characterized in that constructing the training data set containing multiple pairs of image data and text data specifically comprises the following steps:
obtaining a preset quantity of image data and normalizing each image to 224 × 224 pixels;
manually writing a text description for each image, typically a passage of a few dozen to a few hundred words;
preprocessing each text description, e.g. by word segmentation, to obtain a word sequence;
vectorizing each segmented word, so that a passage of text is represented as a vector sequence of N words, N being a positive integer.
3. The image-text associated retrieval method based on a two-channel network according to claim 1, characterized in that constructing the image deep network model that extracts image features from the image data specifically comprises:
constructing a neural network model comprising several convolution units and pooling layers, each convolution unit consisting of a batch normalization layer, a convolutional layer and a nonlinear activation layer, the model finally outputting the feature through a global pooling layer.
4. The image-text associated retrieval method based on a two-channel network according to claim 1, characterized in that constructing the text deep network model that extracts text features from the text data specifically comprises:
constructing a recurrent neural network model comprising a gated unit, the gated unit cyclically receiving the current input vector and the previous output, processing them, and finally outputting a vector that serves as the text feature.
5. The image-text associated retrieval method based on a two-channel network according to claim 1, wherein constructing the associated target loss function of the image features and the text features specifically comprises:
denoting by f the feature vector output by the network for each data sample in the training data set; then, given an image and a passage of text whose output feature vectors are f_i and f_t respectively, defining the target loss between the two features as L(f_i, f_t);
adding a regularization term to prevent overfitting, defined as L(W), where W denotes the parameters;
thereby obtaining the associated target loss function L = L(f_i, f_t) + λL(W), where λ is the regularization parameter.
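The composite loss above can be sketched as follows. The claim leaves the exact form of L(f_i, f_t) open; the squared Euclidean distance and the L2 regularizer used here are illustrative choices, not the patent's definition.

```python
import numpy as np

def pair_loss(f_i: np.ndarray, f_t: np.ndarray) -> float:
    # Illustrative L(f_i, f_t): squared Euclidean distance between the features.
    return float(np.sum((f_i - f_t) ** 2))

def reg(params: list) -> float:
    # Illustrative L(W): sum of squared parameter values (L2 regularization).
    return float(sum(np.sum(W ** 2) for W in params))

def associated_loss(f_i, f_t, params, lam: float = 1e-4) -> float:
    # L = L(f_i, f_t) + lambda * L(W)
    return pair_loss(f_i, f_t) + lam * reg(params)

f_i = np.array([1.0, 0.0])
f_t = np.array([0.0, 1.0])
print(associated_loss(f_i, f_t, [np.eye(2)]))  # 2.0 + 1e-4 * 2 = 2.0002
```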
6. The image-text associated retrieval method based on a two-channel network according to claim 1, wherein training the image depth network model and the text depth network model according to the associated target loss function specifically comprises:
given a batch of training data, computing the associated target loss by forward propagation;
computing the gradient of the target with respect to the input data from the associated target loss function;
computing the gradients layer by layer by the back-propagation algorithm, and updating the parameters accordingly;
repeating the above steps for iterative training, and stopping training once the number of iterations reaches a preset number;
saving the trained network parameters to computer disk for retrieval.
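A toy rendering of this training loop: forward pass computes the associated loss, the backward pass yields gradients, parameters are updated, and training stops after a preset number of iterations. To keep the gradients hand-writable, both "branches" here are single linear maps with one fixed sample pair, which is an assumption well short of the patent's deep networks.

```python
import numpy as np

rng = np.random.default_rng(0)
Wi = rng.normal(scale=0.1, size=(4, 2))   # image-branch parameters (toy)
Wt = rng.normal(scale=0.1, size=(3, 2))   # text-branch parameters (toy)
x_img, x_txt = rng.normal(size=4), rng.normal(size=3)
lr, max_iters = 0.01, 200                 # learning rate and preset iteration count

loss = 0.0
for it in range(max_iters):
    f_i, f_t = x_img @ Wi, x_txt @ Wt     # forward propagation through both branches
    diff = f_i - f_t
    loss = float(np.sum(diff ** 2))       # associated target loss for this pair
    # Back-propagation: hand-written gradients of the loss w.r.t. each branch.
    Wi -= lr * np.outer(x_img, 2 * diff)
    Wt -= lr * np.outer(x_txt, -2 * diff)

# The trained Wi and Wt would then be saved to disk for later retrieval use.
print(f"final loss: {loss:.2e}")
```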
7. The image-text associated retrieval method based on a two-channel network according to claim 1, wherein the step of extracting features from the image data and the text data in the search library data through the image depth network model and the text depth network model respectively, extracting image features and text features of corresponding depth, and storing the two in association to form an index database specifically comprises the following steps:
given search library data, extracting image features from the image data using the image depth network model, and extracting text features from the text data using the text depth network model;
storing the extracted image features and text features in an index library using a hash index, thereby forming the index database.
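The claim does not fix a particular hash index; the sketch below uses sign-of-random-projection codes (a common stand-in) to bucket features from both modalities into one table, so associated image and text features land in the index together. Dimensions and the projection matrix are assumptions.

```python
from collections import defaultdict

import numpy as np

rng = np.random.default_rng(0)
DIM, BITS = 16, 8
projection = rng.normal(size=(DIM, BITS))     # shared random hyperplanes

def hash_code(feature: np.ndarray) -> str:
    # One bit per hyperplane: which side of it the feature falls on.
    return "".join("1" if v > 0 else "0" for v in feature @ projection)

index = defaultdict(list)                     # code -> [(kind, id, feature), ...]

def add(kind: str, item_id: int, feature: np.ndarray) -> None:
    index[hash_code(feature)].append((kind, item_id, feature))

# Image and text features of one associated pair are stored in the same table.
f_img = rng.normal(size=DIM)
add("image", 1, f_img)
add("text", 1, f_img + 0.01 * rng.normal(size=DIM))  # near-identical feature
print(len(index))  # near-duplicate features typically share a single bucket
```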
8. The image-text associated retrieval method based on a two-channel network according to claim 1, wherein the step of extracting the features of the query data, matching the extracted features of the query data with the corresponding text features or image features in the index database, and sorting and returning the query results according to the matching results specifically comprises the following steps:
given a query image, extracting image features using the image depth network model;
given a query sentence, extracting text features using the text depth network model;
searching the index database with the extracted image features or text features for image data or text data whose similarity is higher than a preset value;
sorting the returned results and finally returning them to the user.
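The matching and ranking step can be sketched as follows. Cosine similarity and the 0.5 threshold are illustrative assumptions; the claim only requires some similarity measure with a preset cutoff and a sorted result list.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def search(query_feat, entries, threshold=0.5):
    """entries: list of (item_id, feature). Returns [(item_id, score)] best-first."""
    hits = [(i, cosine(query_feat, f)) for i, f in entries]
    hits = [(i, s) for i, s in hits if s > threshold]    # preset similarity cut
    return sorted(hits, key=lambda t: t[1], reverse=True)

# Toy cross-modal case: an image-feature query against stored text features.
db = [("text_1", np.array([1.0, 0.0])),
      ("text_2", np.array([0.7, 0.7])),
      ("text_3", np.array([-1.0, 0.2]))]
print(search(np.array([1.0, 0.1]), db))
# text_1 and text_2 pass the 0.5 threshold; text_1 ranks first
```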
CN201810465884.6A 2018-05-16 2018-05-16 Image-text associated retrieval method based on two-channel network Pending CN108647350A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810465884.6A CN108647350A (en) 2018-05-16 2018-05-16 Image-text associated retrieval method based on two-channel network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810465884.6A CN108647350A (en) 2018-05-16 2018-05-16 Image-text associated retrieval method based on two-channel network

Publications (1)

Publication Number Publication Date
CN108647350A true CN108647350A (en) 2018-10-12

Family

ID=63755876

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810465884.6A Pending CN108647350A (en) 2018-05-16 2018-05-16 Image-text associated retrieval method based on two-channel network

Country Status (1)

Country Link
CN (1) CN108647350A (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106202413A * 2016-07-11 2016-12-07 北京大学深圳研究生院 Cross-media retrieval method
US20170024461A1 * 2015-07-23 2017-01-26 International Business Machines Corporation Context sensitive query expansion
CN106547826A * 2016-09-30 2017-03-29 西安电子科技大学 Cross-modal retrieval method, device and computer-readable medium
CN107563407A * 2017-08-01 2018-01-09 同济大学 Feature representation learning system for multi-modal big data in cyberspace
CN107832351A * 2017-10-21 2018-03-23 桂林电子科技大学 Cross-modal retrieval method based on deep correlation network


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
赵津锋 (Zhao Jinfeng): "Research on the Fusion of Text and Image Information in Cross-media Retrieval", China Master's Theses Full-text Database *

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109756778A * 2018-12-06 2019-05-14 中国人民解放军陆军工程大学 Frame rate conversion method based on self-adaptive motion compensation
CN109756778B (en) * 2018-12-06 2021-09-14 中国人民解放军陆军工程大学 Frame rate conversion method based on self-adaptive motion compensation
CN109492839A * 2019-01-17 2019-03-19 东华大学 Submerged arc furnace operating condition prediction method based on RNN-LSTM network
CN111612025A (en) * 2019-02-25 2020-09-01 北京嘀嘀无限科技发展有限公司 Description model training method, text description device and electronic equipment
CN111612025B (en) * 2019-02-25 2023-12-12 北京嘀嘀无限科技发展有限公司 Description model training method, text description device and electronic equipment
CN110222560A * 2019-04-25 2019-09-10 西北大学 Text-based person search method embedding a similarity loss function
CN110222560B (en) * 2019-04-25 2022-12-23 西北大学 Text person searching method embedded with similarity loss function
CN112182281A (en) * 2019-07-05 2021-01-05 腾讯科技(深圳)有限公司 Audio recommendation method and device and storage medium
CN112182281B (en) * 2019-07-05 2023-09-19 腾讯科技(深圳)有限公司 Audio recommendation method, device and storage medium
CN112883218A (en) * 2019-11-29 2021-06-01 智慧芽信息科技(苏州)有限公司 Image-text combined representation searching method, system, server and storage medium
CN111143400B (en) * 2019-12-26 2024-05-14 新长城科技有限公司 Full stack type retrieval method, system, engine and electronic equipment
CN111143400A (en) * 2019-12-26 2020-05-12 长城计算机软件与***有限公司 Full-stack type retrieval method, system, engine and electronic equipment
CN111241310A (en) * 2020-01-10 2020-06-05 济南浪潮高新科技投资发展有限公司 Deep cross-modal Hash retrieval method, equipment and medium
CN111353076B (en) * 2020-02-21 2023-10-10 华为云计算技术有限公司 Method for training cross-modal retrieval model, cross-modal retrieval method and related device
CN111353076A (en) * 2020-02-21 2020-06-30 华为技术有限公司 Method for training cross-modal retrieval model, cross-modal retrieval method and related device
WO2021164772A1 (en) * 2020-02-21 2021-08-26 华为技术有限公司 Method for training cross-modal retrieval model, cross-modal retrieval method, and related device
CN111626058B * 2020-04-15 2023-05-30 井冈山大学 Image-text dual-coding realization method and system based on CR2 neural network
CN111626058A * 2020-04-15 2020-09-04 井冈山大学 Method and system for realizing image-text dual coding based on CR2 neural network
CN111897950A (en) * 2020-07-29 2020-11-06 北京字节跳动网络技术有限公司 Method and apparatus for generating information
CN112364197A (en) * 2020-11-12 2021-02-12 四川省人工智能研究院(宜宾) Pedestrian image retrieval method based on text description
CN112861882B (en) * 2021-03-10 2023-05-09 齐鲁工业大学 Image-text matching method and system based on frequency self-adaption
CN112861882A (en) * 2021-03-10 2021-05-28 齐鲁工业大学 Image-text matching method and system based on frequency self-adaption
CN113094534A (en) * 2021-04-09 2021-07-09 陕西师范大学 Multi-mode image-text recommendation method and device based on deep learning
CN113127672A (en) * 2021-04-21 2021-07-16 鹏城实验室 Generation method, retrieval method, medium and terminal of quantized image retrieval model
WO2023273572A1 (en) * 2021-06-28 2023-01-05 北京有竹居网络技术有限公司 Feature extraction model construction method and target detection method, and device therefor
CN114120074A (en) * 2021-11-05 2022-03-01 北京百度网讯科技有限公司 Training method and training device of image recognition model based on semantic enhancement
CN114120074B (en) * 2021-11-05 2023-12-12 北京百度网讯科技有限公司 Training method and training device for image recognition model based on semantic enhancement
CN114896438A (en) * 2022-05-10 2022-08-12 西安电子科技大学 Image-text retrieval method based on hierarchical alignment and generalized pooling graph attention machine mechanism
CN114896373A * 2022-07-15 2022-08-12 苏州浪潮智能科技有限公司 Image-text mutual retrieval model training method and device, image-text mutual retrieval method and equipment
CN115455228A * 2022-11-08 2022-12-09 苏州浪潮智能科技有限公司 Multi-modal data mutual retrieval method, device, equipment and readable storage medium
CN117932161A (en) * 2024-03-22 2024-04-26 成都数据集团股份有限公司 Visual search method and system for multi-source multi-mode data
CN117932161B (en) * 2024-03-22 2024-05-28 成都数据集团股份有限公司 Visual search method and system for multi-source multi-mode data

Similar Documents

Publication Publication Date Title
CN108647350A (en) Image-text associated retrieval method based on two-channel network
CN106909924B (en) Remote sensing image rapid retrieval method based on depth significance
CN105912611B (en) A kind of fast image retrieval method based on CNN
Kang et al. Learning consistent feature representation for cross-modal multimedia retrieval
CN103778227B (en) The method screening useful image from retrieval image
CN109960763B (en) Photography community personalized friend recommendation method based on user fine-grained photography preference
CN107562812A (en) A kind of cross-module state similarity-based learning method based on the modeling of modality-specific semantic space
CN112819023B (en) Sample set acquisition method, device, computer equipment and storage medium
CN107992531A (en) News personalization intelligent recommendation method and system based on deep learning
CN107220277A (en) Image retrieval algorithm based on cartographical sketching
CN103942571B (en) Graphic image sorting method based on genetic programming algorithm
CN113297369B (en) Intelligent question-answering system based on knowledge graph subgraph retrieval
Liang et al. Self-paced cross-modal subspace matching
CN107291825A (en) With the search method and system of money commodity in a kind of video
CN105528437A (en) Question-answering system construction method based on structured text knowledge extraction
CN107291895B (en) Quick hierarchical document query method
CN114358188A (en) Feature extraction model processing method, feature extraction model processing device, sample retrieval method, sample retrieval device and computer equipment
CN108388639B (en) Cross-media retrieval method based on subspace learning and semi-supervised regularization
CN112988917A (en) Entity alignment method based on multiple entity contexts
Wang et al. Deep enhanced weakly-supervised hashing with iterative tag refinement
CN115687760A (en) User learning interest label prediction method based on graph neural network
CN113380360B (en) Similar medical record retrieval method and system based on multi-mode medical record map
CN108647295B (en) Image labeling method based on depth collaborative hash
Le Huy et al. Keyphrase extraction model: a new design and application on tourism information
CN117435685A (en) Document retrieval method, document retrieval device, computer equipment, storage medium and product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20181012)