CN111782853B - Semantic image retrieval method based on attention mechanism - Google Patents

Semantic image retrieval method based on attention mechanism

Info

Publication number
CN111782853B
CN111782853B CN202010582273.7A CN202010582273A
Authority
CN
China
Prior art keywords
vector
pictures
semantic feature
semantic
feature vectors
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010582273.7A
Other languages
Chinese (zh)
Other versions
CN111782853A (en)
Inventor
韩红
杨慎全
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202010582273.7A priority Critical patent/CN111782853B/en
Publication of CN111782853A publication Critical patent/CN111782853A/en
Application granted granted Critical
Publication of CN111782853B publication Critical patent/CN111782853B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 - Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/58 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/583 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/22 - Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/30 - Semantic analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/049 - Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/40 - Extraction of image or video features
    • G06V 10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/462 - Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V 10/464 - Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/40 - Extraction of image or video features
    • G06V 10/50 - Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/40 - Extraction of image or video features
    • G06V 10/56 - Extraction of image or video features relating to colour

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Library & Information Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a semantic image retrieval method based on an attention mechanism, which mainly addresses the problem that the semantic gap degrades retrieval accuracy in the image retrieval process. The implementation steps are as follows: 1) constructing and training a CNN-RNN network model containing an attention mechanism; 2) extracting the text features of the pictures in the image library with the trained network model; 3) extracting the semantic feature vectors of the text features with the text-vector doc2vec model and storing them; 4) extracting the text features of the query picture with the trained network model, and extracting the semantic feature vector corresponding to those text features; 5) comparing the feature vector of the query picture with the feature vectors in the image library by the cosine method, and outputting the result. The method effectively reduces the influence of the semantic gap, so that the system can retrieve by similarity over the semantic information expressed by a picture; it can be used for fast retrieval over massive internet data and for searching mobile-phone photos in daily life.

Description

Semantic image retrieval method based on attention mechanism
Technical Field
The invention belongs to the technical field of image processing and further relates to image-based pattern recognition, in particular to a semantic image retrieval method based on an attention mechanism. In the image retrieval process, given a query picture (query image), the images in the image library that are similar to it are found and output.
Background
Image retrieval refers to giving an image containing specific content and then finding images containing similar content in an image database. Because different images vary greatly under the influence of shooting angle, occlusion, lighting and other factors, quickly finding the desired image under such uncontrollable factors is a challenging topic. In today's network era, huge numbers of images are uploaded to servers every moment, especially with the rise of social networks; for example, nearly 6 billion pictures are stored on Tencent's servers, and these pictures contain very rich information. Exploiting the advantages of computers in processing massive image data to quickly and accurately find the pictures a user is interested in therefore has great value and practical significance, and more and more researchers are entering this field.
Most conventional image retrieval methods adopt models such as the histogram of oriented gradients (HOG) and the scale-invariant feature transform (SIFT) to extract feature vectors of an image, and then output similar images by computing distances between the feature vectors. However, these models are easily affected by noise, are slow to compute, and have low retrieval accuracy, so new research methods are urgently needed.
In recent years, with the surge of deep learning research, the convolutional neural network (CNN) has become a research hotspot in speech analysis and image recognition. Its weight sharing, receptive fields and related structural properties give it a dominant position in the image field and allow images to serve directly as network input, avoiding the large computation and low speed of traditional image retrieval algorithms.
With the rapid development of CNNs, a large number of convolutional-neural-network-based image retrieval algorithms have been proposed. The most classical among them is the CNN- and hashing-based method Deep Supervised Hashing for Fast Image Retrieval (Haomiao Liu, Ruiping Wang, Shiguang Shan, Xilin Chen; The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2064-2072), which effectively extracts feature vectors of images and reduces their dimensionality with binary codes, achieving good speed and precision. Many improved algorithms have since appeared on the basis of CNN + hash coding, but this approach still has a drawback: the "semantic gap" problem in image retrieval has not been completely solved, i.e., it cannot retrieve similar pictures from the viewpoint of picture semantics.
The patent "A CNN-based rapid image searching method" filed by the University of Science and Technology of China (application No. CN201610211503.2, publication No. CN105912611A) provides an image searching method. In the first stage, vector features are extracted with a CNN network pre-trained by Google; in the second stage, a K-nearest-neighbor search over the vector features is performed in a feature database. The patent builds on the fast-search idea of product quantization (PQ), adds an inverted-index strategy from text retrieval, takes the application's data volume into account, arranges the system parameters reasonably, and improves the re-ranking of search results. However, because the scheme uses CNN feature extraction, the feature-vector dimensionality is high and retrieval efficiency is low.
The patent "A semantic-analysis-based network image retrieval method" filed by the Institute of Automation, Chinese Academy of Sciences (application No. CN200910089536.4, publication No. CN101751447A) first performs content-based image retrieval for each feature to find a set of visually similar network images. Semantic learning is performed on the related text information of each image in the network image set to obtain a semantic representation of the query image. The semantic consistency of the retrieval image sets corresponding to the various features is judged on the text information; this consistency measures the descriptive power of each feature and assigns different confidence levels. Text-based image retrieval over the image library using the query image's semantics and semantic consistency yields the semantic relevance between each library image and the query image; content-based retrieval using the low-level features yields the visual relevance of each library image to the query image. The semantic and visual relevance are fused by a linear function, so the images returned to the user are similar at both the semantic and the visual level. The drawback of this method is that the retrieval system is too complex and uses too many kinds of features, which greatly reduces retrieval speed and cannot effectively overcome or reduce the semantic gap in the retrieval process.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a semantic image retrieval method based on an attention mechanism. A CNN-RNN depth model with an attention mechanism extracts text features describing the image content of the query picture, the text-vector doc2vec model extracts the semantic feature vector corresponding to those text features, and this feature vector is compared with the feature vectors in an image feature library to obtain the similar pictures in the library. The accuracy of image retrieval is effectively improved, and the influence of the semantic gap is reduced.
The method comprises the following specific steps:
(1) Constructing a CNN-RNN network model containing an attention mechanism and training:
(1a) Preprocessing the pictures and the corresponding image titles in the MS COCO data set;
(1b) Constructing a convolutional neural network VGG encoder and a recurrent neural network LSTM decoder, and adding an attention mechanism into the decoder to obtain a CNN-RNN network model consisting of the encoder and the decoder;
(1c) Dividing the preprocessed data into a training data set and a testing data set, training the network model by adopting the training data set, and testing by using the testing data set to obtain a final CNN-RNN network model;
(2) Extracting the image titles of all pictures in the image library to be retrieved by using the final CNN-RNN network model, namely extracting the text features corresponding to the pictures, and storing the extracted text features in a database;
(3) Extracting semantic feature vectors of text features in the database by using a text vector doc2vec model and storing the semantic feature vectors:
(3a) Sequentially processing all the text features obtained in step (2) by using the text vector doc2vec model in the gensim library to obtain the semantic feature vectors corresponding to all the pictures;
(3b) Storing the obtained semantic feature vectors and the corresponding pictures in a database, and matching the semantic feature vectors and the corresponding pictures;
(4) Extracting text features of the query picture by using the final CNN-RNN network model, and extracting corresponding semantic feature vectors;
(5) Comparing the semantic feature vector of the query picture with semantic feature vectors of other pictures in the image library by using a cosine similarity comparison method to obtain similar semantic feature vectors;
(6) Outputting the pictures corresponding to the similar semantic feature vectors, i.e., the pictures similar to the query picture.
Compared with the prior art, the invention has the following advantages:
Firstly, the invention combines computer vision with related techniques from natural language processing: an attention mechanism is introduced into a CNN-RNN network, so the network can effectively extract high-level concepts related to a picture and express them in natural-language form. The scheme embeds content-based image retrieval within the text-based image retrieval idea, so the advantages of both are realized, while the burden of manually labeling text and the influence of the semantic gap are effectively overcome.
Secondly, the invention adopts recently and rapidly developed word-embedding technology and uses the text vector doc2vec built on word vectors, which effectively preserves word order; when converting natural-language descriptions into a vector space, it achieves a better conversion effect than the word-vector word2vec model adopted in the prior art.
Drawings
FIG. 1 is a flow chart of an implementation of the method of the present invention;
FIG. 2 is a schematic diagram of a CNN-RNN network architecture with attention mechanism in the present invention;
FIG. 3 is a schematic diagram of the core structure of the convolutional neural network VGG encoder in the present invention.
Detailed Description
The invention is explained in further detail below with reference to the figures and examples:
Referring to fig. 1, the method of the invention includes the following steps:
step 1, constructing a CNN-RNN network model containing an attention mechanism and training:
(1a) Preprocessing the pictures and the corresponding image titles in the MS COCO data set, where the preprocessing includes word segmentation, syntactic analysis, word-vector conversion and the like;
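The text side of this preprocessing can be illustrated roughly as follows (a minimal sketch assuming English MS COCO captions; the tokenizer and the toy vocabulary are hypothetical stand-ins, not the patent's exact pipeline):

```python
import re

def preprocess_caption(caption, vocab):
    """Tokenize a caption and map each word to an integer id, as a
    stand-in for the word-segmentation and word-vector preprocessing."""
    tokens = re.findall(r"[a-z']+", caption.lower())
    return [vocab.get(tok, vocab["<unk>"]) for tok in tokens]

# Hypothetical toy vocabulary; in practice it is built from the data set.
vocab = {"<unk>": 0, "a": 1, "dog": 2, "runs": 3}
print(preprocess_caption("A dog runs!", vocab))  # -> [1, 2, 3]
```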
(1b) Constructing a convolutional neural network VGG encoder and a cyclic neural network LSTM decoder, and adding an attention mechanism into the decoder to obtain a CNN-RNN network model consisting of the encoder and the decoder;
the core structure of the convolutional neural network VGG encoder, i.e., the initiation module, as shown in fig. 3, forms an initiation v2 network by stacking the modules; specifically, the construction of the convolutional neural network VGG encoder is to output the output of the last convolutional layer of the network as the characteristics of the picture, that is, at least 5 characteristic graphs of the last convolutional layer are selected as characteristic vectors to be output. The convolutional neural network is composed of 5 convolutional layers, 3 full-link layers and a softmax output layer, the layers are separated by using maximum pooling, and all hidden layer neurons adopt ReLU activation functions.
The input of the LSTM decoder comprises the word vector of the current step, the output vector of the previous time step, and the weighted vector formed by the attention mechanism; its output is the word vector produced at the current time step. Adding the attention mechanism to the decoder means that at each time step the decoder decodes, the feature vectors output by the recurrent neural network LSTM decoder are weighted and averaged to obtain a context vector, and this context vector also serves as one input of the decoder network to guide the decoding at the current time step. The CNN-RNN network model obtained by combining the LSTM decoder better mitigates the vanishing- and exploding-gradient problems.
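One common way to realize such a weighted context vector is standard soft attention; the following is a sketch under general assumptions, not the patent's exact formulation (dimensions follow the encoder sketch above):

```python
import torch
import torch.nn as nn

class SoftAttention(nn.Module):
    """Scores each image region against the decoder's hidden state and
    returns their weighted average as the context vector."""
    def __init__(self, region_dim=512, hidden_dim=512, attn_dim=256):
        super().__init__()
        self.w_region = nn.Linear(region_dim, attn_dim)
        self.w_hidden = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, regions, hidden):
        # regions: (batch, 49, 512); hidden: (batch, 512)
        e = self.score(torch.tanh(
            self.w_region(regions) + self.w_hidden(hidden).unsqueeze(1)))
        alpha = torch.softmax(e, dim=1)           # attention weights
        context = (alpha * regions).sum(dim=1)    # weighted average
        return context, alpha

# At each time step the context vector is concatenated with the current
# word vector and fed to the LSTM cell alongside the previous state.
```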
(1c) Dividing the preprocessed data into a training data set and a testing data set, training the network model with the training data set, and testing with the testing data set to obtain the final CNN-RNN network model.
Step 2, extracting the image titles of all pictures in the image library to be retrieved with the final CNN-RNN network model, namely processing the pictures in the image library to be retrieved with the pre-trained encoding-decoding network, sequentially extracting the text features (natural-language descriptions) corresponding to the pictures, and storing the extracted text features in a database.
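Extracting a caption then amounts to running the trained encoder-decoder greedily over each picture. The following schematic sketch reuses the SoftAttention module from the sketch above; the vocabulary size, the token ids and the (here untrained) weights are placeholders, not the patent's trained model:

```python
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 10000, 256, 512
embed = nn.Embedding(vocab_size, embed_dim)
lstm = nn.LSTMCell(embed_dim + 512, hidden_dim)  # word vector + context vector
to_vocab = nn.Linear(hidden_dim, vocab_size)
attend = SoftAttention()                         # from the sketch above

def greedy_caption(regions, max_len=20, start_id=1, end_id=2):
    """Decode one caption token-by-token from the region features."""
    h = torch.zeros(1, hidden_dim)
    c = torch.zeros(1, hidden_dim)
    word = torch.tensor([start_id])
    tokens = []
    with torch.no_grad():
        for _ in range(max_len):
            context, _ = attend(regions, h)      # attention-weighted average
            step_in = torch.cat([embed(word), context], dim=1)
            h, c = lstm(step_in, (h, c))
            word = to_vocab(h).argmax(dim=1)     # greedy word choice
            if word.item() == end_id:
                break
            tokens.append(word.item())
    return tokens                                # word ids of the caption

caption_ids = greedy_caption(torch.randn(1, 49, 512))
```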
Step 3, extracting semantic feature vectors of text features in the database by using a text vector doc2vec model and storing the semantic feature vectors:
(3a) Sequentially processing all the text features obtained in step 2 with the text vector doc2vec model in the gensim library, namely converting the extracted natural language into a feature-vector space to obtain the semantic feature vector corresponding to each picture. Specifically, each natural-language sentence is processed with the doc2vec model to obtain the semantic feature vector corresponding to each picture's image title (caption), i.e., the semantic feature vector corresponding to the picture;
(3b) Storing the obtained semantic feature vectors and the corresponding pictures in a database and matching them with each other.
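Steps (3a)-(3b) can be sketched with the gensim library roughly as follows (a minimal sketch; the two example captions and the 100-dimension vector size are assumptions for illustration, not values fixed by the patent):

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Hypothetical captions produced by the CNN-RNN model in step 2,
# one natural-language sentence per picture in the library.
captions = ["a brown dog runs across the grass",
            "two people ride bicycles on a city street"]

documents = [TaggedDocument(words=caption.split(), tags=[i])
             for i, caption in enumerate(captions)]

# Train doc2vec on the caption corpus.
model = Doc2Vec(documents, vector_size=100, min_count=1, epochs=40)

# Semantic feature vector for each picture, stored alongside the picture.
vectors = [model.infer_vector(caption.split()) for caption in captions]
```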
Step 4, extracting the text features of the query picture with the final CNN-RNN network model and extracting the corresponding semantic feature vector, i.e., combining the attention-based CNN-RNN network with the doc2vec model to extract the query picture's image title and convert it into a feature vector. When a query picture is to be retrieved, it is processed with the encoding-decoding network and the doc2vec model in turn, in the same way as the other pictures in the image library were processed, to obtain the feature vector corresponding to the query picture.
Step 5, comparing the semantic feature vector of the query picture with semantic feature vectors of other pictures in the image library by using a cosine similarity comparison method to obtain similar semantic feature vectors;
the cosine similarity comparison method is also called cosine similarity calculation, specifically, similarity between two semantic feature vectors is evaluated by calculating cosine values of included angles of the two semantic feature vectors, and the calculation formula is as follows:
$$\cos\theta = \frac{A \cdot B}{\|A\|\,\|B\|}$$
wherein A and B respectively represent two different semantic feature vectors; in this embodiment, A is the semantic feature vector of the query picture, and B is the semantic feature vector of another picture in the image library.
By computing and ranking the similarity between the query picture's feature vector and the feature vectors in the image library, the semantic feature vectors similar to the query picture are obtained, and from them the corresponding pictures, i.e., exactly which pictures in the image library are highly similar to the query picture.
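A minimal sketch of this similarity calculation and ranking (the vector dimension, library size and top-10 cut-off below are assumptions for illustration):

```python
import numpy as np

def cosine_similarity(a, b):
    """cos(theta) = (A . B) / (||A|| * ||B||)"""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 100-d semantic feature vectors: one query, a small library.
query_vec = np.random.rand(100)
library_vecs = np.random.rand(1000, 100)   # one row per library picture

# Score every library picture against the query and keep the top 10.
scores = np.array([cosine_similarity(query_vec, v) for v in library_vecs])
top10 = np.argsort(scores)[::-1][:10]      # indices of most similar pictures
```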
Step 6, outputting the pictures corresponding to the similar semantic feature vectors, i.e., the pictures similar to the query picture;
and outputting the pictures corresponding to the sequenced similar semantic feature vectors according to the result of the last step and the requirement of the user to finish the retrieval.
The effects of the invention can be further illustrated by simulation:
1. Simulation experiment conditions:
The data set used by the invention is NUS-WIDE, a database of real-world pictures that can be used for a variety of image-processing tasks. It contains 269,648 pictures from Flickr with 5,018 associated tags; six types of extracted low-level features (a 64-dimensional color histogram, a 144-dimensional color correlogram, a 73-dimensional edge direction histogram, a 128-dimensional wavelet texture, 225-dimensional block-wise color moments, and a 500-dimensional bag-of-words feature based on SIFT descriptors); and user information for 247,849 of the images.
The hardware platform is as follows: intel Core i5-4210U CPU;
the software platform is as follows: visual studio code.
2. Contents and results of the experiments
The invention was tested on the NUS-WIDE data set: natural-language descriptions of the pictures were extracted, the feature vectors containing the pictures' semantic information were then extracted to form a feature image library, the query picture was processed in the same way, and the final result was obtained by computation among the vectors. On 3000 test samples, the results were compared with the Learning to Hash with Binary Reconstructive Embeddings (BRE), Deep Learning of Binary Hash Codes for Fast Image Retrieval (DLBHC), and Deep Supervised Hashing for Fast Image Retrieval (DSH) algorithms; as shown in Table 1, the method achieves higher efficiency in image retrieval.
TABLE 1 mAP index comparison of the present invention and prior methods
(Table data reproduced in the original document only as an image; it lists the mAP values of BRE, DLBHC, DSH and the proposed method.)
The simulation analysis proves the correctness and the effectiveness of the method provided by the invention.
Parts of the invention that are not described in detail belong to the common general knowledge of those skilled in the art.
The above description is only one specific embodiment of the present invention and should not be construed as limiting the invention in any way. It will be apparent to those skilled in the art that, after understanding the content and principle of the invention, various modifications and variations in form and detail can be made without departing from the principle of the invention; such modifications and variations still fall within the scope of the appended claims.

Claims (9)

1. A semantic image retrieval method based on an attention mechanism is characterized by comprising the following steps:
(1) Constructing a CNN-RNN network model containing an attention mechanism and training:
(1a) Preprocessing the images and the corresponding image titles in the MS COCO data set;
(1b) Constructing a convolutional neural network VGG encoder and a recurrent neural network LSTM decoder, and adding an attention mechanism into the decoder to obtain a CNN-RNN network model consisting of the encoder and the decoder;
(1c) Dividing the preprocessed data into a training data set and a testing data set, training the network model by adopting the training data set, and testing by using the testing data set to obtain a final CNN-RNN network model;
(2) Extracting the image titles of all pictures in the image library to be retrieved by using the final CNN-RNN network model, namely extracting the text features corresponding to the pictures, and storing the extracted text features in a database;
(3) Extracting semantic feature vectors of text features in the database by using a text vector doc2vec model and storing the semantic feature vectors:
(3a) Sequentially processing all the text features obtained in step (2) by using the text vector doc2vec model in the gensim library to obtain the semantic feature vectors corresponding to all the pictures;
(3b) Storing the obtained semantic feature vectors and the corresponding pictures in a database, and matching the semantic feature vectors and the corresponding pictures;
(4) Extracting text features of the query picture by using the final CNN-RNN network model, and extracting corresponding semantic feature vectors;
(5) Comparing the semantic feature vector of the query picture with semantic feature vectors of other pictures in the image library by using a cosine similarity comparison method to obtain similar semantic feature vectors;
(6) Outputting the pictures corresponding to the similar semantic feature vectors, i.e., the pictures similar to the query picture.
2. The method of claim 1, wherein: the text feature is a short text describing the picture content in natural language.
3. The method of claim 1, wherein: the preprocessing in the step (1 a) comprises word segmentation, syntactic analysis and word vectors.
4. The method of claim 1, wherein: constructing the convolutional neural network VGG encoder in step (1b) specifically means outputting the output of the network's last convolutional layer as the picture's features, i.e., at least 5 feature maps of the last convolutional layer are selected and output as feature vectors.
5. The method of claim 4, wherein: the network structure of the convolutional neural network VGG encoder is composed of 5 convolutional layers, 3 full-connection layers and a softmax output layer, the layers are separated by using maximum pooling, and all hidden layer neurons adopt ReLU activation functions.
6. The method of claim 1, wherein: adding the attention mechanism to the decoder in step (1b) means that, at each time step the decoder decodes, the feature vectors output by the recurrent neural network LSTM decoder are weighted and averaged to obtain a context vector, and this context vector also serves as an input of the decoder network to guide the decoding at the current time step.
7. The method of claim 1, wherein: the input of the recurrent neural network LSTM decoder in step (1b) comprises the word vector of the current step, the output vector of the previous time step, and the weighted vector formed by the attention mechanism, and the output is the word vector output at the current time step.
8. The method of claim 1, wherein: extracting the semantic feature vectors of the text features in the database in step (3) means converting the natural-language descriptions of the picture content into semantic feature vectors.
9. The method of claim 1, wherein: the cosine similarity in the step (5) is calculated according to the following formula:
$$\cos\theta = \frac{A \cdot B}{\|A\|\,\|B\|}$$
wherein A represents the semantic feature vector of the query picture, and B represents the semantic feature vectors of other pictures in the image library.
CN202010582273.7A 2020-06-23 2020-06-23 Semantic image retrieval method based on attention mechanism Active CN111782853B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010582273.7A CN111782853B (en) 2020-06-23 2020-06-23 Semantic image retrieval method based on attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010582273.7A CN111782853B (en) 2020-06-23 2020-06-23 Semantic image retrieval method based on attention mechanism

Publications (2)

Publication Number Publication Date
CN111782853A CN111782853A (en) 2020-10-16
CN111782853B true CN111782853B (en) 2022-12-02

Family

ID=72757038

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010582273.7A Active CN111782853B (en) 2020-06-23 2020-06-23 Semantic image retrieval method based on attention mechanism

Country Status (1)

Country Link
CN (1) CN111782853B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112256727B (en) * 2020-10-19 2021-10-15 东北大学 Database query processing and optimizing method based on artificial intelligence technology
CN112417190B (en) * 2020-11-27 2024-06-11 暨南大学 Retrieval method and application of ciphertext JPEG image
CN113868447A (en) * 2021-09-27 2021-12-31 新智认知数据服务有限公司 Picture retrieval method, electronic device and computer-readable storage medium
CN113705576B (en) * 2021-11-01 2022-03-25 江西中业智能科技有限公司 Text recognition method and device, readable storage medium and equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018010365A1 (en) * 2016-07-11 2018-01-18 北京大学深圳研究生院 Cross-media search method
CN109766468A * 2019-01-04 2019-05-17 广东技术师范学院 A kind of implementation method and device for appearance patent image retrieval and management based on an image description algorithm
WO2019235458A1 (en) * 2018-06-04 2019-12-12 国立大学法人大阪大学 Recalled image estimation device, recalled image estimation method, control program, and recording medium
CN111222049A (en) * 2020-01-08 2020-06-02 东北大学 Top-k similarity searching method on semantically enhanced heterogeneous information network

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018010365A1 (en) * 2016-07-11 2018-01-18 北京大学深圳研究生院 Cross-media search method
WO2019235458A1 (en) * 2018-06-04 2019-12-12 国立大学法人大阪大学 Recalled image estimation device, recalled image estimation method, control program, and recording medium
CN109766468A * 2019-01-04 2019-05-17 广东技术师范学院 A kind of implementation method and device for appearance patent image retrieval and management based on an image description algorithm
CN111222049A (en) * 2020-01-08 2020-06-02 东北大学 Top-k similarity searching method on semantically enhanced heterogeneous information network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Deep Learning Based Classification Using Academic Studies in Doc2Vec Model";Yaşar Safali等;《 2019 International Artificial Intelligence and Data Processing Symposium (IDAP)》;20191021;第1-5页 *
"基于视觉注意力机制的图像检索研究";梁晔等;《北京联合大学学报(自然科学版)》;20100331;第30-35页 *

Also Published As

Publication number Publication date
CN111782853A (en) 2020-10-16

Similar Documents

Publication Publication Date Title
CN111782853B (en) Semantic image retrieval method based on attention mechanism
CN110866140B (en) Image feature extraction model training method, image searching method and computer equipment
CN111198959B (en) Two-stage image retrieval method based on convolutional neural network
CN106033426B (en) Image retrieval method based on latent semantic minimum hash
CN110083729B (en) Image searching method and system
Qian et al. Image location inference by multisaliency enhancement
CN111159485B (en) Tail entity linking method, device, server and storage medium
CN111782852B (en) Deep learning-based high-level semantic image retrieval method
CN111339343A (en) Image retrieval method, device, storage medium and equipment
CN114461839B (en) Multi-mode pre-training-based similar picture retrieval method and device and electronic equipment
CN115203421A (en) Method, device and equipment for generating label of long text and storage medium
Song et al. A weighted topic model learned from local semantic space for automatic image annotation
Zhao et al. An angle structure descriptor for image retrieval
CN115187910A (en) Video classification model training method and device, electronic equipment and storage medium
Li et al. Structure-adaptive neighborhood preserving hashing for scalable video search
CN114168773A (en) Semi-supervised sketch image retrieval method based on pseudo label and reordering
CN110110120B (en) Image retrieval method and device based on deep learning
Song et al. Hierarchical deep hashing for image retrieval
Xue et al. Mobile image retrieval using multi-photos as query
CN111783734B (en) Original edition video recognition method and device
CN112883216A (en) Semi-supervised image retrieval method and device based on disturbance consistency self-integration
Zhang et al. Improved image retrieval algorithm of GoogLeNet neural network
Ghosh et al. Efficient indexing for query by string text retrieval
CN112597329B (en) Real-time image retrieval method based on improved semantic segmentation network
CN113190706A (en) Twin network image retrieval method based on second-order attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant