CN110609914B - Online Hash learning image retrieval method based on rapid category updating - Google Patents

Online Hash learning image retrieval method based on rapid category updating

Info

Publication number
CN110609914B
Authority
CN
China
Prior art keywords
hash
learning
data
similarity
image retrieval
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910722255.1A
Other languages
Chinese (zh)
Other versions
CN110609914A (en)
Inventor
纪荣嵘
林明宝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Investment&Finance (Beijing) Information Technology Co.,Ltd.
Original Assignee
Xiamen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen University filed Critical Xiamen University
Priority to CN201910722255.1A priority Critical patent/CN110609914B/en
Publication of CN110609914A publication Critical patent/CN110609914A/en
Application granted granted Critical
Publication of CN110609914B publication Critical patent/CN110609914B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53Querying
    • G06F16/532Query formulation, e.g. graphical querying

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An online Hash learning image retrieval method based on rapid category updating, relating to image retrieval. To overcome the drawbacks of conventional hashing, which must acquire the whole training set at once and therefore suffers from low training efficiency and high storage consumption, an online learning scheme replaces offline parameter learning, a Hamming-space learning method based on category-similarity preservation is constructed, and a hash image-retrieval learning scheme driven by an image data stream is provided. The method comprises the following steps: 1) the hash model is not trained on the whole training set; only a small data stream is used in each model iteration; 2) an inner-product-based similarity-preservation loss function is constructed; 3) category-based iterative updating is used; 4) an optimization scheme using semi-quantization is adopted.

Description

Online Hash learning image retrieval method based on rapid category updating
Technical Field
The invention relates to image retrieval, in particular to an online Hash learning image retrieval method based on rapid category updating.
Background
With the rapid development of the internet, cloud computing, the internet of things, social media and other information technologies in recent years, the data accumulated in various industries has grown explosively, and its existing volume and growth rate far exceed the processing capacity of current technology. Nearest Neighbor Search, also known as nearest point search, is the optimization problem of finding, in a metric space, the point closest to a query point. It is defined as follows: given a set of database points S in a metric space M and a query point q ∈ M, find the point in S closest to q, where M is a multidimensional Euclidean space and distance is measured by the Euclidean distance. Nearest neighbor search is widely used in many fields, such as computer vision, information retrieval, data mining, machine learning and large-scale learning. It is applied most widely in computer vision, for example in computer graphics, image retrieval, copy retrieval, object recognition, scene classification, pose estimation and feature matching.
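Purely as an illustrative aside (not part of the patent text), the brute-force form of this search can be written as follows; the array names and example points are assumptions:

```python
import numpy as np

def nearest_neighbor(database, query):
    """Brute-force nearest neighbor search: return the index of the database
    point with the smallest Euclidean distance to the query."""
    dists = np.linalg.norm(database - query, axis=1)   # Euclidean distances to q
    return int(np.argmin(dists))

S = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.5]])     # database point set S
q = np.array([0.9, 1.1])                               # query point q
print(nearest_neighbor(S, q))                          # -> 1 (the point [1, 1])
```

Such exhaustive search is linear in the database size, which is exactly the cost that the hashing methods discussed below aim to avoid.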
Nearest neighbor search faces two key problems: 1. the feature dimensionality is high; 2. the amount of data is large. A naive exhaustive search therefore has very high time complexity, and loading the raw data from storage into memory also becomes a bottleneck that must be addressed in practical applications. In recent years, fast and effective nearest neighbor search methods with sub-linear time complexity have appeared in practice, for example KD-trees, Ball-trees, Metric-trees and vantage-point trees. However, these tree-based index methods require too much storage space; sometimes the space for storing the index trees even exceeds the space required to store the data themselves. Moreover, as the data dimensionality increases, retrieval time is severely affected and rises sharply. Unlike tree-based indexing, which recursively partitions the data space, hash-type algorithms (also referred to as binary encoding) repeatedly bipartition the entire data space and assign one bit per partition. That is, a hash algorithm maps the input data to a discrete Hamming space, with each data point represented by a string of binary code. In most cases the hash algorithm does not use the resulting binary codes for exhaustive search; instead it organizes them into a hash table in which each hash code corresponds to one entry, as shown in fig. 1. Because the Hamming distance can be computed quickly with an XOR operation, even an exhaustive search of the database over hash codes has a time complexity that meets application requirements.
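As an illustration only (not taken from the patent), the XOR-based Hamming distance and the exact-bucket hash-table lookup described above can be sketched as follows; the codes, bucket contents and identifiers are assumptions:

```python
def hamming_distance(code_a: int, code_b: int) -> int:
    """Hamming distance between two binary codes stored as integers,
    computed with a single XOR followed by a bit count."""
    return bin(code_a ^ code_b).count("1")

# A toy hash table in which each binary code indexes a bucket of image ids.
hash_table = {0b1011: ["img_17", "img_42"], 0b1001: ["img_03"]}

query_code = 0b1001
bucket = hash_table.get(query_code, [])        # exact-bucket lookup -> ["img_03"]
dist = hamming_distance(0b1011, query_code)    # XOR-based distance -> 1
```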
Typical hashing methods at present include semi-supervised hashing, unsupervised hashing, and hashing combined with deep learning. According to whether supervised information is used, learning-based hashing methods can be divided into supervised and unsupervised learning hashing. Because supervised information is expensive to obtain, and only a small fraction of data carries supervised information at very large scale, the current mainstream research focuses on unsupervised and semi-supervised learning hashing. Although hashing combined with deep learning achieves excellent retrieval performance, it requires supervised information during training. Unsupervised learning hashing is therefore the main line of research: it exploits the locally linear relationships between data points to obtain a better hash representation. Unlike distance-metric learning, which assumes a single uniform semantic distance over the original data space, the relationships between local data points, i.e., the nearest neighbors of each data point, better reflect semantic similarity. Inductive Hashing on Manifolds (Shen, Fumin, Chunhua Shen, Qinfeng Shi, Anton van den Hengel, and Zhenmin Tang, "Inductive Hashing on Manifolds," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013) and Locally Linear Hashing (Irie, Go, Zhenguo Li, Xiao-Ming Wu, and Shih-Fu Chang, "Locally Linear Hashing for Extracting Non-Linear Manifolds," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014) both learn hash codes from the local linear relationships of the data. However, both methods only obtain the hash representation of the training data during learning; no hash function is obtained directly, so the learned representation cannot be generalized directly to non-training data, and the hash representation of query data is instead approximated from the hash representations of the training data.
Disclosure of Invention
The invention aims to overcome the drawbacks of conventional hashing, which must acquire the whole training set at once and therefore suffers from low training efficiency and high storage consumption. It replaces offline parameter learning with an online learning scheme, constructs a Hamming-space learning method based on category-similarity preservation, and provides a hash image-retrieval learning scheme driven by an image data stream.
The invention comprises the following steps:
1) the hash model is not trained on the whole training set; only a small data stream is used in each model iteration;
2) an inner-product-based similarity-preservation loss function is constructed;
3) category-based iterative updating is used;
4) an optimization scheme using semi-quantization is adopted.
The invention has the following outstanding advantages:
1) On the one hand, the invention retains the advantages of conventional offline hashing, namely low storage cost and efficient Hamming-distance computation, while overcoming the low training efficiency and high memory consumption of conventional hash-based image retrieval. In each stage of training the hash model, only the pairwise similarity information of the data is considered, together with the preservation of that pairwise similarity in Hamming space; the resulting online hash learning method based on rapid category updating preserves, to a large extent, the similarity between query data and database data, reduces the loss of precision, and ranks similar data near the top of the retrieval list. The required training hardware is modest: the memory only needs to hold roughly one pair of data at a time, and the pair-based training mode greatly compresses the training time.
2) The invention designs a category-based update scheme on top of the inner-product similarity-preservation scheme, saving at least 75% of the storage overhead and achieving better performance metrics with less training data. Compared with conventional online hash learning schemes, the method gains 1.27%, 3.01% and 1.20% in mAP and 7.80%, 5.35% and 1.01% in precision within a Hamming ball of radius 2 on the three classical data sets CIFAR-10, Places205 and MNIST, respectively.
3) The proposed semi-quantization optimization scheme greatly improves the training efficiency of inner-product similarity-preserving hash retrieval by relaxing the binarization constraint on one part of the variables to a continuous constraint while keeping the binarization constraint on the other part fixed as constants.
4) The data-stream-based training scheme requires only a very small amount of computation and storage, better matches the real-world situation in which training data cannot all be obtained at once, and has broad application prospects in fields such as image retrieval, copy retrieval and feature matching.
Drawings
FIG. 1 is a diagram of an online learning framework of the present invention.
Detailed Description
The following examples will further illustrate the present invention with reference to the accompanying drawings.
The invention aims to overcome the drawbacks of conventional hashing, which must acquire the whole training set at once and therefore suffers from low training efficiency and high storage consumption. By replacing offline parameter learning with online learning and constructing a Hamming-space learning method based on category-similarity preservation, a hash learning scheme is designed that requires only the similarity information of a small data stream; the overall algorithm flow is shown in FIG. 1.
In iteration t, for the incoming data X_t, first assume that there exist a data-pair similarity distribution matrix and an unknown Hamming-space distance distribution matrix Q_t (both defined by equations rendered as images in the original document). The object of the invention is for the Hamming-space distance distribution Q_t to preserve the given similarity distribution; to this end, a scheme that minimizes the KL divergence between the two probability distributions is first employed (equation (1), rendered as an image in the original document), where B_t is the hash code to be learned, S_t is the similarity matrix to be constructed, r is the number of code bits, and n_t is the size of the data stream.
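The patented objective itself is given only as an image; as a hedged illustration of the general idea of preserving a pairwise similarity distribution in Hamming space with a KL divergence, one possible (assumed, not the patented) formulation in code is:

```python
import numpy as np

def kl_similarity_loss(B, S, r):
    """Illustrative KL divergence between a target pair-similarity distribution
    derived from S (n x n) and the distribution induced by the inner products
    of the hash codes B (n x r, entries in {-1, +1}). Not the patented loss."""
    def row_softmax(M):
        M = M - M.max(axis=1, keepdims=True)          # numerical stability
        E = np.exp(M)
        return E / E.sum(axis=1, keepdims=True)

    P = row_softmax(S)                 # target distribution from similarities
    Q = row_softmax(B @ B.T / r)       # distribution from scaled code inner products
    eps = 1e-12
    return float(np.sum(P * (np.log(P + eps) - np.log(Q + eps))))
```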
In order to achieve the above object, the specific process is as follows:
1) Constructing the similarity matrix.
First, in iteration t, construct the similarity matrix S_t of X_t (equation rendered as an image in the original document).
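The construction of S_t is given as an image in the original; a minimal sketch, assuming the common supervised choice of labelling a pair +1 when the two samples share a class and -1 otherwise, is:

```python
import numpy as np

def build_similarity_matrix(labels):
    """Illustrative label-based pairwise similarity for the current data
    stream: +1 for same-class pairs, -1 otherwise (an assumption; the
    patent's exact construction is not reproduced here)."""
    labels = np.asarray(labels).reshape(-1, 1)
    return np.where(labels == labels.T, 1.0, -1.0)
```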
2) Category-based updating.
Equation (1) is rewritten into a category-based form, and several intermediate quantities are then denoted for the update (the equations and definitions are rendered as images in the original document).
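The category-based update itself is given only as images; as a hedged sketch of the general "rapid category updating" idea, one might refresh only the codewords of the classes that actually appear in the current data stream, for example:

```python
import numpy as np

def update_class_codes(class_codes, stream_codes, stream_labels):
    """Illustrative class-wise update: class_codes is an (n_classes x r) array
    of per-class codewords; only rows for classes present in the current data
    stream are refreshed, as the sign of the mean stream code of that class.
    A sketch under stated assumptions, not the patented update rule."""
    stream_labels = np.asarray(stream_labels)
    for c in np.unique(stream_labels):
        mean_code = stream_codes[stream_labels == c].mean(axis=0)
        sign = np.sign(mean_code)
        sign[sign == 0] = 1                 # keep codewords strictly in {-1, +1}
        class_codes[c] = sign
    return class_codes
```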
3) Semi-relaxed quantization.
Equation (4) is further rewritten (equations rendered as images in the original document).
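The rewritten objective is given only as images; as an illustration of the semi-relaxation idea described above (part of the variables kept binary as constants, the rest relaxed to continuous values and re-binarized), a hedged sketch is:

```python
import numpy as np

def semi_relaxed_step(B, grad, keep_mask, lr=0.1):
    """Illustrative 'semi-relaxed' update: entries flagged by keep_mask are
    held fixed as binary constants, while the remaining entries are relaxed,
    moved by one gradient step, and re-binarized with sign(). A sketch of the
    semi-quantization idea only, not the patented procedure."""
    relaxed = B - lr * grad                      # continuous update of all entries
    rebinarized = np.sign(relaxed)
    rebinarized[rebinarized == 0] = 1
    return np.where(keep_mask, B, rebinarized)   # fixed part kept, rest updated
```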
4) Optimization and learning.
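Purely as an illustrative summary of steps 1)-4) (not the patented algorithm), the following self-contained sketch shows how the pieces could be combined into an online, data-stream-driven training loop; every helper, parameter and update rule here is an assumption:

```python
import numpy as np

def online_hash_training(stream_batches, r, n_classes, feat_dim, lr=0.1):
    """Illustrative online loop: for each small data stream (X_t, y_t), build a
    label-based similarity matrix, project features to r-bit codes, binarize,
    take a crude similarity-preserving gradient step, and refresh the per-class
    codewords. A sketch only; all details are assumptions."""
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.01, size=(feat_dim, r))       # hash projection
    class_codes = np.zeros((n_classes, r))               # per-class codewords

    for X_t, y_t in stream_batches:                      # small data streams
        S_t = np.where(y_t[:, None] == y_t[None, :], 1.0, -1.0)
        B_t = np.sign(X_t @ W)                           # binarized codes
        B_t[B_t == 0] = 1
        # crude gradient step pulling code inner products toward S_t
        grad = X_t.T @ ((B_t @ B_t.T / r - S_t) @ B_t) / len(X_t)
        W -= lr * grad
        for c in np.unique(y_t):                         # category-based update
            m = np.sign(B_t[y_t == c].mean(axis=0))
            class_codes[c] = np.where(m == 0, 1, m)
    return W, class_codes
```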
The performance of the invention on the mAP and Precision@H metrics on CIFAR-10 is shown in Table 1.
TABLE 1 (rendered as an image in the original document; not reproduced here)
The performance of the invention on the mAP and Precision@H metrics on Places205 is shown in Table 2.
TABLE 2 (rendered as an image in the original document; not reproduced here)
The performance of the invention on the mAP and Precision@H metrics on MNIST is shown in Table 3.
TABLE 3 (rendered as an image in the original document; not reproduced here)
The invention designs a category-based update scheme on top of the inner-product similarity-preservation scheme, saving at least 75% of the storage overhead and achieving better performance metrics with less training data. Compared with conventional online hash learning schemes, the method gains 1.27%, 3.01% and 1.20% in mAP and 7.80%, 5.35% and 1.01% in precision within a Hamming ball of radius 2 on the three classical data sets CIFAR-10, Places205 and MNIST, respectively. The data-stream-based training scheme requires only a very small amount of computation and storage, better matches the real-world situation in which training data cannot all be obtained at once, and has broad application prospects in fields such as image retrieval, copy retrieval and feature matching.

Claims (1)

1. A supervised online Hash image retrieval method based on similarity distribution learning is characterized by comprising the following steps:
1) the hash model is not trained on the entire training set; only a small data stream is used in each model iteration, as follows:
in iteration t, for the data X_t, first assume that there exist a data-pair similarity distribution matrix and an unknown Hamming-space distance distribution matrix (both defined by equations rendered as images in the original document);
a scheme that minimizes the KL divergence between the two probability distributions is employed (equation (1), rendered as an image in the original document),
wherein B_t is the hash code to be learned, S_t is the similarity matrix to be constructed, r is the number of code bits, and n_t is the size of the data stream;
2) constructing an inner-product-based similarity-preservation loss function: in iteration t, construct the similarity matrix S_t of X_t (equation rendered as an image in the original document);
3) category-based iteration: equation (1) is rewritten into a category-based form, and several intermediate quantities are then denoted for the update (the equations and definitions are rendered as images in the original document);
4) optimization scheme using semi-quantization: semi-relaxed quantization, in which equation (4) is further rewritten (equations rendered as images in the original document).
CN201910722255.1A 2019-08-06 2019-08-06 Online Hash learning image retrieval method based on rapid category updating Active CN110609914B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910722255.1A CN110609914B (en) 2019-08-06 2019-08-06 Online Hash learning image retrieval method based on rapid category updating

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910722255.1A CN110609914B (en) 2019-08-06 2019-08-06 Online Hash learning image retrieval method based on rapid category updating

Publications (2)

Publication Number Publication Date
CN110609914A CN110609914A (en) 2019-12-24
CN110609914B true CN110609914B (en) 2021-08-17

Family

ID=68890762

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910722255.1A Active CN110609914B (en) 2019-08-06 2019-08-06 Online Hash learning image retrieval method based on rapid category updating

Country Status (1)

Country Link
CN (1) CN110609914B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102819582A (en) * 2012-07-26 2012-12-12 华数传媒网络有限公司 Quick searching method for mass images
CN110059198A (en) * 2019-04-08 2019-07-26 浙江大学 A similarity-preserving discrete hash retrieval method for cross-modal data

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9507870B2 (en) * 2009-05-05 2016-11-29 Suboti, Llc System, method and computer readable medium for binding authored content to the events used to generate the content
US9135725B2 (en) * 2012-06-29 2015-09-15 Apple Inc. Generic media covers
KR101912237B1 (en) * 2016-11-25 2018-10-26 주식회사 인디씨에프 Method for Attaching Hash-Tag Using Image Recognition Process and Software Distributing Server Storing Software for the same Method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102819582A (en) * 2012-07-26 2012-12-12 华数传媒网络有限公司 Quick searching method for mass images
CN110059198A (en) * 2019-04-08 2019-07-26 浙江大学 A similarity-preserving discrete hash retrieval method for cross-modal data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Supervised Online Hashing via Similarity Distribution Learning"; Mingbao Lin et al.; arXiv:1905.13382v1; 2019-05-31; pp. 4321-4330 *

Also Published As

Publication number Publication date
CN110609914A (en) 2019-12-24

Similar Documents

Publication Publication Date Title
CN109166615B (en) Medical CT image storage and retrieval method based on random forest hash
EP3752930B1 (en) Random draw forest index structure for searching large scale unstructured data
CN110222218B (en) Image retrieval method based on multi-scale NetVLAD and depth hash
CN104035949A (en) Similarity data retrieval method based on locality sensitive hashing (LASH) improved algorithm
CN109710792B (en) Index-based rapid face retrieval system application
US11106708B2 (en) Layered locality sensitive hashing (LSH) partition indexing for big data applications
CN107291895B (en) Quick hierarchical document query method
CN106874425B (en) Storm-based real-time keyword approximate search algorithm
Xu et al. Online product quantization
CN111079949A (en) Hash learning method, unsupervised online Hash learning method and application thereof
CN107180079B (en) Image retrieval method based on convolutional neural network and tree and hash combined index
Li et al. I/O efficient approximate nearest neighbour search based on learned functions
CN112256727B (en) Database query processing and optimizing method based on artificial intelligence technology
CN115048539B (en) Social media data online retrieval method and system based on dynamic memory
CN108182256A (en) It is a kind of based on the discrete efficient image search method for being locally linear embedding into Hash
Eghbali et al. Online nearest neighbor search using hamming weight trees
CN109446293B (en) Parallel high-dimensional neighbor query method
Wan et al. Cd-tree: A clustering-based dynamic indexing and retrieval approach
CN110609914B (en) Online Hash learning image retrieval method based on rapid category updating
CN115618039A (en) Image retrieval method based on multi-label projection online Hash algorithm
Chiu et al. Approximate asymmetric search for binary embedding codes
CN110704575B (en) Dynamic self-adaptive binary hierarchical vocabulary tree image retrieval method
CN114911826A (en) Associated data retrieval method and system
Tian et al. Approximate Nearest Neighbor Search in High Dimensional Vector Databases: Current Research and Future Directions.
Lu et al. Dynamic Partition Forest: An Efficient and Distributed Indexing Scheme for Similarity Search based on Hashing

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20230105

Address after: Room A-0489, Floor 2, Building 3, Yard 30, Shixing Street, Shijingshan District, Beijing, 100043

Patentee after: Investment&Finance (Beijing) Information Technology Co.,Ltd.

Address before: Xiamen City, Fujian Province, 361005 South Siming Road No. 422

Patentee before: XIAMEN University