CN110609914B - Online Hash learning image retrieval method based on rapid category updating - Google Patents
- Publication number: CN110609914B (application CN201910722255.1A)
- Authority
- CN
- China
- Prior art keywords
- hash
- learning
- data
- similarity
- image retrieval
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/53—Querying
- G06F16/532—Query formulation, e.g. graphical querying
Abstract
An online Hash learning image retrieval method based on rapid category updating, relating to image retrieval. To address the drawbacks of traditional hashing techniques that require the training set to be acquired all at once, namely low training efficiency and large storage consumption, an online learning scheme replaces offline parameter learning, a Hamming-space learning method based on category-similarity preservation is constructed, and a hashing image retrieval learning scheme based on image data streams is provided. The method comprises the following steps: 1) the hash model is not trained on the whole training set; each model iteration uses only a small data stream; 2) a similarity-preserving loss function based on the inner product is constructed; 3) category-based iterative updating is used; 4) an optimization scheme based on semi-relaxed quantization is used.
Description
Technical Field
The invention relates to image retrieval, and in particular to an online Hash learning image retrieval method based on rapid category updating.
Background
With the rapid development of the internet, cloud computing, the internet of things, social media, and other information technologies in recent years, the data accumulated across industries has grown explosively, and both its volume and its growth rate far exceed the processing capacity of current technology. Nearest neighbor search, also known as closest-point search, is the optimization problem of finding the point in a metric space that is closest to a query point. Formally: given a database point set S and a query point q ∈ M in a metric space M, find the point in S closest to q. Here M is typically a multidimensional Euclidean space, with distance given by the Euclidean distance. Nearest neighbor search is widely used in many fields, such as computer vision, information retrieval, data mining, machine learning, and large-scale learning. It is applied most widely in computer vision, for example in computer graphics, image retrieval, copy detection, object recognition, scene classification, pose estimation, and feature matching.
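As a concrete illustration of the definition above, a minimal brute-force nearest neighbor search in Euclidean space can be sketched as follows (a toy example for exposition, not part of the patented method):

```python
import math

def nearest_neighbor(points, q):
    """Brute-force nearest neighbor search: scan every database point
    and return the index of the one closest to query q under the
    Euclidean distance."""
    best_idx, best_dist = -1, math.inf
    for i, p in enumerate(points):
        d = math.dist(p, q)              # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best_idx, best_dist = i, d
    return best_idx

points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
idx = nearest_neighbor(points, (0.9, 1.2))   # nearest point is (1.0, 1.0)
```

The linear scan is exact but costs O(n) distance computations per query, which is precisely the cost that motivates the sub-linear index structures and hashing methods discussed next.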
Nearest neighbor search faces two key problems: 1. feature dimensionality is high; 2. the amount of data is large. Naive exhaustive search therefore has very high time complexity, and loading the original data from storage into memory also becomes a bottleneck that must be addressed in practical applications. In recent years, fast and effective nearest neighbor search methods with sub-linear time complexity have appeared in practice, for example KD-trees, Ball-trees, Metric-trees, and Vantage-point trees. However, tree-based index methods require too much storage: the space for storing the index tree sometimes even exceeds the space required to store the data itself. Moreover, as data dimensionality increases, retrieval time grows sharply. Unlike tree-based indexing, which recursively partitions the data space, hashing algorithms (also called binary encoding) repeatedly bi-partition the entire data space, assigning one bit per partition. That is, a hashing algorithm maps the input data into a discrete Hamming space, with each data point represented by a string of binary code. In most cases the binary codes are not used for exhaustive search directly but are organized into a hash table, where each hash code corresponds to one entry of the table, as shown in fig. 1. Since the Hamming distance can be computed quickly with an XOR operation, even an exhaustive scan of the database using hash codes has a time complexity that meets application requirements.
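The XOR-based Hamming distance and the hash-table organization described above can be sketched in a few lines (an illustrative layout, not the patent's data structure):

```python
def hamming(a: int, b: int) -> int:
    """Hamming distance between two binary codes stored as integers:
    XOR the codes, then count the set bits (popcount)."""
    return bin(a ^ b).count("1")

# Organize codes into a hash table mapping code -> list of item ids,
# so an exact-code lookup is a single O(1) probe.
table = {}
for item_id, code in [(0, 0b1010), (1, 0b1011), (2, 0b0101)]:
    table.setdefault(code, []).append(item_id)

query = 0b1010
exact_hits = table.get(query, [])   # items whose code equals the query's
d = hamming(query, 0b1011)          # codes differ in 1 bit -> distance 1
```

Ranking any candidate set then costs only one XOR and one popcount per comparison, which is why exhaustive scans over binary codes remain fast.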
Typical hashing methods currently include semi-supervised hashing, unsupervised hashing, and hashing combined with deep learning. By whether supervision information is used, learning-based hashing methods can be divided into supervised and unsupervised learning hashing methods. Because supervision information is costly to acquire, and at very large scale only a small fraction of the data carries it, the current mainstream research hotspots are unsupervised learning hashing methods and semi-supervised learning methods. Although hashing combined with deep learning achieves excellent retrieval performance, it requires supervision information during training. Unsupervised learning hashing is therefore the main line of research: it exploits the locally linear relations between data to obtain better hash representations. Rather than assuming, as in distance metric learning, that a single uniform semantic distance exists over the original data space and represents semantic distances well, the relations among local data, i.e. a point's nearest neighbors, better reflect semantic similarity between data. Inductive Hashing on Manifolds (Shen, Fumin, Chunhua Shen, Qinfeng Shi, Anton van den Hengel, and Zhenmin Tang. "Inductive Hashing on Manifolds." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013) and Locally Linear Hashing (Irie, Go, Zhenguo Li, Xiao-Ming Wu, and Shih-Fu Chang. "Locally Linear Hashing for Extracting Non-linear Manifolds." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014) both learn from the local linear relations in the data. However, both methods only obtain the hash representation of the training data during learning; the hash function itself cannot be obtained directly, so the learned representation cannot be generalized directly to non-training data, and both methods approximate the hash representation of query data with the hash representations of the training data.
Disclosure of Invention
The invention aims to overcome the drawbacks of traditional hashing techniques that require the training set to be acquired all at once, such as low training efficiency and high storage consumption. By adopting an online learning scheme in place of offline parameter learning and constructing a Hamming-space learning method based on category-similarity preservation, it provides a hashing image retrieval learning scheme based on image data streams.
The invention comprises the following steps:
1) the hash model is not trained on the whole training set; each model iteration uses only a small data stream;
2) a similarity-preserving loss function based on the inner product is constructed;
3) category-based iterative updating is performed;
4) an optimization scheme based on semi-relaxed quantization is used.
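The data-stream training regime of step 1) can be sketched as a loop (a hedged skeleton; all names here are illustrative, not from the patent): the model never holds the whole training set, and each iteration t consumes one small chunk X_t.

```python
def online_hash_training(stream, init_model, update_step):
    """stream yields small data chunks X_t one at a time; update_step
    performs one similarity-preserving model update on a single chunk.
    Only the current chunk is ever resident in memory."""
    model = init_model
    for t, X_t in enumerate(stream, start=1):
        model = update_step(model, X_t, t)
    return model

# Toy usage: the "model" is just a running count of samples seen,
# standing in for the hash projection being updated per chunk.
chunks = [[1, 2], [3], [4, 5, 6]]
model = online_hash_training(chunks, 0, lambda m, X, t: m + len(X))
```

The design point is that `stream` can be a generator over an unbounded feed, which matches the patent's premise that training data cannot be obtained all at once.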
The invention has the following outstanding advantages:
1) On one hand, the invention retains the advantages of traditional offline hashing, namely low storage cost and efficient Hamming-distance computation; on the other hand, it overcomes the low training efficiency and high memory consumption of traditional hashing image retrieval. At each stage of hash model training, only the similarity information of data pairs, and its preservation in Hamming space, is considered, yielding the online Hash learning method based on rapid category updating. The algorithm preserves to a large extent the similarity between query data and database data, reduces precision loss, and ranks similar data near the front of the retrieval list. The hardware requirements are low: memory only needs to hold roughly one pair of data at a time, and the pair-based training mode greatly compresses training time.
2) Building on the inner-product similarity-preservation scheme, the invention designs a category-based update scheme that saves at least 75% of storage overhead and achieves better performance indexes with less training data. Compared with traditional online Hash learning schemes, on the three classic data sets CIFAR-10, Places205, and MNIST the method gains 1.27%, 3.01%, and 1.20% in mAP and 7.80%, 5.35%, and 1.01% in precision within Hamming radius 2, respectively.
3) The proposed semi-relaxed quantization optimization scheme relaxes the binarization constraint on one part of the code to a continuity constraint while keeping the other part binary and constant, greatly improving the training efficiency of inner-product similarity-preserving hash retrieval.
4) The data-stream-based training scheme requires extremely little computation and storage, matches the real-world situation in which training data cannot be obtained all at once, and has broad application prospects in image retrieval, copy detection, feature matching, and other fields.
Drawings
FIG. 1 is a diagram of the online learning framework of the present invention.
Detailed Description
The following embodiments further illustrate the present invention with reference to the accompanying drawings.
The invention aims to overcome the low training efficiency and high storage consumption of traditional hashing techniques that require the training set to be acquired all at once. By replacing offline parameter learning with an online learning scheme and constructing a Hamming-space learning method based on category-similarity preservation, it designs a hashing learning scheme that needs only the similarity information of a small data stream; the overall algorithm flow is shown in FIG. 1.
In iteration t, for the data X_t, first assume an existing data-pair similarity distribution matrix P_t and an unknown Hamming-space distance distribution matrix Q_t. The object of the invention is to make the Hamming-space distance distribution Q_t preserve the similarity distribution P_t. To achieve this, a scheme minimizing the KL divergence between the two probability distributions is first adopted:

min_{B_t} KL(P_t || Q_t) = Σ_{i,j} P_t(i,j) · log( P_t(i,j) / Q_t(i,j) ),  s.t. B_t ∈ {-1, +1}^(r×n_t)   (1)

where B_t is the hash code to be learned, S_t is the similarity matrix to be constructed, r is the code length in bits, and n_t is the data stream size.
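The KL-divergence objective described above can be illustrated numerically. This is a hedged sketch: the softmax normalization used below to turn similarities and scaled code inner products into probability distributions is an assumed choice for the example, not necessarily the patent's exact construction.

```python
import math

def softmax(xs):
    """Turn a list of scores into a probability distribution."""
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions; 0 iff p == q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

r = 4                             # code length in bits
similarities = [1.0, 0.2, -0.5]   # example entries of S_t for three pairs
inner_products = [4, 0, -4]       # b_i^T b_j for r-bit codes, in [-r, r]

P = softmax(similarities)                        # target distribution
Q = softmax([ip / r for ip in inner_products])   # Hamming-side distribution
loss = kl_divergence(P, Q)        # shrinks as the codes match S_t better
```

Since the code inner product determines the Hamming distance (d_H = (r - b_i^T b_j) / 2 for ±1 codes), driving Q toward P makes similar pairs receive close codes.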
In order to achieve the above object, the specific process is as follows:
1) Construct the similarity matrix.
First consider building, in iteration t, the similarity matrix S_t of X_t.
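One minimal way to build the pairwise similarity matrix S_t is from class labels (an assumption consistent with the category-based update that follows, shown here only for illustration): same-class pairs get +1 and different-class pairs get -1.

```python
def similarity_matrix(labels):
    """Pairwise label-based similarity: S[i][j] = +1 if items i and j
    share a class label, else -1."""
    n = len(labels)
    return [[1 if labels[i] == labels[j] else -1 for j in range(n)]
            for i in range(n)]

# Three stream items, the first two from class 0 and the third from class 1.
S_t = similarity_matrix([0, 0, 1])
```

The matrix is symmetric with a +1 diagonal, matching the role of S_t in the loss of equation (1).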
2) Category-based update.
Rewrite equation (1) into the following form:
Denote:
3) Semi-relaxed quantization.
Further rewrite equation (4) as:
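The semi-relaxation idea can be sketched as follows. This is a hedged illustration under assumed details: one part of the code vector stays frozen at its binary values while the other part is relaxed to continuous values, updated, and re-binarized with sign(); the split point and the gradient step are invented for the example, not taken from the patent.

```python
def sign(x):
    """Binarize a real value to +/-1."""
    return 1.0 if x >= 0 else -1.0

def semi_relaxed_step(codes, grads, lr=0.1, relax_from=None):
    """codes: list of +/-1 code bits.  Bits with index < relax_from keep
    their binary (constant) constraint; bits from relax_from on are
    relaxed to continuous values, take a gradient step, and are then
    re-quantized.  By default half the code is relaxed."""
    if relax_from is None:
        relax_from = len(codes) // 2
    relaxed = list(codes)
    for i in range(relax_from, len(codes)):
        relaxed[i] = codes[i] - lr * grads[i]   # continuous update
    # sign() re-quantizes the relaxed part; the frozen part, already
    # +/-1, passes through unchanged.
    return [sign(v) for v in relaxed]

new_codes = semi_relaxed_step([1.0, -1.0, 1.0, -1.0],
                              [0.0, 0.0, 30.0, -30.0])
```

Keeping half the constraints binary avoids the large quantization error of fully relaxing all bits, which is the efficiency argument made for this scheme in the advantages above.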
4) Optimization and learning.
The performance of the invention on the mAP and Precision@H indexes on CIFAR-10 is shown in Table 1.
TABLE 1
The performance of the invention on the mAP and Precision@H indexes on Places205 is shown in Table 2.
TABLE 2
The performance of the invention on the mAP and Precision@H indexes on MNIST is shown in Table 3.
TABLE 3
Building on the inner-product similarity-preservation scheme, the invention designs a category-based update scheme that saves at least 75% of storage overhead and achieves better performance indexes with less training data. Compared with traditional online Hash learning schemes, on the three classic data sets CIFAR-10, Places205, and MNIST the method gains 1.27%, 3.01%, and 1.20% in mAP and 7.80%, 5.35%, and 1.01% in precision within Hamming radius 2, respectively. The data-stream-based training scheme requires extremely little computation and storage, matches the real-world situation in which training data cannot be obtained all at once, and has broad application prospects in image retrieval, copy detection, feature matching, and other fields.
Claims (1)
1. A supervised online hashing image retrieval method based on similarity distribution learning, characterized by comprising the following steps:
1) the hash model is not trained on the whole training set; each model iteration uses only a small data stream, as follows:
in iteration t, for the data X_t, first assume an existing data-pair similarity distribution matrix P_t and an unknown Hamming-space distance distribution matrix Q_t, and adopt a scheme minimizing the KL divergence between the two probability distributions:

min_{B_t} KL(P_t || Q_t) = Σ_{i,j} P_t(i,j) · log( P_t(i,j) / Q_t(i,j) ),  s.t. B_t ∈ {-1, +1}^(r×n_t)   (1)

where B_t is the hash code to be learned, S_t is the similarity matrix to be constructed, r is the code length in bits, and n_t is the data stream size;
2) construct the inner-product-based similarity-preserving loss function: consider building, in iteration t, the similarity matrix S_t of X_t;
3) category-based iteration: rewrite equation (1) into the following form:
denote:
4) optimization scheme using semi-relaxed quantization: further rewrite equation (4) as:
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910722255.1A CN110609914B (en) | 2019-08-06 | 2019-08-06 | Online Hash learning image retrieval method based on rapid category updating |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110609914A CN110609914A (en) | 2019-12-24 |
CN110609914B true CN110609914B (en) | 2021-08-17 |
Family
ID=68890762
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910722255.1A Active CN110609914B (en) | 2019-08-06 | 2019-08-06 | Online Hash learning image retrieval method based on rapid category updating |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110609914B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102819582A (en) * | 2012-07-26 | 2012-12-12 | 华数传媒网络有限公司 | Quick searching method for mass images |
CN110059198A (en) * | 2019-04-08 | 2019-07-26 | 浙江大学 | A kind of discrete Hash search method across modal data kept based on similitude |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9507870B2 (en) * | 2009-05-05 | 2016-11-29 | Suboti, Llc | System, method and computer readable medium for binding authored content to the events used to generate the content |
US9135725B2 (en) * | 2012-06-29 | 2015-09-15 | Apple Inc. | Generic media covers |
KR101912237B1 (en) * | 2016-11-25 | 2018-10-26 | 주식회사 인디씨에프 | Method for Attaching Hash-Tag Using Image Recognition Process and Software Distributing Server Storing Software for the same Method |
- 2019-08-06: CN CN201910722255.1A patent CN110609914B (en), status Active
Non-Patent Citations (1)
Title |
---|
"Supervised Online Hashing via Similarity Distribution Learning"; Mingbao Lin et al.; arXiv:1905.13382v1; 2019-05-31; pp. 4321-4330 *
Also Published As
Publication number | Publication date |
---|---|
CN110609914A (en) | 2019-12-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109166615B (en) | Medical CT image storage and retrieval method based on random forest hash | |
EP3752930B1 (en) | Random draw forest index structure for searching large scale unstructured data | |
CN110222218B (en) | Image retrieval method based on multi-scale NetVLAD and depth hash | |
CN104035949A (en) | Similarity data retrieval method based on locality sensitive hashing (LASH) improved algorithm | |
CN109710792B (en) | Index-based rapid face retrieval system application | |
US11106708B2 (en) | Layered locality sensitive hashing (LSH) partition indexing for big data applications | |
CN107291895B (en) | Quick hierarchical document query method | |
CN106874425B (en) | Storm-based real-time keyword approximate search algorithm | |
Xu et al. | Online product quantization | |
CN111079949A (en) | Hash learning method, unsupervised online Hash learning method and application thereof | |
CN107180079B (en) | Image retrieval method based on convolutional neural network and tree and hash combined index | |
Li et al. | I/O efficient approximate nearest neighbour search based on learned functions | |
CN112256727B (en) | Database query processing and optimizing method based on artificial intelligence technology | |
CN115048539B (en) | Social media data online retrieval method and system based on dynamic memory | |
CN108182256A (en) | It is a kind of based on the discrete efficient image search method for being locally linear embedding into Hash | |
Eghbali et al. | Online nearest neighbor search using hamming weight trees | |
CN109446293B (en) | Parallel high-dimensional neighbor query method | |
Wan et al. | Cd-tree: A clustering-based dynamic indexing and retrieval approach | |
CN110609914B (en) | Online Hash learning image retrieval method based on rapid category updating | |
CN115618039A (en) | Image retrieval method based on multi-label projection online Hash algorithm | |
Chiu et al. | Approximate asymmetric search for binary embedding codes | |
CN110704575B (en) | Dynamic self-adaptive binary hierarchical vocabulary tree image retrieval method | |
CN114911826A (en) | Associated data retrieval method and system | |
Tian et al. | Approximate Nearest Neighbor Search in High Dimensional Vector Databases: Current Research and Future Directions. | |
Lu et al. | Dynamic Partition Forest: An Efficient and Distributed Indexing Scheme for Similarity Search based on Hashing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
Effective date of registration: 2023-01-05
Address after: Room A-0489, Floor 2, Building 3, Yard 30, Shixing Street, Shijingshan District, Beijing, 100043
Patentee after: Investment&Finance (Beijing) Information Technology Co.,Ltd.
Address before: No. 422 Siming South Road, Xiamen City, Fujian Province, 361005
Patentee before: Xiamen University