CN109710607A - Hash query method for high-dimensional big data based on weight solving - Google Patents

Hash query method for high-dimensional big data based on weight solving

Info

Publication number
CN109710607A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811317132.1A
Other languages
Chinese (zh)
Other versions
CN109710607B (en)
Inventor
孙瑶
钱江波
胡伟
任艳多
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dragon Totem Technology Hefei Co ltd
Shanghai Haonashi Network Technology Co.,Ltd.
Original Assignee
Ningbo University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ningbo University
Priority to CN201811317132.1A
Publication of CN109710607A
Application granted
Publication of CN109710607B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a weight-solving-based hash query method for high-dimensional big data. The method first uses principal component analysis to reduce the dimensionality of the original high-dimensional data set and of the given query data. It then constructs a loss function and, by minimizing that function in an iterative process, obtains a final binary code matrix and a final weight matrix. Each element of the final binary code matrix is then weighted and quantized according to the final weight matrix, yielding a weighted binary code matrix and a binary code corresponding to the given query data. Finally, the original high-dimensional datum corresponding to the row vector of the weighted binary code matrix that has the smallest weighted Hamming distance to the binary code of the given query data is taken as the final nearest-neighbor query result. The advantage is that using weighted Hamming distance in place of Hamming distance improves both the accuracy and the efficiency of querying the given query data.

Description

Hash query method for high-dimensional big data based on weight solving
Technical field
The present invention relates to hash query methods, and in particular to a weight-solving-based hash query method for high-dimensional big data.
Background art
The approximate nearest neighbor problem is a fundamental problem in computer science. In general, hashing is an effective way to handle queries over large-scale high-dimensional data. In the related art, when hash codes are computed for a data set, the weight of each dimension is not considered; that is, every dimension is assumed to carry equal weight. However, different dimensions of a hash code influence the similarity between data to different degrees, so the related art, which does not fully account for hash code weights, leaves room for improvement.
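The difference between treating all code dimensions equally and weighting them can be illustrated with a small sketch. The function names and the weight values below are hypothetical and chosen only for illustration; they are not taken from the patent. The weighted distance here is assumed to be the sum of the weights of the dimensions in which two codes differ.

```python
import numpy as np

def hamming_distance(a, b):
    """Number of dimensions in which two {-1, 1} codes differ."""
    a, b = np.asarray(a), np.asarray(b)
    return int(np.sum(a != b))

def weighted_hamming_distance(a, b, weights):
    """Sum of the per-dimension weights over the dimensions in which the codes differ."""
    a, b = np.asarray(a), np.asarray(b)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * (a != b)))

query = np.array([1, -1, 1, 1])
code_a = np.array([-1, -1, 1, 1])          # differs from the query in dimension 0
code_b = np.array([1, -1, 1, -1])          # differs from the query in dimension 3
weights = np.array([2.0, 1.0, 1.0, 0.5])   # hypothetical dimension weights

# Plain Hamming distance cannot separate the two candidates.
print(hamming_distance(query, code_a), hamming_distance(query, code_b))
# The weighted variant ranks code_b closer, because dimension 3 carries less weight.
print(weighted_hamming_distance(query, code_a, weights),
      weighted_hamming_distance(query, code_b, weights))
```

With equal weights both candidates tie at distance 1; with the weights above, the candidate that disagrees only in a low-weight dimension is ranked closer.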
Summary of the invention
The technical problem to be solved by the invention is to provide a weight-solving-based hash query method for high-dimensional big data that performs weighted quantization of the codes according to the characteristics of the data set to obtain the corresponding hash codes, reduces data storage space and computation cost, and improves query accuracy.
The technical solution adopted by the invention to solve this problem is a weight-solving-based hash query method for high-dimensional big data, comprising the following steps:
(1) Obtain an original high-dimensional data set X consisting of n original high-dimensional data, and a given query datum q, where X is an n × d matrix. Use principal component analysis to reduce the dimensionality of X, obtaining the low-dimensional vector set V corresponding to X, where V is an n × c matrix and $v_{ij}$ denotes the element of V corresponding to dimension j of the i-th original high-dimensional datum. Use principal component analysis again to reduce the dimensionality of q, obtaining the low-dimensional vector q′ corresponding to q;
(2) Obtain the final binary code matrix B″ and the final weight matrix W″ by iteration, as follows:
(2-1) Set the maximum number of iterations; randomly initialize the binary code matrix B, $B \in \{-1, 1\}^{n \times c}$; randomly initialize the weight matrix W, $W = \mathrm{diag}(w_1, w_2, \ldots, w_j, \ldots, w_c)$, where $w_j$ denotes the weight of dimension j;
(2-2) Start the iterative process. In the current iteration, first hold W fixed and update B by minimizing the loss function with respect to B; the B that attains the minimum is denoted B′. Here $\|\cdot\|_F$ denotes the Frobenius norm of a matrix, the superscript 2 in the loss function denotes squaring, and $b_{ij}$ denotes the binary code value corresponding to dimension j of the i-th original high-dimensional datum;
(2-3) In the current iteration, update the weights of all dimensions. The update of $w_j$ proceeds as follows: denote the j-th column vector of B′ by $\beta_j$ and the j-th column vector of V by $\gamma_j$, so that $B' = \{\beta_1, \beta_2, \ldots, \beta_j, \ldots, \beta_c\}$ and $V = \{\gamma_1, \gamma_2, \ldots, \gamma_j, \ldots, \gamma_c\}$; hold B′ fixed and update $w_j$ by minimizing the corresponding per-dimension objective with respect to $w_j$, where $\|\cdot\|_2$ denotes the 2-norm; the $w_j$ that attains the minimum is denoted $w_j'$; the weight matrix obtained by arranging in order the weights of all dimensions updated in the current iteration is denoted W′;
(2-4) Check whether the iteration count has reached the set maximum number of iterations. If not, let W = W′ and B = B′, increment the iteration count by 1, and return to step (2-2) to start the next iteration, where "=" in W = W′ and B = B′ denotes assignment. If the maximum has been reached, take the W′ obtained in the current iteration as the final weight matrix W″ and the B′ obtained in the current iteration as the final binary code matrix B″;
(3) Weight and quantize each element of B″ according to W″, obtaining the weighted binary code matrix Z;
(4) According to W″ and B″, obtain the minimizer of the corresponding objective for q′ and take it as the binary code q″ corresponding to q′; search Z for the row vector with the smallest weighted Hamming distance to q″; take the original high-dimensional datum corresponding to that row vector as the final nearest-neighbor query result, completing the hash query for q.
The maximum number of iterations set in step (2-1) is 50.
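The iterative part of the method (steps 2-1 through 2-4) can be sketched as an alternating minimization. The loss formula itself is not reproduced in this text, so the sketch assumes a loss of the form $\|V - BW\|_F^2$ with diagonal W; this form, the closed-form updates derived from it, and the names `solve_codes_and_weights` and `loss` are assumptions made for illustration, not the patented formulas.

```python
import numpy as np

def loss(V, B, w):
    """Assumed loss ||V - B W||_F^2, with W = diag(w)."""
    return float(np.sum((V - B * w) ** 2))

def solve_codes_and_weights(V, max_iter=50, seed=0):
    """Alternate between updating the codes B and the dimension weights w."""
    rng = np.random.default_rng(seed)
    n, c = V.shape
    B = rng.choice([-1.0, 1.0], size=(n, c))   # random initial binary code matrix
    w = rng.uniform(0.5, 1.5, size=c)          # random initial weights (diagonal of W)
    for _ in range(max_iter):
        # Hold w fixed: each b_ij in {-1, 1} minimizes (v_ij - b_ij * w_j)^2,
        # which is achieved by matching the sign of v_ij * w_j.
        B = np.where(V * w >= 0, 1.0, -1.0)
        # Hold B fixed: least-squares weight per dimension,
        # w_j = (beta_j . gamma_j) / (beta_j . beta_j), and beta_j . beta_j = n.
        w = np.einsum("ij,ij->j", B, V) / n
    return B, w

# Usage on synthetic low-dimensional vectors (stand-in for the PCA output V).
V = np.random.default_rng(1).normal(size=(100, 8))
B, w = solve_codes_and_weights(V, max_iter=50)
print(B.shape, w.shape, loss(V, B, w))
```

Under this assumed loss each half-step has a closed form, so neither update can increase the loss. A different loss, such as one built on pairwise similarities, would need different update rules.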
Compared with the prior art, the advantages of the invention are as follows. Principal component analysis is first used to reduce the dimensionality of the original high-dimensional data set and of the given query data. A loss function is then constructed from the low-dimensional vectors according to the pairwise similarity-preserving principle, and the final binary code matrix and final weight matrix are obtained by minimizing this function in an iterative process. Each element of the final binary code matrix is then weighted and quantized according to the final weight matrix, yielding the weighted binary code matrix. From the final binary code matrix and the final weight matrix, the binary code corresponding to the given query data is obtained. Finally, the weighted binary code matrix is searched for the row vector with the smallest weighted Hamming distance to the binary code of the given query data, and the original high-dimensional datum corresponding to that row vector is taken as the final nearest-neighbor query result, completing the hash query for the given query data. Using weighted Hamming distance in place of Hamming distance makes better use of the information contained in the data set and preserves the similarity information between data, improving both the accuracy and the efficiency of queries for the given query data.
Brief description of the drawings
Fig. 1 is a flow chart of the steps of the invention.
Detailed description of the embodiments
A weight-solving-based hash query method for high-dimensional big data comprises the following steps:
(1) Obtain an original high-dimensional data set X consisting of n original high-dimensional data, and a given query datum q, where X is an n × d matrix. Use principal component analysis to reduce the dimensionality of X, obtaining the low-dimensional vector set V corresponding to X, where V is an n × c matrix and $v_{ij}$ denotes the element of V corresponding to dimension j of the i-th original high-dimensional datum. Use principal component analysis again to reduce the dimensionality of q, obtaining the low-dimensional vector q′ corresponding to q.
(2) Obtain the final binary code matrix B″ and the final weight matrix W″ by iteration, as follows:
(2-1) Set the maximum number of iterations; randomly initialize the binary code matrix B, $B \in \{-1, 1\}^{n \times c}$; randomly initialize the weight matrix W, $W = \mathrm{diag}(w_1, w_2, \ldots, w_j, \ldots, w_c)$, where $w_j$ denotes the weight of dimension j. The maximum number of iterations may be set to 50;
(2-2) Start the iterative process. In the current iteration, first hold W fixed and update B by minimizing the loss function with respect to B; the B that attains the minimum is denoted B′. Here $\|\cdot\|_F$ denotes the Frobenius norm of a matrix, the superscript 2 in the loss function denotes squaring, and $b_{ij}$ denotes the binary code value corresponding to dimension j of the i-th original high-dimensional datum;
(2-3) In the current iteration, update the weights of all dimensions. The update of $w_j$ proceeds as follows: denote the j-th column vector of B′ by $\beta_j$ and the j-th column vector of V by $\gamma_j$, so that $B' = \{\beta_1, \beta_2, \ldots, \beta_j, \ldots, \beta_c\}$ and $V = \{\gamma_1, \gamma_2, \ldots, \gamma_j, \ldots, \gamma_c\}$; hold B′ fixed and update $w_j$ by minimizing the corresponding per-dimension objective with respect to $w_j$, where $\|\cdot\|_2$ denotes the 2-norm; the $w_j$ that attains the minimum is denoted $w_j'$; the weight matrix obtained by arranging in order the weights of all dimensions updated in the current iteration is denoted W′;
(2-4) Check whether the iteration count has reached the set maximum number of iterations. If not, let W = W′ and B = B′, increment the iteration count by 1, and return to step (2-2) to start the next iteration, where "=" in W = W′ and B = B′ denotes assignment. If the maximum has been reached, take the W′ obtained in the current iteration as the final weight matrix W″ and the B′ obtained in the current iteration as the final binary code matrix B″;
(3) Weight and quantize each element of B″ according to W″, obtaining the weighted binary code matrix Z.
(4) According to W″ and B″, obtain the minimizer of the corresponding objective for q′ and take it as the binary code q″ corresponding to q′; search Z for the row vector with the smallest weighted Hamming distance to q″; take the original high-dimensional datum corresponding to that row vector as the final nearest-neighbor query result, completing the hash query for q.
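The dimensionality reduction, weighted quantization, and query steps of the embodiment can be sketched end to end as follows. Several points are assumptions made for illustration, since the formulas are not reproduced in this text: PCA is computed with an SVD of the centered data; the weighted matrix is taken as Z = B″W″; the query code is taken as the minimizer of $\|q' - q''W''\|_2^2$ over $\{-1, 1\}^c$; and B″ and W″ are replaced by simple stand-ins rather than produced by the iteration. All function names are hypothetical.

```python
import numpy as np

def fit_pca(X, c):
    """Return the data mean and a (d, c) projection onto the top-c principal directions."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:c].T

def project(X, mean, P):
    """Reduce data (a matrix of rows or a single vector) to c dimensions."""
    return (X - mean) @ P

def encode_query(q_low, w):
    """Assumed query coding: q'' in {-1, 1}^c minimizing ||q' - q'' W''||^2."""
    return np.where(q_low * w >= 0, 1.0, -1.0)

def weighted_hamming_to_rows(Z, q_code, w):
    """Weighted Hamming distance from the weighted query code to every row of Z.

    With Z = B * w, |z_ij - q_j * w_j| is 2 * |w_j| where the codes differ and 0 elsewhere.
    """
    return np.abs(Z - q_code * w).sum(axis=1) / 2.0

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 32))            # synthetic original high-dimensional data
mean, P = fit_pca(X, c=8)
V = project(X, mean, P)                   # low-dimensional vector set

# Stand-ins for the final code matrix and final weights (assumption, not the iteration).
w = np.abs(V).mean(axis=0)
B = np.where(V >= 0, 1.0, -1.0)
Z = B * w                                 # assumed weighted binary code matrix

q = X[7]                                  # query with an existing datum
q_code = encode_query(project(q, mean, P), w)
dists = weighted_hamming_to_rows(Z, q_code, w)
idx = int(np.argmin(dists))
print(idx, dists[idx])
```

Because several data may share the same code, the returned index is one of possibly many rows at the minimum distance; a practical system would typically re-rank such ties using the original vectors.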

Claims (2)

1. A weight-solving-based hash query method for high-dimensional big data, characterized by comprising the following steps:
(1) obtaining an original high-dimensional data set X consisting of n original high-dimensional data, and a given query datum q, where X is an n × d matrix; using principal component analysis to reduce the dimensionality of X, obtaining the low-dimensional vector set V corresponding to X, where V is an n × c matrix and $v_{ij}$ denotes the element of V corresponding to dimension j of the i-th original high-dimensional datum; using principal component analysis again to reduce the dimensionality of q, obtaining the low-dimensional vector q′ corresponding to q;
(2) obtaining the final binary code matrix B″ and the final weight matrix W″ by iteration, as follows:
(2-1) setting the maximum number of iterations; randomly initializing the binary code matrix B, $B \in \{-1, 1\}^{n \times c}$; randomly initializing the weight matrix W, $W = \mathrm{diag}(w_1, w_2, \ldots, w_j, \ldots, w_c)$, where $w_j$ denotes the weight of dimension j;
(2-2) starting the iterative process; in the current iteration, first holding W fixed and updating B by minimizing the loss function with respect to B, the B that attains the minimum being denoted B′, where $\|\cdot\|_F$ denotes the Frobenius norm of a matrix, the superscript 2 in the loss function denotes squaring, and $b_{ij}$ denotes the binary code value corresponding to dimension j of the i-th original high-dimensional datum;
(2-3) in the current iteration, updating the weights of all dimensions, the update of $w_j$ proceeding as follows: the j-th column vector of B′ is denoted $\beta_j$ and the j-th column vector of V is denoted $\gamma_j$, so that $B' = \{\beta_1, \beta_2, \ldots, \beta_j, \ldots, \beta_c\}$ and $V = \{\gamma_1, \gamma_2, \ldots, \gamma_j, \ldots, \gamma_c\}$; B′ is held fixed and $w_j$ is updated by minimizing the corresponding per-dimension objective with respect to $w_j$, where $\|\cdot\|_2$ denotes the 2-norm; the $w_j$ that attains the minimum is denoted $w_j'$; the weight matrix obtained by arranging in order the weights of all dimensions updated in the current iteration is denoted W′;
(2-4) checking whether the iteration count has reached the set maximum number of iterations; if not, letting W = W′ and B = B′, incrementing the iteration count by 1, and returning to step (2-2) to start the next iteration, where "=" in W = W′ and B = B′ denotes assignment; if the maximum has been reached, taking the W′ obtained in the current iteration as the final weight matrix W″ and the B′ obtained in the current iteration as the final binary code matrix B″;
(3) weighting and quantizing each element of B″ according to W″, obtaining the weighted binary code matrix Z;
(4) according to W″ and B″, obtaining the minimizer of the corresponding objective for q′ and taking it as the binary code q″ corresponding to q′; searching Z for the row vector with the smallest weighted Hamming distance to q″; taking the original high-dimensional datum corresponding to that row vector as the final nearest-neighbor query result, completing the hash query for q.
2. The weight-solving-based hash query method for high-dimensional big data according to claim 1, characterized in that the maximum number of iterations set in step (2-1) is 50.
CN201811317132.1A 2018-11-07 2018-11-07 Hash query method for high-dimensional big data based on weight solving Active CN109710607B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811317132.1A CN109710607B (en) 2018-11-07 2018-11-07 Hash query method for high-dimensional big data based on weight solving

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811317132.1A CN109710607B (en) 2018-11-07 2018-11-07 Hash query method for high-dimensional big data based on weight solving

Publications (2)

Publication Number Publication Date
CN109710607A (en) 2019-05-03
CN109710607B (en) 2021-09-17

Family

ID=66254189

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811317132.1A Active CN109710607B (en) 2018-11-07 2018-11-07 Hash query method for high-dimensional big data based on weight solving

Country Status (1)

Country Link
CN (1) CN109710607B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103226585A (en) * 2013-04-10 2013-07-31 大连理工大学 Self-adaptation Hash rearrangement method for image retrieval
US20180276528A1 (en) * 2015-12-03 2018-09-27 Sun Yat-Sen University Image Retrieval Method Based on Variable-Length Deep Hash Learning
CN107341178A (en) * 2017-05-24 2017-11-10 北京航空航天大学 A kind of adaptive binary quantization Hash coding method and device
CN107871014A (en) * 2017-11-23 2018-04-03 清华大学 A kind of big data cross-module state search method and system based on depth integration Hash

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HAIYAN FU et al.: "Binary code reranking method with weighted hamming distance", 《MULTIMEDIA TOOLS AND APPLICATIONS》 *
任艳多: "A survey of hash-coding-based quantization techniques in large-scale data retrieval", 《无线数据通信》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115935200A (en) * 2023-01-12 2023-04-07 北京三维天地科技股份有限公司 Mass data similarity calculation method based on Hash and Hamming distance
CN115935200B (en) * 2023-01-12 2023-09-08 北京三维天地科技股份有限公司 Mass data similarity calculation method based on Hash and Hamming distance

Also Published As

Publication number Publication date
CN109710607B (en) 2021-09-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20231221

Address after: Room 1601, 238 Jiangchang 3rd Road, Jing'an District, Shanghai 200040

Patentee after: Shanghai Haonashi Network Technology Co.,Ltd.

Address before: 230000 floor 1, building 2, phase I, e-commerce Park, Jinggang Road, Shushan Economic Development Zone, Hefei City, Anhui Province

Patentee before: Dragon totem Technology (Hefei) Co.,Ltd.

Effective date of registration: 20231221

Address after: 230000 floor 1, building 2, phase I, e-commerce Park, Jinggang Road, Shushan Economic Development Zone, Hefei City, Anhui Province

Patentee after: Dragon totem Technology (Hefei) Co.,Ltd.

Address before: 315211, Fenghua Road, Jiangbei District, Zhejiang, Ningbo 818

Patentee before: Ningbo University