CN105868355A - Large-scale multimedia data spatial index method - Google Patents


Info

Publication number
CN105868355A
Authority
CN
China
Prior art keywords
node
data
tree
multimedia data
index method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610187012.9A
Other languages
Chinese (zh)
Inventor
李晖
陈梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guizhou Youlian Borui Technology Co Ltd
Guizhou University
Original Assignee
Guizhou Youlian Borui Technology Co Ltd
Guizhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guizhou Youlian Borui Technology Co Ltd, Guizhou University filed Critical Guizhou Youlian Borui Technology Co Ltd
Priority to CN201610187012.9A priority Critical patent/CN105868355A/en
Publication of CN105868355A publication Critical patent/CN105868355A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/41Indexing; Data structures therefor; Storage structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a large-scale multimedia data spatial index method. A node is split into three node bodies through a scientific splitting algorithm, m/M is selected to be 30-40%, and data needing to be processed is placed in a processor low in occupancy rate to be processed. By means of the splitting algorithm, the node is split into the three node bodies, overlapped areas of the node bodies are allocated into one node body, multi-path search is reduced during data retrieval, and query efficiency is improved. Meanwhile, the capacity of the node is increased, m/M is selected to be 30-40%, a good node utilization rate can be kept, and the node space utilization rate is fully used. Query of large-scale multimedia data is achieved.

Description

A spatial index method for large-scale multimedia data
Technical field
The present invention relates to a spatial index method, and in particular to a spatial index method for large-scale multimedia data.
Background
With multimedia information, it is difficult for a computer to judge, as a human would, whether two multimedia objects are similar. Similarity is therefore judged by extracting features from the multimedia information, that is, by comparing feature vectors. A feature is typically a set of high-dimensional vectors. To retrieve large-scale multimedia information effectively, a spatial index is usually built over the feature vectors of the multimedia data objects so that queries can be answered more efficiently. In multimedia information retrieval, a multimedia object is commonly represented as a high-dimensional feature vector, and the similarity between two multimedia objects depends on the distance (usually the Euclidean distance) between their feature vectors. Multimedia retrieval thus becomes the computation of distances between the feature vectors stored in the database and the query feature vector.
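The reduction of retrieval to distance computation can be illustrated with a linear scan, which is the baseline that a spatial index tries to beat. This is a minimal Python sketch; the function names and the toy two-dimensional vectors are assumptions for illustration and do not come from the patent text.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_linear_scan(database, query, k):
    """Return the k (distance, object id) pairs closest to the query.

    database maps an object id to its feature vector. Every vector is
    visited, so the cost grows linearly with the size of the database.
    """
    scored = ((euclidean(vec, query), obj_id) for obj_id, vec in database.items())
    return heapq.nsmallest(k, scored)


# Toy data: three "images" described by 2-dimensional feature vectors.
database = {"img_a": [0.0, 0.0], "img_b": [3.0, 4.0], "img_c": [1.0, 1.0]}
result = knn_linear_scan(database, [0.0, 0.0], 2)
print([obj_id for _, obj_id in result])
```

A tree index answers the same query while visiting only the nodes whose regions can still contain a closer object, which is why overlap between node regions matters for query time.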
A spatial index is a data structure that organizes and stores data according to its spatial distribution. When existing spatial indexes such as SR-Tree and A-Tree are applied to large-scale multimedia data, the index nodes overlap heavily. The larger the overlap between nodes, the more multi-path searches are needed during object retrieval, which lowers the query efficiency for large-scale multimedia data.
To solve the above problem, the present invention proposes a spatial index method for large-scale multimedia data (KSR-Tree).
Summary of the invention
The object of the invention is to provide a spatial index method for large-scale multimedia data. The method uses a splitting algorithm, supports queries over large-scale multimedia data, maintains good node utilization, effectively improves query efficiency, and increases the amount of data a node can store.
To solve the above technical problem, the invention provides the following technical solution: a spatial index method for large-scale multimedia data, in which a splitting algorithm splits one node into three, m/M is chosen to be 30%-40%, and the data to be processed is placed on the node with the lowest occupancy for processing.
In the foregoing spatial index method for large-scale multimedia data, the steps of the splitting algorithm are:
A. When the number of entries in a node reaches M+1 or M+2, first determine whether this is the node's first overflow. If so, perform reinsertion to postpone the split; otherwise, split the node.
B. If the node overflows again after reinsertion, it must be split. From the M+1 or M+2 data objects in the node, arbitrarily select three objects as the initial cluster centers.
C. Compute the distance from each data object to the three initial cluster centers, and assign each data object to its nearest center.
D. Recompute the mean of the data objects assigned to each center.
E. Repeat steps C and D until the new means equal the previous means or the change is below a specified threshold. The three resulting groups serve as the three intermediate nodes after the split.
In the foregoing spatial index method for large-scale multimedia data, the threshold is 0.0001.
Compared with the prior art, the invention uses a splitting algorithm to split one node into three and places the overlapping region of the nodes in a single node, which reduces multi-path searches during data retrieval, improves query efficiency, and increases node capacity. Choosing m/M as 30%-40% maintains good node utilization and makes full use of node space. Queries over large-scale multimedia data are thereby achieved.
The applicant carried out a series of comparative experiments between the KSR-Tree of this application and SR-Tree and A-Tree. The experimental data are analyzed below.
Experimental environment
1. Hardware: AMD 6-core processor, 16 GB memory, 2 TB hard disk
2. Operating system: Ubuntu 12.04, 64-bit
3. Programming environment: Eclipse for C/C++
Test scheme
Experiments verify the effectiveness of the spatial index algorithm and compare it with other representative spatial index techniques such as SR-Tree and A-Tree. The real data used in the performance evaluation is the MSRA-MM 2.0 data set, a real large-scale multimedia image data set for Microsoft's Bing image and video search engine, obtained from the Media Computing group of Microsoft Research Asia.
The high-dimensional feature vectors used in the experiments are of two kinds: 128-dimensional wavelet texture features and 256-dimensional RGB features. Both come from real images indexed by the Bing Images search engine, with a total of 1,000,000 items. The feature vectors of 100 of these images were selected as the query set. The experiments compare how parameters such as data set size, k, and page size affect the performance (average response time) of kNN retrieval based on the KSR-Tree, SR-Tree, and A-Tree indexes.
Effect of data set size
In this experiment the data set size was set to 200,000, 400,000, 800,000, and 1,000,000, with 128-dimensional and 256-dimensional data. The results are shown in Figs. 2-7.
The results show that A-Tree achieves good retrieval performance when the data volume is 200,000. This is because the index uses a compression-based approximate representation of feature vectors, which yields larger fan-in and fan-out. However, the compression adds a decoding step that must refer to the positions of the parent and child nodes, so as the data volume grows its retrieval efficiency falls behind that of KSR-Tree. Thus, although A-Tree performs well on small data volumes, its response time for kNN queries is not ideal when the data is large, as in multimedia retrieval. The response time of SR-Tree is consistently higher than that of KSR-Tree, because in high-dimensional space an index based on space partitioning causes many overlapping regions and data objects to be accessed and analyzed as the search space expands.
In terms of average query response time, kNN retrieval based on the proposed KSR-Tree achieves the best result. The main reasons are: (1) multimedia data is large in volume and high in dimension, which leads to more overlap between nodes, and the partitioning scheme of KSR-Tree reduces multi-path searches; (2) during kNN pruning, the proposed node splitting algorithm reduces repeated queries of nodes within the search range.
Effect of k
This experiment evaluates the effect of k on the performance of kNN retrieval based on KSR-Tree. The data set size is 1,000,000, with 128-dimensional and 256-dimensional data. The results are shown in Figs. 8-11.
Figs. 8 and 9 show that, although the average response time of all three index techniques grows as k increases, KSR-Tree grows the least and has the best response time. KSR-Tree remains better than the other two index techniques. At both 128 and 256 dimensions, the response time of A-Tree is always the highest. SR-Tree is also relatively poor, because the initial search radius in kNN retrieval is large and the overlap between nodes is high.
Figs. 10 and 11 show that response time keeps increasing as the data set grows, but for the same data set the variation for KSR-Tree is smaller. As k increases, its response time also increases slightly. This is because expanding the search space causes many overlapping regions and data objects to be accessed and analyzed, which increases the response time.
Effect of page size
This experiment evaluates the effect of page size on the retrieval efficiency of KSR-Tree. k is set to 100 and the page size is set to 32 KB, 64 KB, and 128 KB. The results are shown in Figs. 12-13.
For KSR-Tree, a larger page size means a larger fanout: each node can hold more child entries, so overlapping regions can be effectively contained in a single node.
The results show that at 128 dimensions retrieval efficiency is best with a page size of 64 KB. At 256 dimensions, retrieval is most stable with a page size of 128 KB, with a response time consistently lower than for the other two page sizes. kNN retrieval based on KSR-Tree has the lowest response time.
The applicant also tested how the number of nodes produced by a split affects response time. Fig. 14 shows the results for 3, 4, 5, and 6 nodes per split with 128-dimensional data, an index-node page size of 64 KB, and data sets from 200,000 to 1,000,000 items. The vertical axis is the query response time in seconds. The curves give, in turn, the response times for 3, 4, 5, and 6 nodes. As the figure shows, response time is shortest when a split produces 3 nodes, which outperforms 4, 5, and 6.
Fig. 15 shows the results of the same test for 3, 4, 5, and 6 nodes per split with 256-dimensional data, an index-node page size of 64 KB, and data sets from 200,000 to 1,000,000 items. The vertical axis is the query response time in seconds. The curves give, in turn, the response times for 3, 4, 5, and 6 nodes. As the figure shows, response time is shortest when a split produces 3 nodes, which outperforms 4, 5, and 6.
Figs. 14 and 15 together show that response time is shortest when a split produces 3 nodes.
Brief description of the drawings
Fig. 1 is a diagram of the split structure of the invention;
Fig. 2 shows the effect of data set size on KSR-Tree, SR-Tree, and A-Tree at 128 dimensions with k = 20; the horizontal axis is data set size and the vertical axis is response time;
Fig. 3 shows the effect of data set size on KSR-Tree, SR-Tree, and A-Tree at 256 dimensions with k = 20; the horizontal axis is data set size and the vertical axis is response time;
Fig. 4 shows the effect of data set size on KSR-Tree, SR-Tree, and A-Tree at 128 dimensions with k = 50; the horizontal axis is data set size and the vertical axis is response time;
Fig. 5 shows the effect of data set size on KSR-Tree, SR-Tree, and A-Tree at 256 dimensions with k = 50; the horizontal axis is data set size and the vertical axis is response time;
Fig. 6 shows the effect of data set size on KSR-Tree, SR-Tree, and A-Tree at 128 dimensions with k = 100; the horizontal axis is data set size and the vertical axis is response time;
Fig. 7 shows the effect of data set size on KSR-Tree, SR-Tree, and A-Tree at 256 dimensions with k = 100; the horizontal axis is data set size and the vertical axis is response time;
Fig. 8 shows the effect of k on KSR-Tree, SR-Tree, and A-Tree at 128 dimensions with a data set size of 1,000,000; the horizontal axis is k and the vertical axis is response time;
Fig. 9 shows the effect of k on KSR-Tree, SR-Tree, and A-Tree at 256 dimensions with a data set size of 1,000,000; the horizontal axis is k and the vertical axis is response time;
Fig. 10 shows the variation with k at 128 dimensions; the horizontal axis is data set size and the vertical axis is response time;
Fig. 11 shows the variation with k at 256 dimensions; the horizontal axis is data set size and the vertical axis is response time;
Fig. 12 shows the effect of different page sizes on KSR-Tree at 128 dimensions; the horizontal axis is data set size and the vertical axis is response time;
Fig. 13 shows the effect of different page sizes on KSR-Tree at 256 dimensions; the horizontal axis is data set size and the vertical axis is response time;
Fig. 14 shows the effect of the number of nodes per split on response time at 128 dimensions, with a page size of 64 KB and data sets from 200,000 to 1,000,000 items; the horizontal axis is data set size and the vertical axis is response time;
Fig. 15 shows the effect of the number of nodes per split on response time at 256 dimensions, with a page size of 64 KB and data sets from 200,000 to 1,000,000 items; the horizontal axis is data set size and the vertical axis is response time.
Detailed description of the embodiments
Embodiment. A spatial index method for large-scale multimedia data: a splitting algorithm splits one node into three, m/M is chosen to be 30%-40%, and the data to be processed is placed on the node with the lowest occupancy for processing.
Here m is the lower limit on the number of objects in a node and M is the upper limit. Setting m to 30-40% of M is optimal: it ensures good query and insertion efficiency, maintains good node utilization, and makes full use of node space. The node with the lower occupancy is determined as follows: given two index nodes that can each hold at most 10 records, if node A holds 6 records and node B holds 8 records, then node A has the lower occupancy.
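The two rules in this paragraph, deriving m from M and picking the less occupied node, can be written down directly. The following Python sketch uses hypothetical names (`IndexNode`, `min_fill`, `least_occupied`), and rounding m up to the next integer is an assumption, since the text only gives the 30-40% range.

```python
from dataclasses import dataclass


@dataclass
class IndexNode:
    name: str
    capacity: int  # M, the upper limit on records in the node
    count: int     # records currently stored

    @property
    def occupancy(self):
        """Fraction of the node's capacity that is in use."""
        return self.count / self.capacity


def min_fill(capacity, percent=40):
    """Lower limit m as a percentage of M, rounded up (rounding is an assumption)."""
    return (capacity * percent + 99) // 100


def least_occupied(nodes):
    """Return the node with the smallest occupancy."""
    return min(nodes, key=lambda n: n.occupancy)


# The example from the text: both nodes hold at most 10 records.
node_a = IndexNode("A", capacity=10, count=6)
node_b = IndexNode("B", capacity=10, count=8)
print(least_occupied([node_a, node_b]).name)
print(min_fill(10, 30), min_fill(10, 40))
```

With M = 10, the 30-40% range gives m between 3 and 4 under this rounding.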
The steps of the splitting algorithm are:
A. When the number of entries in a node reaches M+1 or M+2, first determine whether this is the node's first overflow. If so, perform reinsertion to postpone the split; otherwise, split the node.
B. If the node overflows again after reinsertion, it must be split. From the M+1 or M+2 data objects in the node, arbitrarily select three objects as the initial cluster centers.
C. Compute the distance from each data object to the three initial cluster centers, and assign each data object to its nearest center.
D. Recompute the mean of the data objects assigned to each center.
E. Repeat steps C and D until the new means equal the previous means or the change is below a specified threshold. The three resulting groups serve as the three intermediate nodes after the split. The threshold is a small floating-point number greater than 0, set here to 0.0001.
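Step A and the first half of step B amount to a small overflow policy: reinsert on the first overflow, split on a later one. A minimal Python sketch of that decision is given here. The class name, the `has_overflowed` flag, and the string return values are hypothetical, and the sketch only decides which action to take; the text does not specify which entries are removed and reinserted.

```python
from dataclasses import dataclass, field


@dataclass
class LeafNode:
    entries: list = field(default_factory=list)
    has_overflowed: bool = False  # hypothetical flag: set after the first overflow


def overflow_action(node, max_entries):
    """Decide what to do with a node after an insertion.

    Returns "none" while the node fits within M (max_entries), "reinsert"
    on the first overflow (postponing the split), and "split" on any
    later overflow.
    """
    if len(node.entries) <= max_entries:
        return "none"
    if not node.has_overflowed:
        node.has_overflowed = True
        return "reinsert"
    return "split"


M = 4
leaf = LeafNode(entries=[1, 2, 3, 4, 5])  # M + 1 entries
first = overflow_action(leaf, M)
second = overflow_action(leaf, M)
print(first, second)
```

Only when the action is "split" would the clustering of steps B to E run.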
As shown in Fig. 1, the node structure of KSR-Tree is similar to that of SR-Tree, except that a node is split one-into-three. When the number of data objects in leaf node C reaches M+1, first determine whether reinsertion is required. If not, the node is split, producing three new leaf nodes under intermediate node 1. Then determine whether intermediate node 1 needs to be split. If not, splitting stops; if so, node 1 is split into three new intermediate nodes, and the process repeats up to the root node.
In the KSR-Tree structure diagram, if node C overflows, its data objects are split one-into-three, yielding nodes C, I, and J. Node 1 does not overflow at this point, so splitting stops, giving the structure shown in Fig. 1.
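The upward propagation described for Fig. 1 can be sketched as a loop that splits a node into three and then checks its parent. In this Python sketch, `TreeNode`, `three_way_stub`, and `split_upward` are hypothetical names. The stub simply cuts the entry list into three slices and stands in for the clustering split, and the reinsertion check is omitted, so this is only an illustration of the propagation, not of the full method.

```python
class TreeNode:
    def __init__(self, entries, parent=None):
        self.entries = list(entries)  # data objects (leaf) or child TreeNodes
        self.parent = parent


def three_way_stub(entries):
    """Stand-in for the clustering split: three nearly equal slices."""
    n = len(entries)
    a, b = n // 3, 2 * n // 3
    return [entries[:a], entries[a:b], entries[b:]]


def split_upward(node, max_entries, split=three_way_stub):
    """Split node one-into-three, repeat while ancestors overflow, return the root."""
    while len(node.entries) > max_entries:
        parts = split(node.entries)
        parent = node.parent
        if parent is None:  # splitting the root adds a new root above it
            parent = TreeNode([node])
            node.parent = parent
        node.entries = parts[0]  # the original node keeps one part (C in Fig. 1)
        siblings = [TreeNode(p, parent) for p in parts[1:]]  # two new nodes (I and J)
        for sib in siblings:
            for child in sib.entries:
                if isinstance(child, TreeNode):
                    child.parent = sib  # moved children point at their new parent
        parent.entries.extend(siblings)
        node = parent  # the parent gained two entries, so check it next
    while node.parent is not None:
        node = node.parent
    return node


M = 4
leaf_c = TreeNode([1, 2, 3, 4, 5])  # M + 1 data objects
node_1 = TreeNode([leaf_c])
leaf_c.parent = node_1
root = split_upward(leaf_c, M)
print(len(root.entries), [len(child.entries) for child in root.entries])
```

In the usage above the parent ends with three children and does not overflow, so the loop stops after one split, matching the situation described for Fig. 1.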

Claims (3)

1. A spatial index method for large-scale multimedia data, characterized in that: a splitting algorithm splits one node into three, m/M is chosen to be 30%-40%, and the data to be processed is placed on the node with the lowest occupancy for processing.
2. The spatial index method for large-scale multimedia data according to claim 1, characterized in that the steps of the splitting algorithm are:
A. When the number of entries in a node reaches M+1 or M+2, first determine whether this is the node's first overflow. If so, perform reinsertion to postpone the split; otherwise, split the node.
B. If the node overflows again after reinsertion, it must be split. From the M+1 or M+2 data objects in the node, arbitrarily select three objects as the initial cluster centers.
C. Compute the distance from each data object to the three initial cluster centers, and assign each data object to its nearest center.
D. Recompute the mean of the data objects assigned to each center.
E. Repeat steps C and D until the new means equal the previous means or the change is below a specified threshold. The three resulting groups serve as the three intermediate nodes after the split.
3. The spatial index method for large-scale multimedia data according to claim 2, characterized in that the threshold is 0.0001.
CN201610187012.9A 2016-03-29 2016-03-29 Large-scale multimedia data spatial index method Pending CN105868355A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610187012.9A CN105868355A (en) 2016-03-29 2016-03-29 Large-scale multimedia data spatial index method


Publications (1)

Publication Number Publication Date
CN105868355A true CN105868355A (en) 2016-08-17

Family

ID=56625121

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610187012.9A Pending CN105868355A (en) 2016-03-29 2016-03-29 Large-scale multimedia data spatial index method

Country Status (1)

Country Link
CN (1) CN105868355A (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101299213A (en) * 2008-06-17 2008-11-05 中国地质大学(武汉) N-dimension clustering order recording tree space index method
CN101996242A (en) * 2010-11-02 2011-03-30 江西师范大学 Three-dimensional R-tree index expansion structure-based three-dimensional city model adaptive method
CN102682184A (en) * 2011-03-08 2012-09-19 中国科学院研究生院 Judgment method of fracture-pair intersection in random-distribution three-dimensional fracture network
CN102831241A (en) * 2012-09-11 2012-12-19 山东理工大学 Dynamic index multi-target self-adaptive construction method for product reverse engineering data
CN103092926A (en) * 2012-12-29 2013-05-08 深圳先进技术研究院 Multi-level mixed three-dimensional space index method


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107391601A (en) * 2017-06-30 2017-11-24 安徽四创电子股份有限公司 A kind of construction method of the high dimensional indexing of face feature vector
CN116796209A (en) * 2023-08-24 2023-09-22 北京安图生物工程有限公司 Data processing method for monitoring storage environment temperature of detection kit
CN116796209B (en) * 2023-08-24 2023-10-20 北京安图生物工程有限公司 Data processing method for monitoring storage environment temperature of detection kit

Similar Documents

Publication Publication Date Title
US8463045B2 (en) Hierarchical sparse representation for image retrieval
CN104199827B (en) The high dimensional indexing method of large scale multimedia data based on local sensitivity Hash
CN106933511B (en) Space data storage organization method and system considering load balance and disk efficiency
CN109086437A (en) A kind of image search method merging Faster-RCNN and Wasserstein self-encoding encoder
US20080071843A1 (en) Systems and methods for indexing and visualization of high-dimensional data via dimension reorderings
US8391355B2 (en) Method and device for online dynamic semantic video compression and video indexing
Cha et al. The GC-tree: a high-dimensional index structure for similarity search in image databases
CN105589938A (en) Image retrieval system and retrieval method based on FPGA
US20110208754A1 (en) Organization of Data Within a Database
US9442950B2 (en) Systems and methods for dynamic visual search engine
CN101866358A (en) Multidimensional interval querying method and system thereof
JPWO2013129580A1 (en) Approximate nearest neighbor search device, approximate nearest neighbor search method and program thereof
Lokoč et al. Ptolemaic indexing of the signature quadratic form distance
CN106095951B (en) Data space multi-dimensional indexing method based on load balancing and inquiry log
CN107545276A (en) The various visual angles learning method of joint low-rank representation and sparse regression
CN107451200B (en) Retrieval method using random quantization vocabulary tree and image retrieval method based on same
Kiranyaz et al. Hierarchical cellular tree: An efficient indexing scheme for content-based retrieval on multimedia databases
WO2010062445A1 (en) Predictive indexing for fast search
JP2002342136A (en) Device and method for deciding clustering coefficient for database by using block level sampling
CN112860937A (en) KNN and word embedding based mixed music recommendation method, system and equipment
CN105868355A (en) Large-scale multimedia data spatial index method
Hua et al. SamMatch: a flexible and efficient sampling-based image retrieval technique for large image databases
CN114972506B (en) Image positioning method based on deep learning and street view image
Zacharatou et al. Efficient bundled spatial range queries
Kriegel et al. The performance of object decomposition techniques for spatial query processing

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160817
