CN107862061A - Index file building method and retrieval method for a database - Google Patents

Index file building method and retrieval method for a database

Info

Publication number
CN107862061A
Authority
CN
China
Prior art keywords
index
content value
hash value
table field
data
Prior art date
Legal status
Pending
Application number
CN201711127179.7A
Other languages
Chinese (zh)
Inventor
刘亚歌
杨宁
Current Assignee
Shenzhen Cct Software Information Co Ltd
Original Assignee
Shenzhen Cct Software Information Co Ltd
Priority date
2017-11-15
Filing date
2017-11-15
Publication date
2018-03-30
Application filed by Shenzhen Cct Software Information Co Ltd
Priority to CN201711127179.7A
Publication of CN107862061A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2255 Hash tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An index file building method, a retrieval method, and a computer-readable storage medium for a database. Each index is formed from the content values of its corresponding data record under selected table fields together with the hash values of those content values. When the indexes are traversed during retrieval, the hash values contained in an index are matched first; only if that match succeeds are the content values under the table fields in the index compared, which makes retrieval comparisons more efficient. Further, because the preset length of the hash value of a content value under a table field in the index is less than the length of that table field, the amount of matching computation is effectively reduced compared with conventional methods, improving query and retrieval efficiency.

Description

Index file building method and retrieval method for a database
Technical field
The present invention relates to the field of data storage and retrieval.
Background
Conventional retrieval methods search an index by matching the query string against it with regular expressions or by lexicographic comparison. For tall, thin data tables (few columns, many rows), these methods spend a large amount of CPU time in Byte.CompareTo operations, and with massive amounts of data even a simple query can take a long time.
Summary of the invention
In view of the above problems, the present application provides an index file building method and a retrieval method for a database.
According to a first aspect, an embodiment provides an index file building method for a database, including:
selecting several table fields of a data table;
for each stored data record, performing hash calculation separately on the record's content values under the selected table fields, to obtain the hash value of each of the record's content values under those fields;
forming an index from the record's content values under those table fields together with their hash values;
each stored record corresponds to one index, and these indexes form the index file.
In one embodiment, the hash values of the content values under these table fields are placed in the front part of the index, and the content values themselves are placed in the rear part of the index.
In one embodiment, obtaining the hash value of each content value of the record under these table fields means obtaining a hash value of a preset length for each content value.
In one embodiment, the preset length of the hash value of a content value under a table field is less than the length of that table field.
In one embodiment, selecting several table fields of the data table includes selecting several key fields or frequently used fields of the data table.
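The first-aspect steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the source names no hash function, so BLAKE2b from the standard library (with `digest_size` as the preset hash length) stands in for it, records are plain dicts, and the helper names `short_hash`, `build_index`, and `build_index_file` are hypothetical. Each index keeps the hash values in its front part and the content values in its rear part, as in the embodiment above.

```python
import hashlib


def short_hash(value: str, length: int = 4) -> bytes:
    # Assumption: BLAKE2b with a `length`-byte digest plays the role of the
    # unspecified "hash value of a preset length".
    return hashlib.blake2b(value.encode("utf-8"), digest_size=length).digest()


def build_index(record: dict, fields: list, hash_len: int = 4) -> tuple:
    # Content values of this record under the selected table fields.
    contents = [str(record[f]) for f in fields]
    # Front part of the index: hash values; rear part: the content values.
    hashes = [short_hash(c, hash_len) for c in contents]
    return tuple(hashes) + tuple(contents)


def build_index_file(records: list, fields: list, hash_len: int = 4) -> list:
    # One index per stored record; list position i links index i to records[i].
    return [build_index(r, fields, hash_len) for r in records]


records = [
    {"name": "AA", "age": 26, "wage": 8000},
    {"name": "BB", "age": 23, "wage": 5000},
]
index_file = build_index_file(records, ["name", "age"])
```

With two selected fields, each index has four slots: two 4-byte hash values followed by the two content values as strings. Linking indexes to records by list position is only one simple choice; a real store might keep a row identifier inside each index instead.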
According to a second aspect, an embodiment provides a retrieval method for a database, including:
reading a search condition;
traversing the indexes in the index file;
when any index is reached, first comparing whether the search condition matches the hash values contained in the index;
if the search condition does not match the hash values contained in the index, continuing to the next index; if it does match, then comparing whether the search condition matches the content values under the table fields contained in the index;
if the search condition does not match the content values under the table fields contained in the index, continuing to the next index; if it does match, obtaining the data record corresponding to this index.
In one embodiment, the search condition includes at least the queried content value;
the retrieval method further includes: calculating the hash value of the queried content value in the search condition;
comparing whether the search condition matches the hash values contained in the index includes: comparing whether the hash value of the queried content value matches the hash values contained in the index;
comparing whether the search condition matches the content values under the table fields contained in the index includes: comparing whether the queried content value matches the content values under the table fields contained in the index;
when the data record corresponding to the index is obtained, the obtained record is also copied into a result set, and traversal continues with the next index.
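A matching sketch of the retrieval side follows, again with assumed names (`retrieve`, `n_fields`) and the same BLAKE2b stand-in for the unspecified hash. Here the search condition is only the queried content value, so the cheap comparison of short hash values runs against every hash slot, and the full content value is compared only in a slot whose hash matched. Matching records are copied into a result set, and traversal always continues to the next index.

```python
import hashlib


def short_hash(value: str, length: int = 4) -> bytes:
    # Assumed stand-in for the patent's unspecified preset-length hash.
    return hashlib.blake2b(value.encode("utf-8"), digest_size=length).digest()


def retrieve(index_file: list, records: list, query: str,
             n_fields: int, hash_len: int = 4) -> list:
    query_hash = short_hash(query, hash_len)   # hash the queried value once
    result_set = []
    for i, index in enumerate(index_file):     # traverse every index
        hashes, contents = index[:n_fields], index[n_fields:]
        for slot, h in enumerate(hashes):
            if h != query_hash:                # stage 1: short hash comparison
                continue
            if contents[slot] == query:        # stage 2: full content comparison
                result_set.append(dict(records[i]))  # copy into the result set
                break
        # matched or not, traversal continues with the next index
    return result_set


records = [{"name": "AA", "age": "26"}, {"name": "BB", "age": "23"}]
index_file = [
    tuple(short_hash(r[f]) for f in ("name", "age")) + tuple(r[f] for f in ("name", "age"))
    for r in records
]
hits = retrieve(index_file, records, "BB", n_fields=2)
```

A hash match can be a collision, so stage 2 is what guarantees a correct result; stage 1 only serves as a cheap filter that skips the full comparison for indexes whose hashes differ.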
In one embodiment, the search condition may further include at least one of: the table field to which the queried content value belongs, and the positions in the index of the queried content value and of its calculated hash value.
In one embodiment, each index in the index file corresponds to one stored data record, and any index is formed from the content values of its corresponding record under the selected table fields together with their hash values.
According to a third aspect, an embodiment provides a computer-readable storage medium, characterized by including a program that can be executed by a processor to implement the method according to any one of claims 1 to 9.
With the index file building method, retrieval method, and computer-readable storage medium of the above embodiments, each index is formed from the content values of its corresponding record under the selected table fields together with their hash values. During retrieval, each traversed index is first matched on the hash values it contains, and only if that match succeeds are the content values under its table fields compared, making retrieval comparisons more efficient. Further, because the preset length of the hash value of a content value under a table field in the index is less than the length of that table field, the amount of matching computation is effectively reduced compared with conventional methods, improving query and retrieval efficiency.
Brief description of the drawings
Fig. 1 is a flowchart of the index file building method for a database according to an embodiment;
Fig. 2 is a flowchart of the retrieval method for a database according to an embodiment.
Detailed description of embodiments
The present invention is described in further detail below through embodiments with reference to the accompanying drawings. Similar elements in different embodiments use associated similar reference numerals. In the following embodiments, many details are described so that the present application can be better understood. However, those skilled in the art will readily recognize that some of these features may be omitted in different situations, or replaced by other elements, materials, or methods. In some cases, certain operations related to the present application are not shown or described in the specification, so that the core of the application is not buried in excessive description; for those skilled in the art, a detailed description of these operations is unnecessary, as they can be fully understood from the specification and general technical knowledge in the field.
In addition, the features, operations, or characteristics described in this specification may be combined in any suitable way to form various embodiments. Likewise, the steps or actions in the method description may be reordered or adjusted in ways obvious to those skilled in the art. Therefore, the various orders in the specification and drawings are only for clearly describing a particular embodiment and do not imply a required order, unless it is stated that a certain order must be followed.
The serial numbers assigned to components herein, such as "first" and "second", are only used to distinguish the described objects and carry no ordinal or technical meaning. Unless otherwise specified, "connection" and "coupling" in this application include both direct and indirect connection (coupling).
Referring to Fig. 1, an embodiment discloses an index file building method for a database, including steps S100 to S120.
Step S100: select several table fields of the data table. In one embodiment, selecting the table fields in step S100 includes selecting several key or frequently used fields of the data table, which can further improve the efficiency of subsequent retrieval and querying.
Step S110: for each stored data record, perform hash calculation separately on the record's content values under the selected table fields, obtaining the hash value of each content value. In one embodiment, step S110 obtains a hash value of a preset length for each content value of the record under these fields. In one embodiment, the preset length of the hash value of a content value under a table field in the index is less than the length of that table field. In one embodiment, the hash values of different records' content values under the same table field may have the same length, while the hash values of the same record's content values under different table fields may have the same or different lengths.
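Step S110's length rules can be pictured with per-field settings. In this sketch the field widths and hash lengths in `FIELD_WIDTH` and `HASH_LEN` are invented for illustration (the source gives no numbers), and BLAKE2b's `digest_size` parameter again stands in for the unspecified preset-length hash. Every record uses the same hash length for a given field, different fields may use different lengths, and each hash length is checked to be shorter than the width of its field.

```python
import hashlib

# Hypothetical fixed field widths (bytes) and per-field preset hash lengths.
FIELD_WIDTH = {"name": 32, "age": 8, "wage": 16}
HASH_LEN = {"name": 4, "age": 2, "wage": 4}


def field_hash(field: str, value: str) -> bytes:
    length = HASH_LEN[field]
    # The embodiment requires the preset hash length to be below the field length.
    if length >= FIELD_WIDTH[field]:
        raise ValueError(f"hash for {field!r} must be shorter than the field")
    # Assumption: BLAKE2b stands in for the unspecified hash function.
    return hashlib.blake2b(value.encode("utf-8"), digest_size=length).digest()


first = {f: field_hash(f, v) for f, v in {"name": "AA", "age": "26", "wage": "8000"}.items()}
second = {f: field_hash(f, v) for f, v in {"name": "BB", "age": "23", "wage": "5000"}.items()}
```

Under these assumed widths, comparing a 4-byte hash instead of a 32-byte fixed-width name slot is where the claimed reduction in matching work would come from.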
Step S120: form an index from the record's content values under these table fields together with their hash values. In one embodiment, step S120 includes placing the hash values of the content values under these fields in the front part of the index and the content values themselves in the rear part. In one embodiment, the hash value of the content value under a table field and the content value itself occupy the same positions in every index. For example, suppose table fields A, B, and C are selected and there are two data records: the first record's content values under A, B, and C are a1, b1, and c1, and the second record's are a2, b2, and c2.
If the index corresponding to the first record is:
hash(a1)_hash(b1)_hash(c1)_a1_b1_c1;
then the index corresponding to the second record is:
hash(a2)_hash(b2)_hash(c2)_a2_b2_c2;
It can be seen that, for different records, the content value under the same field and its hash value occupy the same positions in the index.
Each stored record corresponds to one index, and these indexes form the index file described above.
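The fixed layout means a field's slots can be computed once from the list of selected fields. The helpers below are a hypothetical sketch: with n selected fields, the k-th field's hash value sits at position k and its content value at position n + k (1-based, matching the numbering in the worked example later in this description). The strings `h(a1)` and so on are placeholders for real hash values.

```python
def slot_positions(fields: list, field: str) -> tuple:
    # 1-based positions of (hash value, content value) for `field` in every index.
    k = fields.index(field) + 1
    return k, len(fields) + k


def layout(contents: list) -> list:
    # Front part: placeholder hash values; rear part: the content values.
    return [f"h({c})" for c in contents] + list(contents)


fields = ["A", "B", "C"]
first_index = layout(["a1", "b1", "c1"])
second_index = layout(["a2", "b2", "c2"])
hash_pos, content_pos = slot_positions(fields, "A")
```

Joining `first_index` with underscores gives `h(a1)_h(b1)_h(c1)_a1_b1_c1`, the same shape as the first record's index in the example above, and field A resolves to positions 1 and 4 in both records.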
Referring to Fig. 2, an embodiment discloses a retrieval method for a database, including steps S200 to S250.
Step S200: read the search condition.
Step S210: traverse the indexes in the index file.
Step S220: when any index is reached, first compare whether the search condition matches the hash values contained in the index. If it does not match, go to step S250; if it matches, go to step S230.
Step S230: compare whether the search condition matches the content values under the table fields contained in the index. If it does not match, go to step S250; if it matches, go to step S240.
Step S240: obtain the data record corresponding to this index. In one embodiment, step S240 may further include: after obtaining the record, copying it into a result set, and then going to step S250.
Step S250: continue to the next index. Traversal stops once every index in the index file has been visited.
Specifically, in one embodiment, the search condition includes the queried content value. In one embodiment, the search condition may also include the table field to which the queried content value belongs, and the positions in the index of the queried content value and of its calculated hash value. In one embodiment, these positions can be derived from the table field to which the queried content value belongs. Correspondingly, the retrieval method further includes calculating the hash value of the queried content value in the search condition.
Step S220, comparing whether the search condition matches the hash values contained in the index, therefore includes comparing whether the hash value of the queried content value matches the hash values contained in the index; in one embodiment, if the search condition includes the position in the index of the hash value of the queried content value, that hash value can be compared directly with the hash value at the corresponding position in the index. Step S230, comparing whether the search condition matches the content values under the table fields contained in the index, includes comparing whether the queried content value matches the content values under the table fields contained in the index; in one embodiment, if the search condition includes the position in the index of the queried content value, that value can be compared directly with the content value at the corresponding position in the index.
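When the search condition also names the table field, steps S200 to S250 reduce to two direct slot comparisons per index. The sketch below assumes a condition given as a field name plus a queried value (the `SearchCondition` class and `search` function are hypothetical), derives both positions from the field as described above, and uses the same BLAKE2b stand-in for the unspecified hash; positions are 0-based in the code.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class SearchCondition:
    field: str   # table field of the queried content value
    value: str   # queried content value


def short_hash(value: str, length: int = 4) -> bytes:
    # Assumed stand-in for the unspecified preset-length hash.
    return hashlib.blake2b(value.encode("utf-8"), digest_size=length).digest()


def search(index_file: list, records: list, fields: list, cond: SearchCondition) -> list:
    # S200: read the condition and derive the slot positions from its field.
    hash_pos = fields.index(cond.field)
    content_pos = len(fields) + hash_pos
    query_hash = short_hash(cond.value)
    result_set = []
    for i, index in enumerate(index_file):        # S210: traverse the indexes
        if index[hash_pos] != query_hash:         # S220: hash mismatch, go to S250
            continue
        if index[content_pos] != cond.value:      # S230: content mismatch, go to S250
            continue
        result_set.append(dict(records[i]))       # S240: copy into the result set
    return result_set                             # S250: stop after the last index


fields = ["name", "age"]
records = [{"name": "AA", "age": "26"}, {"name": "26", "age": "40"}]
index_file = [
    tuple(short_hash(r[f]) for f in fields) + tuple(r[f] for f in fields)
    for r in records
]
hits = search(index_file, records, fields, SearchCondition("age", "26"))
```

The contrived second record, whose name is "26", shows why the field matters: an unpositioned search for "26" would return both records, while this one compares only the age slots.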
The present invention is further illustrated below with a practical example. The idea of the invention is explained using the following simple data table, which lists the employees of a company.
Name | Sex | Age | Education | Hire year | Wage | Marital status
AA | Male | 26 | Bachelor's | 2015 | 8000 | Married
BB | Male | 23 | Bachelor's | 2015 | 5000 | Unmarried
CC | Female | 43 | Bachelor's | 2010 | 20000 | Married
DD | Male | 34 | Doctorate | 2012 | 20000 | Married
EE | Male | 25 | Master's | 2015 | 10000 | Unmarried
The first row of the table gives the fields; the second through sixth rows are five data records, each describing one employee.
For example, we may select the three table fields name, age, and wage from the data table. Taking the first record as an example, when building its index, hash calculation is performed separately on the record's content values under these three fields, that is, on AA, 26, and 8000, giving the hash values of AA, 26, and 8000. When calculating a hash value, for example that of AA, a hash value whose length is less than the length of the table field "name" to which AA belongs may be calculated.
After this processing, the index corresponding to the first record can be expressed as:
hash(AA)_hash(26)_hash(8000)_AA_26_8000
Similarly, the indexes of the second through fifth records can be obtained.
These indexes make up the index file of the data table.
During retrieval and querying, the search condition is read first. As described above, the search condition includes the queried content value and may further include at least one of: the table field to which the queried content value belongs, and the positions in the index of the queried content value and of its calculated hash value.
Suppose the search condition includes the queried content value AA and its table field "name". From the field "name", the queried content value AA and its hash value are found at the 4th and 1st positions of the index, respectively. The hash value of AA is then calculated and the indexes in the index file are traversed. For each index, the hash value of AA is first compared with the hash value at position 1 of the index. If they do not match (are not equal), traversal moves to the next index. If they match, AA is further compared with the content value at position 4 of the index: if these do not match, traversal moves to the next index; if they match, the data record corresponding to the index is obtained and may also be copied into a result set, after which traversal continues with the next index. In either case traversal ends once the whole index file has been visited.
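The worked example can be run end to end. The rows below transcribe the employee table, the selected fields are name, age, and wage, and BLAKE2b with a 4-byte digest again stands in for the unspecified hash; everything else (the column keys, the helper `h`, and the use of list positions to link indexes to rows) is an illustrative assumption.

```python
import hashlib

COLUMNS = ["name", "sex", "age", "education", "hired", "wage", "marital"]
ROWS = [
    ("AA", "Male", "26", "Bachelor's", "2015", "8000", "Married"),
    ("BB", "Male", "23", "Bachelor's", "2015", "5000", "Unmarried"),
    ("CC", "Female", "43", "Bachelor's", "2010", "20000", "Married"),
    ("DD", "Male", "34", "Doctorate", "2012", "20000", "Married"),
    ("EE", "Male", "25", "Master's", "2015", "10000", "Unmarried"),
]
records = [dict(zip(COLUMNS, row)) for row in ROWS]
SELECTED = ["name", "age", "wage"]


def h(value: str) -> bytes:
    # Assumed 4-byte BLAKE2b hash standing in for the unspecified hash.
    return hashlib.blake2b(value.encode("utf-8"), digest_size=4).digest()


# Each index: h(name), h(age), h(wage), name, age, wage (kept as a tuple).
index_file = [tuple(h(r[f]) for f in SELECTED) + tuple(r[f] for f in SELECTED) for r in records]

# Condition: value "AA" in field "name" gives 1-based positions 1 (hash) and 4 (content).
hash_pos = SELECTED.index("name") + 1
content_pos = len(SELECTED) + hash_pos
query, query_hash = "AA", h("AA")
result_set = [
    dict(records[i])
    for i, index in enumerate(index_file)
    # `and` short-circuits: the content value is compared only after a hash match.
    if index[hash_pos - 1] == query_hash and index[content_pos - 1] == query
]
```

Only the first employee's record ends up in `result_set`; the other four indexes are rejected at the hash comparison (or, in the unlikely event of a collision, at the content comparison).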
Those skilled in the art will understand that all or part of the functions of the methods in the above embodiments may be implemented in hardware or by a computer program. When implemented by a computer program, the program may be stored in a computer-readable storage medium, which may include read-only memory, random-access memory, a magnetic disk, an optical disc, a hard disk, and the like, and the program is executed by a computer to implement the above functions. For example, the program is stored in the memory of a device, and when a processor executes the program in the memory, all or part of the above functions are implemented. In addition, the program may be stored in a storage medium such as a server, another computer, a magnetic disk, an optical disc, a flash drive, or a removable hard disk, and be downloaded or copied into the memory of a local device, or used to update the version of the local device's system; when a processor executes the program in the memory, all or part of the functions in the above embodiments are implemented.
The present invention has been described above with specific examples, which are only intended to help understand the invention and not to limit it. Those skilled in the art may, according to the idea of the invention, make several simple deductions, variations, or substitutions.

Claims (10)

  1. An index file building method for a database, characterized by comprising:
    selecting several table fields of a data table;
    for each stored data record, performing hash calculation separately on the content values of the record under the several table fields, to obtain the hash value of each content value of the record under these table fields;
    forming an index from the content values of the record under these table fields and their hash values;
    wherein each stored record corresponds to one index, and these indexes form the index file.
  2. The index file building method according to claim 1, characterized in that the hash values of the content values under these table fields are placed in the front part of the index, and the content values under these table fields are placed in the rear part of the index.
  3. The index file building method according to claim 1 or 2, characterized in that obtaining the hash value of each content value of the record under these table fields means obtaining a hash value of a preset length for each content value of the record under these table fields.
  4. The index file building method according to claim 3, characterized in that the preset length of the hash value of a content value under a table field in the index is less than the length of that table field.
  5. The index file building method according to claim 1, characterized in that selecting several table fields of the data table comprises selecting several key or frequently used table fields of the data table.
  6. A retrieval method for a database, characterized by comprising:
    reading a search condition;
    traversing the indexes in an index file;
    when any index is reached, first comparing whether the search condition matches the hash values contained in the index;
    if the search condition does not match the hash values contained in the index, continuing to the next index; if the search condition matches the hash values contained in the index, then comparing whether the search condition matches the content values under the table fields contained in the index;
    if the search condition does not match the content values under the table fields contained in the index, continuing to the next index; if the search condition matches the content values under the table fields contained in the index, obtaining the data record corresponding to this index.
  7. The retrieval method according to claim 6, characterized in that:
    the search condition comprises at least the queried content value;
    the retrieval method further comprises: calculating the hash value of the queried content value in the search condition;
    comparing whether the search condition matches the hash values contained in the index comprises: comparing whether the calculated hash value of the queried content value matches the hash values contained in the index;
    comparing whether the search condition matches the content values under the table fields contained in the index comprises: comparing whether the queried content value of the search condition matches the content values under the table fields contained in the index;
    when the data record corresponding to the index is obtained, the obtained record is also copied into a result set, and traversal continues with the next index.
  8. The retrieval method according to claim 7, characterized in that the search condition may further comprise at least one of: the table field to which the queried content value belongs, and the positions in the index of the queried content value and of its calculated hash value.
  9. The retrieval method according to claim 6, characterized in that each index in the index file corresponds to one stored data record, and any index is formed from the content values of its corresponding record under the selected table fields together with their hash values.
  10. A computer-readable storage medium, characterized by comprising a program, the program being executable by a processor to implement the method according to any one of claims 1 to 9.
CN201711127179.7A 2017-11-15 2017-11-15 Index file building method and retrieval method for a database Pending CN107862061A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711127179.7A CN107862061A (en) 2017-11-15 2017-11-15 Index file building method and retrieval method for a database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711127179.7A CN107862061A (en) 2017-11-15 2017-11-15 Index file building method and retrieval method for a database

Publications (1)

Publication Number Publication Date
CN107862061A (en) 2018-03-30

Family

ID=61701802

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711127179.7A Pending CN107862061A (en) 2017-11-15 2017-11-15 Index file building method and retrieval method for a database

Country Status (1)

Country Link
CN (1) CN107862061A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113127421A (en) * 2021-04-01 2021-07-16 山东英信计算机技术有限公司 Method and equipment for searching file content in storage system

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103870511A (en) * 2012-12-18 2014-06-18 ***股份有限公司 Shared memory-based information inquiring equipment and method
CN103914463A (en) * 2012-12-31 2014-07-09 北京新媒传信科技有限公司 Method and device for retrieving similarity of picture messages
US20160105397A1 (en) * 2013-03-15 2016-04-14 International Business Machines Corporation Firewall Packet Filtering
CN104866502A (en) * 2014-02-25 2015-08-26 深圳市中兴微电子技术有限公司 Data matching method and device
CN105847062A (en) * 2016-05-06 2016-08-10 汉柏科技有限公司 Log aggregation method and device
CN106777156A (en) * 2016-12-20 2017-05-31 兰州大学淮安高新技术研究院 A kind of design patent figure retrieving method based on figure Hash

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
桂思强 (Gui Siqiang), ed.: "《ACCESS数据库设计基础》" (Fundamentals of ACCESS Database Design), 30 September 2003 *

Similar Documents

Publication Publication Date Title
US9411840B2 (en) Scalable data structures
CN108897761B (en) Cluster storage method and device
US9047330B2 (en) Index compression in databases
JP4785833B2 (en) Database management system with persistent and user accessible bitmap values
CN109033101B (en) Label recommendation method and device
US20240126817A1 (en) Graph data query
US9442694B1 (en) Method for storing a dataset
US20180144061A1 (en) Edge store designs for graph databases
US10248680B2 (en) Index management
CN104850565B (en) A kind of metadata management method based on K-V storage systems
Fraczek et al. Comparative analysis of relational and non-relational databases in the context of performance in web applications
CA3059929C (en) Text searching method, apparatus, and non-transitory computer-readable storage medium
CN109522271B (en) Batch insertion and deletion method and device for B + tree nodes
US8396882B2 (en) Systems and methods for generating issue libraries within a document corpus
CN112115227A (en) Data query method and device, electronic equipment and storage medium
CN102169491B (en) Dynamic detection method for multi-data concentrated and repeated records
CN103955514A (en) Image feature indexing method based on Lucene inverted index
US11853279B2 (en) Data storage using vectors of vectors
US9047363B2 (en) Text indexing for updateable tokenized text
US20210026820A1 (en) Techniques for database entries de-duplication
CN104408128B (en) A kind of reading optimization method indexed based on B+ trees asynchronous refresh
CN107862061A (en) The index file method for building up and search method of a kind of database
US20180144060A1 (en) Processing deleted edges in graph databases
US11422998B2 (en) Data management system, data management device, data management method, and storage medium
WO2016119508A1 (en) Method for recognizing large-scale objects based on spark system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180330