CN111651372A - Flash retrieval method based on Hash search and storage medium - Google Patents

Flash retrieval method based on Hash search and storage medium

Info

Publication number
CN111651372A
Authority
CN
China
Prior art keywords
data
flash
hash
flash storage
storage page
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010400678.4A
Other languages
Chinese (zh)
Inventor
沈坤
王建国
聂思静
梁美红
陈秀琼
王敏敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hubei Sanjiang Aerospace Wanfeng Technology Development Co Ltd
Original Assignee
Hubei Sanjiang Aerospace Wanfeng Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hubei Sanjiang Aerospace Wanfeng Technology Development Co Ltd
Priority to CN202010400678.4A
Publication of CN111651372A
Current legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00: Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02: Addressing or allocation; Relocation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a Flash retrieval method based on Hash search and a storage medium. The method comprises the following steps: receiving data to be stored, wherein each piece of data has a unique key value, and calculating the hash value of each piece of data according to a pre-constructed hash function and the key value of each piece of data; creating an index table in Flash, wherein the hash value of each piece of data represents the index position of the data in the index table, and establishing the mapping relation between the index position in the index table and a Flash storage page; and determining a Flash storage page of each piece of data according to the hash value of each piece of data and the mapping relation, and storing the data into the corresponding Flash storage page. The invention has the advantage of high retrieval efficiency.

Description

Flash retrieval method based on Hash search and storage medium
Technical Field
The invention belongs to the technical field of data retrieval, and particularly relates to a Flash retrieval method based on hash search and a storage medium.
Background
In the design of embedded products, it is usually necessary to store some data such as configuration parameters, operation states, operation results, etc. Moreover, the data needs to be stored for a long time and cannot be lost when power is lost. The efficiency of retrieving and storing data directly affects the performance of the product and even determines the success or failure of the product design.
Common search algorithms include sequential search, binary search, and binary search tree lookup. On an embedded single-chip microcomputer, storage and working memory are limited, so a binary search tree is a poor fit for Flash retrieval. Retrieval speed also matters: when the volume of stored data is large, sequential search and binary search cannot meet the timing requirements.
Disclosure of Invention
To address at least one defect or need for improvement in the prior art, the invention provides a Flash retrieval method based on hash search and a storage medium, which allow more data to be stored in the limited storage space of Flash while keeping data retrieval efficient.
In order to achieve the above object, according to a first aspect of the present invention, there is provided a Flash retrieval method based on hash lookup, including:
receiving data to be stored, wherein each piece of data has a unique key value, and calculating the hash value of each piece of data according to a pre-constructed hash function and the key value of each piece of data;
creating an index table in Flash, wherein the hash value of each piece of data represents the index position of the data in the index table, and establishing the mapping relation between the index position in the index table and a Flash storage page;
and determining a Flash storage page of each piece of data according to the hash value of each piece of data and the mapping relation, and storing the data to the corresponding Flash storage page.
Preferably, if several pieces of data yield the same hash value, they share the same index position in the index table and are stored in order in the Flash storage page corresponding to that index position.
Preferably, while the pieces of data are being stored in order in the Flash storage page corresponding to the index position, if that page has no free space left when a later piece of data is to be stored, a blank Flash storage page is located, the later piece of data is stored in the blank page, and a link is established between the blank page and the Flash storage page corresponding to the index position.
Preferably, the Flash retrieval method further includes:
receiving a data query request, wherein the data query request comprises a key value of data to be queried;
calculating the hash value of the data to be queried according to the key value of the data to be queried and the hash function;
and obtaining the index position to be queried in the index table according to the hash value of the data to be queried, searching for the data to be queried on the Flash storage page corresponding to the index position, and returning a query result.
Preferably, if the data to be queried is not found on the Flash storage page corresponding to its index position, and that page is linked to another Flash storage page, the search continues on the linked page and a query result is returned.
Preferably, the Flash retrieval method further includes:
receiving a data deleting request, wherein the data deleting request comprises a key value of data to be deleted;
calculating the hash value of the data to be deleted according to the key value of the data to be deleted and the hash function;
and obtaining the index position of the data to be deleted in the index table according to the hash value of the data to be deleted, and locating and deleting the data to be deleted on the Flash storage page corresponding to the index position.
Preferably, if the Flash storage page where the data to be deleted is located is empty after the data to be deleted is deleted, the link relation of other Flash storage pages linked to the Flash storage page where the data to be deleted is located is adjusted.
According to a second aspect of the invention, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements any of the methods described above.
In general, compared with the prior art, the method adopts a Hash search algorithm, can store more data in a limited storage space in Flash, can ensure the high efficiency of data retrieval, reduces the response time of a system, improves the efficiency of data storage and retrieval, and is very suitable for the special environment of embedded data storage.
Drawings
FIG. 1 is a schematic diagram of a Flash retrieval method based on hash search according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a search flow of a Flash retrieval method based on hash search according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the add flow of the Flash retrieval method based on hash search according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The Flash retrieval method based on hash search according to the embodiment of the invention is suitable for embedded systems such as single-chip microcomputers. It follows the basic idea of hash lookup and comprises steps S1 to S3.
S1: receiving data to be stored, wherein each piece of data has a unique key value, and calculating the hash value of each piece of data according to a pre-constructed hash function and the key value of each piece of data.
In one embodiment, the hash function converts the key value into BCD code, adds the bytes of the BCD code in sequence, and keeps one byte of the sum as the hash value for that key value. Hash values obtained this way are relatively uniformly distributed.
S2: creating an index table in Flash, wherein the hash value of each piece of data represents the index position of the piece of data in the index table, and establishing the mapping relation between the index position in the index table and the Flash storage page.
S3: determining the Flash storage page of each piece of data according to the hash value of each piece of data and the mapping relation, and storing the data into the corresponding Flash storage page.
The Flash is divided into an index table and a storage area. Positions 0 to N of the index table correspond to the hash values produced by the hash function, and each position maps to a Flash storage page. The data itself is stored in the storage area, not in the index table. The position in the index table that corresponds to a hash value is in effect the head of a chain (the "zipper" head of the chaining method), and it holds the page number of the Flash storage page where the data is actually stored.
If several pieces of data yield the same hash value, they share the same index position in the index table and are stored in order in the Flash storage page corresponding to that position. In other words, pieces of data with the same hash value hang off the same chain head in the index table and are written to the same Flash storage page in the order they arrive.
When a later piece of data is to be stored and the Flash storage page corresponding to the index position has no free space, a blank Flash storage page is located, the later piece of data is stored there, and a link is established between the blank page and the Flash storage page corresponding to the index position. That is, when a page in Flash is full and the next piece of data hashes to the same value, another page is found for it and linked into the chain through the index head.
The Flash retrieval method based on hash search according to the embodiment of the invention is described below, taking an embedded laboratory card-swipe access control system as an example.
Assume the following requirements for the system: 1) at least 5000 list entries must be stored in Flash; 2) each list entry contains a card number, validity period, permissions, password, user name, and other information; 3) the list entry for a given card number must be found in Flash quickly, with lookup time on the order of milliseconds.
(1) Concrete method for data storage
In hash table storage, a pre-constructed hash function computes a storage address, the hash value, from the key value that uniquely identifies a piece of data, and the data is then read directly from that address. This avoids a traversal search and greatly reduces lookup time.
The key value is a value uniquely associated with a piece of data. In this example each list entry has exactly one card number and each card number is unique, so the card number is a suitable key value. Card numbers are 4-byte or 8-byte numbers; here they are all treated as 8-byte numbers.
To construct the hash function, note that for a long number each digit is roughly random and uniformly distributed. Using this property, the hash function converts the number into BCD code, adds the bytes of the BCD code in sequence, and keeps one byte of the sum. The resulting hash values are relatively random and evenly distributed. The hash function is as follows:
[Figure BDA0002489329270000051: code listing of the hash function (image not reproduced)]
The hextobcd function converts the 8-byte number into a 10-byte BCD code. The 10 bytes are then added in sequence and the sum is assigned to the one-byte variable ucfind, so only one byte of the result is kept. This value is the storage address. A one-byte value has at most 256 possible results, which correspond to positions 0 to 255 of an area reserved in Flash.
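The hash function listing itself survives only as a figure reference, so the following is a minimal Python sketch of the steps described in the text, not the original code. The names hex_to_bcd and card_hash are hypothetical, and packed BCD (two decimal digits per byte, most significant digits first) is an assumption, chosen because the 20 decimal digits of an 8-byte number then fill exactly the 10 bytes mentioned.

```python
def hex_to_bcd(number):
    """Encode an unsigned 8-byte integer as 10 bytes of packed BCD.

    Assumption: packed BCD, two decimal digits per byte, high digit in the
    high nibble. The largest 8-byte value has 20 decimal digits.
    """
    if not 0 <= number < 2**64:
        raise ValueError("key must fit in 8 bytes")
    digits = f"{number:020d}"  # zero-padded to 20 decimal digits
    return bytes(
        (int(digits[i]) << 4) | int(digits[i + 1]) for i in range(0, 20, 2)
    )


def card_hash(card_number):
    """Add the 10 BCD bytes and keep only the low byte of the sum (0..255)."""
    return sum(hex_to_bcd(card_number)) & 0xFF
```

Keeping only the low byte plays the role of assigning the sum to a one-byte variable such as ucfind: the sum of ten BCD bytes can exceed 255, and the truncation folds it back into the 256 index positions.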
An unavoidable problem in hash table lookup is hash value "collision": no matter how random and uniform the hash function is, different key values may still produce the same hash value. Two pieces of data would then be stored at the same storage address, which is clearly not feasible. Resolving hash value collisions is therefore a key part of any hash table lookup algorithm.
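A short sketch makes the collision concrete. It assumes the byte-sum hash over packed BCD described in the text; the name card_hash and the sample card numbers are illustrative and do not come from the source.

```python
from collections import defaultdict


def card_hash(card_number):
    """Byte-sum hash over packed BCD (assumed encoding), truncated to one byte."""
    digits = f"{card_number:020d}"
    # Reading two decimal digits as hexadecimal yields the packed BCD byte.
    bcd = bytes(int(digits[i:i + 2], 16) for i in range(0, 20, 2))
    return sum(bcd) & 0xFF


# Group a few illustrative card numbers by their hash value.
buckets = defaultdict(list)
for key in (3, 12, 102, 1200, 2001):
    buckets[card_hash(key)].append(key)
```

Here 3 and 102 land in the same bucket, as do 12 and 1200, because moving a BCD byte to a different position does not change the byte sum.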
In one example, the Flash model used is M45PE80, which has 4096 pages of 256 bytes each; the minimum erase unit is one page. As shown in FIG. 1, list storage is divided into two areas, an index table and a list table. Positions 0 to 255 of the index table correspond to the hash values calculated by the hash function. The list entries themselves are stored in the list table, not in the index table. The position in the index table that corresponds to a hash value is in effect a chain ("zipper") head, and it holds the page number of the list table page where the entries are actually stored. Thus, if several list entries have the same hash value, that is, the same chain head in the index table, they are all stored in the same page of the list table. This resolves hash value collisions.
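The split into index table and list table can be sized with a few lines of arithmetic. This is only a sketch: the page size and page count come from the text, while the 2-byte slot width and the placement of the index table in the first pages are assumptions made for illustration.

```python
PAGE_SIZE = 256     # bytes per page (from the text)
PAGE_COUNT = 4096   # pages on the device (from the text)
INDEX_SLOTS = 256   # one slot per possible one-byte hash value
SLOT_BYTES = 2      # assumption: 2-byte page number (4096 pages need 12 bits)

index_bytes = INDEX_SLOTS * SLOT_BYTES
index_pages = -(-index_bytes // PAGE_SIZE)   # ceiling division
list_table_pages = PAGE_COUNT - index_pages  # pages left for list entries


def slot_address(hash_value):
    """Return (page, byte offset) of the index slot for a hash value.

    Assumption: the index table starts at page 0 of the device.
    """
    if not 0 <= hash_value < INDEX_SLOTS:
        raise ValueError("hash value must be in 0..255")
    offset = hash_value * SLOT_BYTES
    return offset // PAGE_SIZE, offset % PAGE_SIZE
```

Under these assumptions the index table fits in two pages, which leaves nearly the whole device for the list table.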
When a page in the list table is full and the next list entry hashes to the same value, so that it would belong in the same page, another page is found to store it and a link is established through the index head. As in the chaining ("zipper") method, the chain can keep growing as long as blank pages remain. Each node of the chain is a Flash page, and each page stores multiple list entries.
(2) Search process
The search process is shown in fig. 2 and comprises the steps of:
a. Receive a list search request containing the key value, i.e. the card number.
b. Calculate the hash value, i.e. the position in the index table, from the card number and the hash function.
c. Read the Flash storage page number stored at that position in the index table.
d. Traverse the list entries in that Flash storage page; if the matching entry is found, return it as the query result. If it is not found in this page, continue the search in the next page linked to it.
e. If the entry has not been found and there is no next page, the requested list entry does not exist.
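Steps a to e can be sketched as a walk along the page chain. This is an in-memory model, not Flash driver code: the index table is a Python list of head page numbers, each page is a dict with "entries" and "next", and None marks an empty slot or the end of a chain. All of these representations, and the names card_hash and find_entry, are assumptions for illustration.

```python
def card_hash(card_number):
    """Byte-sum hash over packed BCD (assumed encoding), truncated to one byte."""
    digits = f"{card_number:020d}"
    return sum(int(digits[i:i + 2], 16) for i in range(0, 20, 2)) & 0xFF


def find_entry(index_table, pages, card_number):
    """Return the entry stored for card_number, or None if it is absent."""
    page_no = index_table[card_hash(card_number)]  # steps b and c
    while page_no is not None:                     # steps d and e
        page = pages[page_no]
        for entry in page["entries"]:              # traverse one page
            if entry["card"] == card_number:
                return entry
        page_no = page["next"]                     # follow the link
    return None


# Hand-built example: cards 3 and 102 both hash to slot 3.
index_table = [None] * 256
index_table[3] = 7
pages = {
    7: {"entries": [{"card": 3, "name": "A"}], "next": 9},
    9: {"entries": [{"card": 102, "name": "B"}], "next": None},
}
```

A lookup for card 102 reads page 7, finds no match, follows the link to page 9, and returns the entry there; a lookup for a card whose slot is empty returns None without reading any list table page.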
Analysis of the search process: suppose there are 5000 list entries and they are evenly distributed. The index table has 256 positions, so each position holds about 20 entries on average. Each page holds about 12 entries, so each position links to about 2 pages. Under this assumption, each search reads at most two pages from Flash and traverses them to find the entry. Without hashing, if the 5000 entries were stored sequentially in Flash they would occupy about 417 pages; assuming the entry sought is in the middle, a search would read about 200 pages. Storing and searching with the hash algorithm therefore saves a great deal of time. This analysis assumes a uniform distribution, which does not hold exactly in practice. Even when the number of linked pages is uneven, however, the number of pages read remains small and the read time short. In the extreme case where all list entries have the same hash value, the time complexity degrades to that of a complete traversal, but this hardly ever occurs in practical applications.
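The figures in this analysis can be checked with a few lines of arithmetic. The inputs (5000 entries, 256 index positions, about 12 entries per page) are taken from the text; the variable names are illustrative.

```python
import math

ENTRIES = 5000          # list entries to store
SLOTS = 256             # index table positions
ENTRIES_PER_PAGE = 12   # approximate entries per 256-byte page

entries_per_slot = ENTRIES / SLOTS                               # about 19.5
pages_per_slot = math.ceil(entries_per_slot / ENTRIES_PER_PAGE)  # chain length
sequential_pages = math.ceil(ENTRIES / ENTRIES_PER_PAGE)         # no hashing
average_sequential_reads = sequential_pages / 2                  # entry in the middle
```

With an even distribution a hashed lookup reads at most pages_per_slot pages, compared with roughly half of sequential_pages for an unindexed scan, which is the two-versus-about-200 comparison made in the text.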
(3) Adding process
The flow for adding a list entry is shown in FIG. 3. The list table is in effect one large blank area, and linked pages need not be in sequential order. A list entry is added as follows:
a. Calculate the hash value, i.e. the position in the index table, from the card number of the new list entry.
b. Read the page number stored at that position in the index table.
c. Store the list entry in the page with that page number.
d. If that page is full and the next linked page is not full, store the entry in the next linked page.
e. If there is no next page with space, i.e. all linked pages are full, find a blank page in the list table, add an index head to it, establish the link, and store the entry in the blank page.
f. If data for several card numbers is being added, repeat steps c to e for each of them.
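The add flow can be sketched on an in-memory model of the two tables. The dict-based page representation, the free_pages list of blank page numbers, the capacity of 12 entries per page, and the handling of a slot that has no page yet are all assumptions for illustration; the function names are hypothetical.

```python
ENTRIES_PER_PAGE = 12  # assumption, based on "about 12" entries per page


def card_hash(card_number):
    """Byte-sum hash over packed BCD (assumed encoding), truncated to one byte."""
    digits = f"{card_number:020d}"
    return sum(int(digits[i:i + 2], 16) for i in range(0, 20, 2)) & 0xFF


def add_entry(index_table, pages, free_pages, entry):
    """Store entry in the chain for its hash value; return the page used.

    Raises IndexError when no blank page is left.
    """
    slot = card_hash(entry["card"])                # step a
    page_no = index_table[slot]                    # step b
    if page_no is None:                            # assumed case: empty slot
        page_no = free_pages.pop(0)
        pages[page_no] = {"entries": [], "next": None}
        index_table[slot] = page_no
    # Steps c to e: move along the chain until a page has room,
    # linking in a blank page when every linked page is full.
    while len(pages[page_no]["entries"]) >= ENTRIES_PER_PAGE:
        if pages[page_no]["next"] is None:
            new_no = free_pages.pop(0)
            pages[new_no] = {"entries": [], "next": None}
            pages[page_no]["next"] = new_no
        page_no = pages[page_no]["next"]
    pages[page_no]["entries"].append(entry)
    return page_no
```

On real Flash each append would be a page program operation and the link would be written into the full page's header; the sketch only models the bookkeeping.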
(4) Deletion process
Deleting a list entry is similar to searching: the entry is first located and then deleted. The steps are:
a. Receive a list deletion request containing the card number.
b. Calculate the hash value of the data to be deleted from the card number and the hash function.
c. Obtain the index position of the data to be deleted in the index table from the hash value, then locate and delete the data on the Flash storage page corresponding to that index position.
Note: if the page is empty after the deletion, clear its index head and related information completely, and adjust the links of the other pages linked to it. Because a deletion erases at least one page, the list entries in that page should be backed up first, so that a power failure during the erase does not destroy entries that were not meant to be deleted.
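The deletion flow, including the relinking of an emptied page, can be sketched on the same kind of in-memory model. The dict-based pages, the free_pages list, and the function names are assumptions for illustration; the backup before erase that the note calls for is indicated only by a comment, since it depends on the hardware.

```python
def card_hash(card_number):
    """Byte-sum hash over packed BCD (assumed encoding), truncated to one byte."""
    digits = f"{card_number:020d}"
    return sum(int(digits[i:i + 2], 16) for i in range(0, 20, 2)) & 0xFF


def delete_entry(index_table, pages, free_pages, card_number):
    """Delete the entry for card_number; return True if it was found."""
    slot = card_hash(card_number)
    prev_no, page_no = None, index_table[slot]
    while page_no is not None:
        page = pages[page_no]
        kept = [e for e in page["entries"] if e["card"] != card_number]
        if len(kept) != len(page["entries"]):
            # On real Flash: back up the surviving entries, erase the page,
            # then program the survivors back. Here we just replace the list.
            page["entries"] = kept
            if not kept:
                # Page is now empty: unlink it from the chain and free it.
                if prev_no is None:
                    index_table[slot] = page["next"]
                else:
                    pages[prev_no]["next"] = page["next"]
                del pages[page_no]
                free_pages.append(page_no)
            return True
        prev_no, page_no = page_no, page["next"]
    return False
```

When the emptied page was the first in its chain, the index table slot is pointed at the next page; otherwise the previous page's link is redirected past it.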
The embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the technical solution of any of the above embodiments of the Flash retrieval method. The implementation principle and technical effect are similar to those of the above method and are not described again here.
It should be noted that, in any of the above embodiments, the steps are not necessarily executed in the order of their sequence numbers; unless the execution logic requires a particular order, they may be executed in any other feasible order.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (8)

1. A Flash retrieval method based on Hash search is characterized by comprising the following steps:
receiving data to be stored, wherein each piece of data has a unique key value, and calculating the hash value of each piece of data according to a pre-constructed hash function and the key value of each piece of data;
creating an index table in Flash, wherein the hash value of each piece of data represents the index position of the data in the index table, and establishing the mapping relation between the index position in the index table and a Flash storage page;
and determining a Flash storage page of each piece of data according to the hash value of each piece of data and the mapping relation, and storing the data into the corresponding Flash storage page.
2. The Flash retrieval method based on hash lookup as claimed in claim 1, wherein if several pieces of data yield the same hash value, they share the same index position in the index table and are stored in order in the Flash storage page corresponding to that index position.
3. The Flash retrieval method based on hash lookup as claimed in claim 2, wherein, while the pieces of data are being stored in order in the Flash storage page corresponding to the same index position, if that page has no free storage space when a later piece of data is to be stored, a blank Flash storage page is located, the later piece of data is stored in the blank Flash storage page, and a link relationship is established between the blank Flash storage page and the Flash storage page corresponding to the index position.
4. The Flash retrieval method based on hash lookup according to claim 3, comprising:
receiving a data query request, wherein the data query request comprises a key value of data to be queried;
calculating the hash value of the data to be queried according to the key value of the data to be queried and the hash function;
and obtaining the index position to be queried in the index table according to the hash value of the data to be queried, searching for the data to be queried on the Flash storage page corresponding to the index position, and returning a query result.
5. The Flash retrieval method based on hash lookup as claimed in claim 4, wherein if the data to be queried is not found in the Flash storage page corresponding to the index position of the data to be queried and the Flash storage page corresponding to the index position of the data to be queried is linked to another Flash storage page, continuing to lookup the data to be queried in the other Flash storage page and returning the query result.
6. The Flash retrieval method based on hash lookup according to claim 3, comprising:
receiving a data deleting request, wherein the data deleting request comprises a key value of data to be deleted;
calculating the hash value of the data to be deleted according to the key value of the data to be deleted and the hash function;
and acquiring the index position of the data to be deleted in the index table according to the hash value of the data to be deleted, and inquiring and deleting the data to be deleted on a Flash storage page corresponding to the index position.
7. The Flash retrieval method based on hash lookup as claimed in claim 6, wherein if the Flash storage page where the data to be deleted is located is empty after the data to be deleted is deleted, the link relationship of other Flash storage pages linked to the Flash storage page where the data to be deleted is located is adjusted.
8. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010400678.4A CN111651372A (en) 2020-05-13 2020-05-13 Flash retrieval method based on Hash search and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010400678.4A CN111651372A (en) 2020-05-13 2020-05-13 Flash retrieval method based on Hash search and storage medium

Publications (1)

Publication Number Publication Date
CN111651372A true CN111651372A (en) 2020-09-11

Family

ID=72346640

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010400678.4A Pending CN111651372A (en) 2020-05-13 2020-05-13 Flash retrieval method based on Hash search and storage medium

Country Status (1)

Country Link
CN (1) CN111651372A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113096284A (en) * 2021-03-19 2021-07-09 福建新大陆通信科技股份有限公司 CTID access control authorization information verification method
CN114780461A (en) * 2022-06-17 2022-07-22 山东理工职业学院 Storage method and device of singlechip parameters and electronic equipment
CN115878612A (en) * 2022-11-17 2023-03-31 石家庄纵宇科技有限公司 Database structure and retrieval method thereof

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060277178A1 (en) * 2005-06-02 2006-12-07 Wang Ting Z Table look-up method with adaptive hashing
CN101655820A (en) * 2009-08-28 2010-02-24 深圳市茁壮网络股份有限公司 Key word storing method and storing device
CN102682116A (en) * 2012-05-14 2012-09-19 中兴通讯股份有限公司 Method and device for processing table items based on Hash table
CN103020262A (en) * 2012-12-24 2013-04-03 Tcl集团股份有限公司 Data storage method, system and data storage equipment
CN106202548A (en) * 2016-07-25 2016-12-07 网易(杭州)网络有限公司 Date storage method, lookup method and device
CN110032528A (en) * 2019-04-19 2019-07-19 苏州浪潮智能科技有限公司 Internal storage data lookup method, device, equipment and the storage medium of storage system

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060277178A1 (en) * 2005-06-02 2006-12-07 Wang Ting Z Table look-up method with adaptive hashing
CN101655820A (en) * 2009-08-28 2010-02-24 深圳市茁壮网络股份有限公司 Key word storing method and storing device
CN102682116A (en) * 2012-05-14 2012-09-19 中兴通讯股份有限公司 Method and device for processing table items based on Hash table
CN103020262A (en) * 2012-12-24 2013-04-03 Tcl集团股份有限公司 Data storage method, system and data storage equipment
CN106202548A (en) * 2016-07-25 2016-12-07 网易(杭州)网络有限公司 Date storage method, lookup method and device
CN110032528A (en) * 2019-04-19 2019-07-19 苏州浪潮智能科技有限公司 Internal storage data lookup method, device, equipment and the storage medium of storage system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张黎明: "《计算机软件技术基础》", 北京工业大学出版社, pages: 300 - 301 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113096284A (en) * 2021-03-19 2021-07-09 福建新大陆通信科技股份有限公司 CTID access control authorization information verification method
CN114780461A (en) * 2022-06-17 2022-07-22 山东理工职业学院 Storage method and device of singlechip parameters and electronic equipment
CN115878612A (en) * 2022-11-17 2023-03-31 石家庄纵宇科技有限公司 Database structure and retrieval method thereof
CN115878612B (en) * 2022-11-17 2023-12-15 北京东方京融教育科技股份有限公司 Database structure and retrieval method thereof

Similar Documents

Publication Publication Date Title
CN111651372A (en) Flash retrieval method based on Hash search and storage medium
CN106294190B (en) Storage space management method and device
CN105975399B (en) Method for managing a memory device and related memory device
CN110321325B (en) File index node searching method, terminal, server, system and storage medium
CN111190904B (en) Method and device for hybrid storage of graph-relational database
CN112287182A (en) Graph data storage and processing method and device and computer storage medium
CN105117355A (en) Memory, memory system and data process method
CN110555001B (en) Data processing method, device, terminal and medium
US20100228914A1 (en) Data caching system and method for implementing large capacity cache
US20200334292A1 (en) Key value append
CN112860592B (en) Data caching method and device based on linked list, electronic equipment and storage medium
CN109766318B (en) File reading method and device
CN111552692A (en) Plus-minus cuckoo filter
CN102739622A (en) Expandable data storage system
WO2020215580A1 (en) Distributed global data deduplication method and device
CN114691721A (en) Graph data query method and device, electronic equipment and storage medium
CN116578746A (en) Object de-duplication method and device
CN114610708A (en) Vector data processing method and device, electronic equipment and storage medium
CN114116612B (en) Access method for index archive file based on B+ tree
CN113157600A (en) Space allocation method of shingled hard disk, file storage system and server
US20200019539A1 (en) Efficient and light-weight indexing for massive blob/objects
CN108241758B (en) Data query method and related equipment
KR20190123819A (en) Method for managing of memory address mapping table for data storage device
CN114553885A (en) DHT network-based storage method and device, electronic equipment and storage medium
CN114416741A (en) KV data writing and reading method and device based on multi-level index and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200911
