CN111651372A

CN111651372A - Flash retrieval method based on Hash search and storage medium

Info

Publication number: CN111651372A
Application number: CN202010400678.4A
Authority: CN
Inventors: 沈坤; 王建国; 聂思静; 梁美红; 陈秀琼; 王敏敏
Original assignee: Hubei Sanjiang Aerospace Wanfeng Technology Development Co Ltd
Current assignee: Hubei Sanjiang Aerospace Wanfeng Technology Development Co Ltd
Priority date: 2020-05-13
Filing date: 2020-05-13
Publication date: 2020-09-11

Abstract

The invention discloses a Flash retrieval method based on Hash search and a storage medium. The method comprises the following steps: receiving data to be stored, wherein each piece of data has a unique key value, and calculating the hash value of each piece of data according to a pre-constructed hash function and the key value of each piece of data; creating an index table in Flash, wherein the hash value of each piece of data represents the index position of the data in the index table, and establishing the mapping relation between the index position in the index table and a Flash storage page; and determining a Flash storage page of each piece of data according to the hash value of each piece of data and the mapping relation, and storing the data into the corresponding Flash storage page. The invention has the advantage of high retrieval efficiency.

Description

Flash retrieval method based on Hash search and storage medium

Technical Field

The invention belongs to the technical field of data retrieval, and particularly relates to a Flash retrieval method based on hash search and a storage medium.

Background

In the design of embedded products, it is usually necessary to store some data such as configuration parameters, operation states, operation results, etc. Moreover, the data needs to be stored for a long time and cannot be lost when power is lost. The efficiency of retrieving and storing data directly affects the performance of the product and even determines the success or failure of the product design.

Common search algorithms include sequential search, binary search, and binary search tree lookup. For Flash retrieval on an embedded single-chip microcomputer, storage and available memory are limited, so a binary search tree is unsuitable. Retrieval speed is also a requirement, and when the volume of stored and retrieved data is large, sequential search and binary search cannot meet the timing requirements.

Disclosure of Invention

To address at least one defect or need for improvement in the prior art, the invention provides a Flash retrieval method based on hash lookup and a storage medium, which allow more data to be stored in the limited storage space of Flash while keeping data retrieval efficient.

In order to achieve the above object, according to a first aspect of the present invention, there is provided a Flash retrieval method based on hash lookup, including:

receiving data to be stored, wherein each piece of data has a unique key value, and calculating the hash value of each piece of data according to a pre-constructed hash function and the key value of each piece of data;

creating an index table in Flash, wherein the hash value of each piece of data represents the index position of the data in the index table, and establishing the mapping relation between the index position in the index table and a Flash storage page;

and determining a Flash storage page of each piece of data according to the hash value of each piece of data and the mapping relation, and storing the data to the corresponding Flash storage page.

Preferably, if the hash values calculated for multiple pieces of data are the same, those pieces of data share the same index position in the index table and are stored sequentially in the Flash storage page corresponding to that index position.

Preferably, when the multiple pieces of data are stored sequentially in the Flash storage page corresponding to the index position and a later piece of data is to be stored, if that Flash storage page has no free storage space, a blank Flash storage page is found, the later piece of data is stored in the blank Flash storage page, and a link is established between the Flash storage page holding the later piece of data and the Flash storage page corresponding to the index position.

Preferably, the Flash retrieval method further includes:

receiving a data query request, wherein the data query request comprises a key value of data to be queried;

calculating the hash value of the data to be queried according to the key value of the data to be queried and the hash function;

and obtaining the index position to be queried in the index table according to the hash value of the data to be queried, searching for the data to be queried on the Flash storage page corresponding to that index position, and returning a query result.

Preferably, if the data to be queried is not found in the Flash storage page corresponding to the index position of the data to be queried and the Flash storage page corresponding to the index position of the data to be queried is linked to another Flash storage page, continuing to search the data to be queried in the other Flash storage page and returning a query result.

Preferably, the Flash retrieval method further includes:

receiving a data deleting request, wherein the data deleting request comprises a key value of data to be deleted;

calculating the hash value of the data to be deleted according to the key value of the data to be deleted and the hash function;

and acquiring the index position of the data to be deleted in the index table according to the hash value of the data to be deleted, and inquiring and deleting the data to be deleted on a Flash storage page corresponding to the index position.

Preferably, if the Flash storage page where the data to be deleted is located is empty after the data to be deleted is deleted, the link relation of other Flash storage pages linked to the Flash storage page where the data to be deleted is located is adjusted.

According to a second aspect of the invention, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements any of the methods described above.

In general, compared with the prior art, the method adopts a Hash search algorithm, can store more data in a limited storage space in Flash, can ensure the high efficiency of data retrieval, reduces the response time of a system, improves the efficiency of data storage and retrieval, and is very suitable for the special environment of embedded data storage.

Drawings

FIG. 1 is a schematic diagram of a Flash retrieval method based on hash search according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a search flow of a Flash retrieval method based on hash search according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of an add flow of a Flash retrieval method based on hash search according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.

The Flash retrieval method based on hash lookup according to an embodiment of the invention is suitable for embedded systems such as single-chip microcomputers. It adopts the basic idea of hash lookup and comprises steps S1 to S3.

S1, receiving data to be stored, wherein each piece of data has a unique key value, and calculating the hash value of each piece of data according to a pre-constructed hash function and the key value of each piece of data.

In one embodiment, the hash function converts the key value into BCD code, adds the bytes of the BCD code in sequence to obtain a sum, and then takes one byte of that sum as the hash value corresponding to the key value. Hash values obtained in this way are relatively uniformly distributed.

S2, creating an index table in Flash, wherein the hash value of each piece of data represents the index position of the piece of data in the index table, and establishing the mapping relation between the index position in the index table and the Flash storage page.

S3, determining the Flash storage page of each piece of data according to the hash value of each piece of data and the mapping relation, and storing the data into the corresponding Flash storage page.

For data storage, the Flash is divided into an index table and a storage area. Positions 0 to N of the index table correspond to the hash values calculated by the hash function, and each of these positions has a mapping relation with a Flash storage page. The data itself is stored not in the index table but in the storage area. The position in the index table corresponding to a hash value is in effect the head of a chain (a "zipper" head), and it holds the page number of the Flash storage page where the data is actually stored.

If the hash values calculated for multiple pieces of data are the same, those pieces of data share the same index position in the index table and are stored sequentially in the Flash storage page corresponding to that index position. That is, pieces of data with the same hash value share the same "zipper" head in the index table and are stored in the same Flash storage page in the order in which they arrive.

When a later piece of data is to be stored, if the Flash storage page corresponding to the index position has no free storage space, a blank Flash storage page is found, the later piece of data is stored in the blank Flash storage page, and a link is established between the Flash storage page holding the later piece of data and the Flash storage page corresponding to the index position. That is, when a page in Flash is full and the next piece of data has the same hash value, so that it would belong in the same page, another page is found to store it, and a link is established through the index head.

The Flash retrieval method based on hash search according to the embodiment of the invention is described in detail below, taking a laboratory card-swipe access control embedded system as an example.

Assume the following requirements for the laboratory card-swipe access control embedded system: 1) at least 5000 list entries (lists) must be stored in Flash; 2) each list entry contains a card number, validity period, permissions, password, user name, and other information; 3) the list entry corresponding to a card number must be found in Flash quickly, within a time on the order of milliseconds.

(1) Concrete method for data storage

With hash-table storage, a storage address, namely the hash value, is first calculated by a pre-constructed hash function from the key value that uniquely corresponds to a piece of data; the data can then be read directly from that storage address. This avoids a traversal search and greatly reduces search time.

The key value is a value uniquely associated with a piece of data. In this example it is the card number: each card number is unique and corresponds to exactly one list entry, so the card number is suitable as the key value of a list entry. The card number is a 4-byte or 8-byte number; here it is uniformly treated as an 8-byte number.

Construction of the hash function: for a long number, each digit is random and uniformly distributed. Exploiting this property, the hash function converts the number into BCD code, adds the bytes of the BCD code in sequence to obtain a sum, and then takes one byte of that sum. Hash values obtained in this way are relatively random and evenly distributed. The hash function works as follows:

The hextobcd function converts the 8-byte number into a 10-byte BCD code. The 10 bytes are then added in sequence, and the sum is assigned to the one-byte variable ucfind, so that only one byte of the sum is kept. This value is the storage address. It follows that at most 256 positions can be calculated, namely positions 0 to 255 of a designated area in Flash.
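The hash function described above can be sketched as follows. This is a minimal illustration rather than the original routine: it assumes packed BCD (two decimal digits per byte) and assumes that the byte kept in ucfind is the low byte of the sum. The names hex_to_bcd and card_hash are chosen here for illustration only.

```python
def hex_to_bcd(number: int) -> bytes:
    """Pack an unsigned 8-byte integer into 10 bytes of BCD (two decimal digits per byte)."""
    if not 0 <= number < 2**64:
        raise ValueError("card number must fit in 8 bytes")
    digits = f"{number:020d}"  # the largest 8-byte value has 20 decimal digits
    return bytes(int(digits[i]) << 4 | int(digits[i + 1]) for i in range(0, 20, 2))


def card_hash(card_number: int) -> int:
    """Add the 10 BCD bytes and keep one byte of the sum (assumed: the low byte)."""
    return sum(hex_to_bcd(card_number)) & 0xFF
```

Because only one byte is kept, the result always lies in the range 0 to 255, and card numbers whose BCD bytes are permutations of one another necessarily collide.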

An unavoidable problem in hash-table lookup is hash "collision": no matter how random and uniform the hash function is, there is no guarantee that different key values will never produce the same hash value. A collision means that two pieces of data would be stored at the same storage address, which is clearly not feasible. Resolving hash collisions is therefore a key part of any hash-table lookup algorithm.

In one example, the Flash device used is the M45PE80, which has 4096 pages of 256 bytes each; the minimum erase unit is a page. As shown in fig. 1, storage for the lists is divided into two areas, an index table and a list table. Positions 0 to 255 of the index table correspond to the hash values calculated by the hash function. The list entries themselves are stored not in the index table but in the list table. The position in the index table corresponding to a hash value is in effect a "zipper" head: it holds the page number of the page in the list table where the list entries are actually stored. Thus, if several list entries have the same hash value, that is, the same "zipper" head in the index table, they are all stored in the same page of the list table. This resolves hash collisions.

When a page in the list table is full and the next list entry produces the same hash value, so that it would be stored in the same page, another page is found to store it and a link is established through the index head. As in the chaining ("zipper") method, the chain can keep growing as long as blank pages are available. Each node of the chain is a Flash page, and each page stores multiple list entries.
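One possible layout of a single 256-byte list-table page is sketched below. The text specifies only the page size and a capacity of roughly 12 list entries per page; everything else here is an assumption made for illustration: a 21-byte record (8-byte card number plus 13 bytes for the remaining fields), a 3-byte header holding the next page number and a record count, and the value 0xFFFF as the end-of-chain marker.

```python
import struct

PAGE_SIZE = 256                 # M45PE80 page size, from the text
RECORD_SIZE = 21                # assumed: 8-byte card number + 13 bytes of other fields
RECORDS_PER_PAGE = 12           # "about 12" entries per page, from the text
NO_NEXT_PAGE = 0xFFFF           # assumed marker for "no further linked page"
HEADER = struct.Struct("<HB")   # assumed header: next page number, record count


def pack_page(records, next_page=NO_NEXT_PAGE):
    """Serialize up to 12 (card_number, payload) records into one 256-byte page image."""
    if len(records) > RECORDS_PER_PAGE:
        raise ValueError("page overflow: a new page must be linked instead")
    body = b"".join(struct.pack("<Q13s", card, payload) for card, payload in records)
    image = HEADER.pack(next_page, len(records)) + body
    return image.ljust(PAGE_SIZE, b"\xff")  # unused bytes are left in the erased state


def unpack_page(image):
    """Return (next_page, records) from a page image produced by pack_page."""
    next_page, count = HEADER.unpack_from(image, 0)
    records = []
    for i in range(count):
        offset = HEADER.size + i * RECORD_SIZE
        card, payload = struct.unpack_from("<Q13s", image, offset)
        records.append((card, payload.rstrip(b"\x00")))
    return next_page, records
```

With these assumed sizes, the header and 12 records occupy 3 + 12 × 21 = 255 bytes, which fits within one page.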

(2) Search process

The search process is shown in fig. 2 and comprises the steps of:

a. Receive a list search request that contains the key value, namely the card number.

b. Calculate the hash value, namely the position in the index table, from the card number and the hash function.

c. Read the Flash storage page number stored at that position in the index table.

d. Traverse the list entries in that Flash storage page; if the matching entry is found, return the query result. If it is not found in that page, continue the search in the next page linked to it and return the query result.

e. If the entry is still not found and there is no next page, no corresponding list entry exists.
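Steps a to e can be sketched with a simplified in-memory model. The model is an assumption made for illustration: the index table is a 256-element list of head page numbers (None for an empty chain), and each page is a dictionary holding its records and the number of the next linked page. The function also counts the pages read, which corresponds to the number of Flash page reads.

```python
def card_hash(card_number):
    """Sum of the 10 packed-BCD bytes of the card number, low byte only (assumed)."""
    digits = f"{card_number:020d}"
    return sum(int(digits[i:i + 2], 16) for i in range(0, 20, 2)) & 0xFF


def find(index_table, pages, card_number):
    """Hash the card number, take the head page, then walk the chain of linked pages."""
    page_no = index_table[card_hash(card_number)]  # None means nothing is stored here
    pages_read = 0
    while page_no is not None:
        page = pages[page_no]
        pages_read += 1
        for record in page["records"]:
            if record["card"] == card_number:
                return record, pages_read
        page_no = page["next"]  # not on this page: follow the link
    return None, pages_read     # chain exhausted: no corresponding list entry


# Hypothetical data: two colliding card numbers stored on two linked pages.
index_table = [None] * 256
index_table[card_hash(12345678)] = 3
pages = {
    3: {"records": [{"card": 12345678, "name": "A"}], "next": 9},
    9: {"records": [{"card": 78563412, "name": "B"}], "next": None},
}
```

In this model, a lookup for a card number that is present on the second page of a chain reads two pages, which matches the average case analyzed for this example.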

Analysis of the search process: suppose there are 5000 list entries and they are evenly distributed. The index table has 256 positions, so each position holds about 20 entries on average. Each page holds about 12 entries, so each position links to about 2 pages. Under these assumptions, each search reads two pages of Flash and traverses them to find the matching entry. Without any algorithm, storing the 5000 entries directly in Flash in order takes about 417 pages, and if the entry sought lies in the middle, a search reads about 200 pages. Storing and searching with the hash algorithm therefore saves a great deal of time. This analysis assumes a uniform distribution, which cannot be achieved exactly in practice. Even so, when the number of linked pages is uneven, the number of Flash pages read remains small and the read time remains short. In the extreme case where all list entries have the same hash value, the time complexity of the algorithm degrades to that of a complete traversal, but this almost never occurs in practical applications.
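The figures in this analysis can be reproduced with a few lines of arithmetic. The constants are taken from the example above; the variable names are chosen here for illustration.

```python
import math

ENTRIES = 5000          # list entries to store
INDEX_POSITIONS = 256   # positions addressable by a one-byte hash value
ENTRIES_PER_PAGE = 12   # approximate capacity of one 256-byte page

entries_per_position = ENTRIES / INDEX_POSITIONS                      # about 19.5
pages_per_chain = math.ceil(entries_per_position / ENTRIES_PER_PAGE)  # 2
pages_sequential = math.ceil(ENTRIES / ENTRIES_PER_PAGE)              # 417
average_sequential_reads = pages_sequential / 2                       # about 208
```

Under the uniform-distribution assumption, a hash lookup therefore reads at most 2 pages, against roughly 200 page reads for a sequential scan that stops in the middle of the list.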

(3) Adding process

The flow for adding a list entry is shown in fig. 3. The list table is in effect one large blank area, and the pages linked in a chain need not be consecutive. A list entry is added as follows:

a. Calculate the hash value, namely the position in the index table, from the card number of the new list entry.

b. Read the page number stored at that position in the index table.

c. Store the list entry in the page with that page number.

d. If that page is full and a subsequent linked page is not full, store the list entry in that linked page.

e. If no next page exists, that is, all linked pages are full, find a blank page in the list table, add an index head, establish the link, and store the list entry in the blank page.

f. If data for several card numbers is being added, repeat steps c to e for each.
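The add flow can be sketched with the same kind of simplified in-memory model. Several details are assumptions made for illustration: blank pages are taken from a free_pages list, the link to a newly allocated page is written into the last page of the chain, and no check is made for an existing entry with the same card number.

```python
RECORDS_PER_PAGE = 12  # approximate page capacity, from the text


def card_hash(card_number):
    """Sum of the 10 packed-BCD bytes of the card number, low byte only (assumed)."""
    digits = f"{card_number:020d}"
    return sum(int(digits[i:i + 2], 16) for i in range(0, 20, 2)) & 0xFF


def add(index_table, pages, free_pages, card_number, payload):
    """Store a record on the first linked page with room, else link in a blank page."""
    slot = card_hash(card_number)
    record = {"card": card_number, "payload": payload}
    last, page_no = None, index_table[slot]
    while page_no is not None:
        page = pages[page_no]
        if len(page["records"]) < RECORDS_PER_PAGE:
            page["records"].append(record)
            return page_no
        last, page_no = page_no, page["next"]  # page full: try the next linked page
    new_no = free_pages.pop(0)                 # raises IndexError when no blank page is left
    pages[new_no] = {"records": [record], "next": None}
    if last is None:
        index_table[slot] = new_no             # first page of the chain: write the index head
    else:
        pages[last]["next"] = new_no           # link the blank page to the end of the chain
    return new_no
```

The function returns the number of the page that received the record, so a caller can see when a chain has grown by one page.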

(4) Deletion process

Deleting a list entry is similar to searching for one: the entry is deleted once the search has found it. The steps are:

a. Receive a list delete request that contains the card number.

b. Calculate the hash value of the data to be deleted from the card number and the hash function.

c. Obtain the index position of the data to be deleted in the index table from the hash value, then locate and delete the data on the Flash storage page corresponding to that index position.

Note: if a page is empty after the deletion, its index head and related information are cleared completely, and the links of the other pages linked to it are adjusted. A deletion erases at least one page, so the list entries should be backed up first; otherwise a power failure during the page erase could destroy entries that were not meant to be deleted.
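The deletion flow, including the link adjustment for a page that becomes empty, can be sketched as follows. The in-memory model (a 256-element index table, dictionary pages with a next link, and a free_pages list) is an assumption made for illustration, and the sketch does not model the backup that protects against power failure during a page erase.

```python
def card_hash(card_number):
    """Sum of the 10 packed-BCD bytes of the card number, low byte only (assumed)."""
    digits = f"{card_number:020d}"
    return sum(int(digits[i:i + 2], 16) for i in range(0, 20, 2)) & 0xFF


def delete(index_table, pages, free_pages, card_number):
    """Remove the record for card_number; unlink and free its page if it becomes empty."""
    slot = card_hash(card_number)
    prev, page_no = None, index_table[slot]
    while page_no is not None:
        page = pages[page_no]
        kept = [r for r in page["records"] if r["card"] != card_number]
        if len(kept) != len(page["records"]):
            page["records"] = kept  # on real Flash: erase the page, then rewrite the kept records
            if not kept:
                if prev is None:
                    index_table[slot] = page["next"]    # empty head page: move the index head
                else:
                    pages[prev]["next"] = page["next"]  # bypass the empty page
                del pages[page_no]
                free_pages.append(page_no)              # the page is blank again
            return True
        prev, page_no = page_no, page["next"]
    return False  # no such list entry
```

On real hardware the kept records would have to be held in RAM or in a spare page while the original page is erased, which is where the backup mentioned in the note comes in.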

The embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the technical solution of any one of the above embodiments of the Flash retrieval method. The implementation principle and technical effect are similar to those of the above method and are not described here again.

It should be noted that in any of the above embodiments, the steps need not be executed in the order of their sequence numbers; unless the execution logic dictates a particular order, they may be executed in any other feasible order.

It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

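1. A Flash retrieval method based on hash lookup, characterized by comprising the following steps:

receiving data to be stored, wherein each piece of data has a unique key value, and calculating the hash value of each piece of data according to a pre-constructed hash function and the key value of each piece of data;

creating an index table in Flash, wherein the hash value of each piece of data represents the index position of the data in the index table, and establishing the mapping relation between the index position in the index table and a Flash storage page;

and determining a Flash storage page of each piece of data according to the hash value of each piece of data and the mapping relation, and storing the data into the corresponding Flash storage page.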

2. The Flash retrieval method based on hash lookup as claimed in claim 1, wherein if the hash values calculated for multiple pieces of data are the same, the index positions of the multiple pieces of data in the index table are the same, and the multiple pieces of data are sequentially stored in the Flash storage page corresponding to the same index position.

3. The Flash retrieval method based on hash lookup as claimed in claim 2, wherein the multiple pieces of data are sequentially stored in the Flash storage page corresponding to the same index position, and when a later piece of data is stored, if the Flash storage page corresponding to the index position has no free storage space, a blank Flash storage page is found, the later piece of data is stored in the blank Flash storage page, and a link relationship between the Flash storage page of the later piece of data and the Flash storage page corresponding to the index position is established.

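4. The Flash retrieval method based on hash lookup according to claim 3, further comprising:

receiving a data query request, wherein the data query request comprises a key value of data to be queried;

calculating the hash value of the data to be queried according to the key value of the data to be queried and the hash function;

and obtaining the index position to be queried in the index table according to the hash value of the data to be queried, searching for the data to be queried on the Flash storage page corresponding to the index position, and returning a query result.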

5. The Flash retrieval method based on hash lookup as claimed in claim 4, wherein if the data to be queried is not found in the Flash storage page corresponding to the index position of the data to be queried and that Flash storage page is linked to another Flash storage page, the search for the data to be queried continues in the other Flash storage page and the query result is returned.

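6. The Flash retrieval method based on hash lookup according to claim 3, further comprising:

receiving a data deleting request, wherein the data deleting request comprises a key value of data to be deleted;

calculating the hash value of the data to be deleted according to the key value of the data to be deleted and the hash function;

and obtaining the index position of the data to be deleted in the index table according to the hash value of the data to be deleted, and locating and deleting the data to be deleted on the Flash storage page corresponding to the index position.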

7. The Flash retrieval method based on hash lookup as claimed in claim 6, wherein if the Flash storage page where the data to be deleted is located is empty after the data to be deleted is deleted, the link relationship of other Flash storage pages linked to the Flash storage page where the data to be deleted is located is adjusted.

8. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.