CN112395213B - ACEH index structure and method based on memory hot spot data - Google Patents

Info

Publication number
CN112395213B
Authority
CN
China
Prior art keywords
data
key
bucket
aceh
hash
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011296272.2A
Other languages
Chinese (zh)
Other versions
CN112395213A (en)
Inventor
何水兵
朱彤
曾令仿
段雪豪
银燕龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Zhejiang Lab
Original Assignee
Zhejiang University ZJU
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU, Zhejiang Lab
Priority to CN202011296272.2A
Publication of CN112395213A
Application granted
Publication of CN112395213B
Current legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00: Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02: Addressing or allocation; Relocation
    • G06F12/0223: User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/023: Free address space management
    • G06F12/0238: Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory
    • G06F12/0246: Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory in block erasable memory, e.g. flash memory
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an ACEH index structure and method for hotspot data in memory. The structure comprises directory entries, segments, and data buckets. In the method, directory entries index segments using the global depth G, and each segment corresponds to a group of data buckets. Segments index data buckets using the local depth L, where $L = G - \log_2 k$ and k is the number of pointers to the data bucket. The data bucket index uses an Adjusted-Cuckoo algorithm to locate the data bucket into which a hash key is inserted. The Adjusted-Cuckoo algorithm includes two hash functions and yields two candidate data buckets, from which a bucket with free space is selected for insertion; the algorithm computes the first data bucket, and the second data bucket is taken directly to be the bucket that follows it. The method of operation comprises: step one, an insert operation; step two, a refresh operation; step three, a split operation; and step four, a delete operation.

Description

ACEH index structure and method based on memory hot spot data
Technical Field
The invention relates to a memory storage structure of a computer, in particular to an ACEH index structure and method based on memory hot spot oriented data.
Background
Key-value storage techniques are widely used by internet companies in production environments to improve data storage performance. Researchers have studied the hotspot problem in various scenarios and have proposed effective solutions for some of them. However, the hotspot problem in key-value storage scenarios has been neglected.
In a conventional extendible hash structure, when a key-value pair is inserted, the key is first matched to a directory entry. For example, a key that matches the first directory entry "00" is inserted into the data bucket to which that entry points. Inside the data bucket, conventional extendible hashing traverses the slots sequentially until it finds the first empty slot and inserts the key-value pair there. During a lookup, the directory entry is found from the corresponding bits of the key, the data bucket is located through the pointer of the directory entry, and the key-value pair is then found by sequential traversal. Deletion follows the same path as lookup. For a refresh, conventional extendible hashing simply performs another insert operation, which wastes a great deal of space.
When the table expands, the number of directory entries doubles. When the directory is modified, the pointer of each directory entry is modified and, in addition, some key-value pairs in the data buckets are moved accordingly. For example, when the directory has 4 entries, a key-value pair is inserted into the data bucket of directory entry 00; after expansion, the key-value pair moves into the data bucket corresponding to directory entry 001 (because the first three bits of its key are 001).
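The re-addressing in this example can be sketched as follows. This is a minimal illustration rather than code from the patent: the 8-bit key width, the function names, and the bucket labels are assumptions. It shows only how the directory index changes when the directory doubles; as in general extendible hashing, the sketch lets entry 001 keep referencing the old bucket until that bucket is actually split.

```python
def directory_index(hash_key, global_depth, key_bits=8):
    """Select a directory entry from the most significant `global_depth` bits."""
    return hash_key >> (key_bits - global_depth)


def double_directory(directory):
    """Double the directory: old entry i is referenced by new entries 2i and 2i+1."""
    return [bucket for bucket in directory for _ in range(2)]


key = 0b00100110                      # hypothetical key whose first three bits are 001
directory = ["b0", "b1", "b2", "b3"]  # 4 entries, global depth 2
before = directory_index(key, 2)      # entry 00
directory = double_directory(directory)
after = directory_index(key, 3)       # entry 001
print(f"{before:02b} -> {after:03b}, bucket {directory[after]}")
```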
Disclosure of Invention
In order to solve the defects in the prior art and achieve the purposes of increasing the utilization rate of the memory and accelerating the searching performance, the invention adopts the following technical scheme:
An ACEH index structure for hotspot data in memory comprises directory entries and data buckets, with an intermediate structure, the segment, between them. To solve the insert and lookup problem in this three-layer structure, directory entries use the global depth G as the segment index, and each segment contains a group of data buckets. Segments use the local depth L as the data bucket index, where $L = G - \log_2 k$ and k is the number of pointers to the data bucket. The local depth L is used because, after the extendible hash table expands, directory entries and segments are not necessarily in one-to-one correspondence: the number of directory entries after expansion exceeds the number of existing segments, and a new segment is created only when a key-value pair must be moved to a data bucket that does not yet exist. The data bucket index uses the Adjusted-Cuckoo algorithm to locate the data bucket into which a hash key is inserted. The Adjusted-Cuckoo algorithm includes two hash functions and yields two candidate data buckets, from which a bucket with free space is selected for insertion; the algorithm computes the first data bucket, and the second data bucket is taken directly to be the bucket that follows it. This arrangement is cache-line friendly: compared with the conventional Cuckoo hashing algorithm, it exploits spatial locality, speeds up lookups, and significantly improves memory utilization. The structure also significantly improves the performance of operations on non-volatile memory (NVM).
In an ACEH indexing method for hotspot data in memory, directory entries index segments using the global depth G, and each segment corresponds to a group of data buckets. Segments index data buckets using the local depth L, where $L = G - \log_2 k$ and k is the number of pointers to the data bucket. The local depth L is used because, after the extendible hash table expands, directory entries and segments are not necessarily in one-to-one correspondence: the number of directory entries after expansion exceeds the number of existing segments, and a new segment is created only when a key-value pair must be moved to a data bucket that does not yet exist. The data bucket index uses the Adjusted-Cuckoo algorithm to locate the data bucket into which a hash key is inserted. The Adjusted-Cuckoo algorithm includes two hash functions and yields two candidate data buckets, from which a bucket with free space is selected for insertion; the algorithm computes the first data bucket, and the second data bucket is taken directly to be the bucket that follows it. This arrangement is cache-line friendly: compared with the conventional Cuckoo hashing algorithm, it exploits spatial locality, speeds up lookups, significantly improves memory utilization, and improves performance on non-volatile memory (NVM). The method of operation comprises the following steps:
step one, inserting operation;
step two, refreshing operation;
step three, splitting operation;
and step four, deleting operation.
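The relation between the global depth G, the pointer count k, and the local depth L stated above can be checked with a few lines of arithmetic. The function name and the requirement that k be a power of two are assumptions of this sketch, not statements from the patent.

```python
def local_depth(global_depth, k):
    """Return L = G - log2(k), where k is the number of pointers (assumed a power of two)."""
    if k < 1 or k & (k - 1):
        raise ValueError("k must be a power of two")
    return global_depth - (k.bit_length() - 1)


# With G = 2, an object referenced by two directory entries has L = 1;
# once each entry has its own target (k = 1), L equals G.
print(local_depth(2, 2), local_depth(2, 1))
```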
Further, the Adjusted-Cuckoo algorithm uses a multiply-shift function to determine the first data bucket.
Further, in the insert operation of step one, the directory entry is located using the most significant bits and the segment it points to is found. Two candidate data buckets are then found with the Adjusted-Cuckoo algorithm, and the data in both buckets is traversed sequentially. If a stored key is the same as the key being inserted, the value corresponding to that key is updated. If no stored key matches, then: if both data buckets have free slots, the hash key is inserted into one of them at random; if only one data bucket has a free slot, the hash key is inserted into that bucket; and if neither data bucket has a free slot, the insertion fails and a split operation is performed.
Further, the refresh operation of step two refreshes in place: during probing, the key to be updated is compared directly with the key stored in each slot, and if they are the same, the value is updated. Conventional extendible hashing inserts the data directly, so duplicate key-value pairs exist, space is wasted, and reads are likely to return wrong data. In-place refresh eliminates the duplicate key-value pairs and improves memory utilization.
Further, in the split operation of step three, a new segment is created, the extra directory entry pointer that pointed to the original segment is redirected to the new segment, and the valid key-value pairs corresponding to that directory entry pointer are transferred to the new segment. The transferred key-value pairs are then deleted from the original segment.
Further, the delete operation of step four uses lazy deletion. After the directory entries are updated, queries for migrated records access the new segment and queries for non-migrated records access the old segment. Because the split segment (that is, the old segment) still contains all keys, a query always finds the record it is searching for, although the old segment holds some unnecessary duplicates. Expansion is likewise performed lazily: when some key-value pairs are migrated, their copies in the original data bucket are not deleted immediately; instead, when new data is inserted into the original data bucket, it directly replaces key-value pairs that have already been migrated. This reduces the overhead of expanding the hash table and improves memory utilization.
Further, in lazy deletion, when data x1 in data bucket Bucket0 is deleted, data x2 from data bucket Bucket1 is copied directly over x1, and the slot that x2 previously occupied is marked invalid. In this way, data that can reside in either Bucket0 or Bucket1 is kept in Bucket0 as far as possible, which reduces the number of probes and the average search length and improves data access performance.
The invention has the advantages that:
ACEH improves the hash index for hot data sets on the basis of the extendible hash structure. Unlike a general extendible hash index, ACEH uses a modified Cuckoo hashing algorithm in the second-level indexing step, which increases the number of positions into which each item can be inserted and thereby raises memory utilization. ACEH also provides an in-place refresh operation, which reduces the memory occupied by duplicate key-value pairs. In-place refresh also reduces the number of split operations on the ACEH structure and improves insertion performance.
Drawings
FIG. 1 is a schematic diagram of an ACEH index structure of the present invention.
FIG. 2 is a schematic diagram of the process of the Adjusted-Cuckoo algorithm of the present invention.
FIG. 3 is a schematic diagram of the creation of a new segment in the present invention.
FIG. 4 is a diagram of directory entry expansion in accordance with the present invention.
Detailed Description
The following describes specific embodiments of the present invention in detail with reference to the drawings. It should be understood that the detailed description and specific examples, while indicating and illustrating the invention, are not intended to limit the invention.
The ACEH (Adjusted-Cuckoo Extendible Hashing) structure for hotspot data is a hash storage structure for data sets containing hotspots.
1. ACEH logic structure and algorithm:
As shown in FIG. 1, compared with conventional extendible hashing, ACEH adds an intermediate structure called a Segment. One Segment consists of N buckets. To solve the insert and lookup problem in the three-layer structure, ACEH uses G bits (G being the global depth) as the segment index and uses the Adjusted-Cuckoo algorithm to determine which bucket a hash key is inserted into.
Adjusted-Cuckoo algorithm: Like the conventional Cuckoo hashing algorithm, the Adjusted-Cuckoo algorithm includes two hash functions, so each hash key yields two candidate buckets, and a bucket with free space is then chosen for insertion. Unlike Cuckoo hashing, which selects its hash functions randomly, the Adjusted-Cuckoo algorithm uses a multiply-shift function to determine the first bucket and takes the second bucket to be the bucket immediately following it. This arrangement is cache-line friendly and, compared with the conventional Cuckoo hashing algorithm, exploits spatial locality and speeds up lookups.
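The candidate-bucket computation described above can be sketched as follows. The 64-bit word size, the multiplier constant, the wrap-around from the last bucket to bucket 0, and the function name are all assumptions made for illustration; the patent does not specify them.

```python
WORD_BITS = 64
MASK = (1 << WORD_BITS) - 1
# Multiply-shift hashing needs an odd multiplier; this particular value is an assumption.
MULTIPLIER = 0x9E3779B97F4A7C15


def candidate_buckets(hash_key, bucket_bits):
    """Return the two candidate buckets in a segment of 2**bucket_bits buckets."""
    # First bucket: multiply-shift, keeping the top `bucket_bits` bits of the product.
    first = ((hash_key * MULTIPLIER) & MASK) >> (WORD_BITS - bucket_bits)
    # Second bucket: simply the next bucket (wrapping at the end is an assumption).
    second = (first + 1) % (1 << bucket_bits)
    return first, second


first, second = candidate_buckets(0x26000000000000FE, bucket_bits=8)
print(first, second)
```

Because the two candidates are adjacent, a lookup that misses in the first bucket continues into neighbouring memory, which is the locality benefit the text attributes to the algorithm.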
As shown in FIG. 2, suppose the Adjusted-Cuckoo algorithm computes that x1 can be inserted into bucket 0 or bucket 1 and that, after sequential traversal, x1 is inserted into bucket 0. Suppose x2 can likewise be inserted into bucket 0 or bucket 1 and that, after sequential traversal, it is inserted into bucket 1. When x1 is deleted, x2 can be copied directly over x1, and the slot that x2 previously occupied is marked INVALID. In this way, data that can reside in either bucket 0 or bucket 1 is kept in bucket 0 as far as possible, which reduces the number of probes and the average search length and improves data access performance.
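The x1/x2 example can be sketched as below. The data layout (buckets as lists of `(key, value)` tuples), the `home_bucket` callback that reports a key's first candidate bucket, and the function name are assumptions of this sketch; it also handles only the case where the deleted key sits in its first candidate bucket.

```python
INVALID = None  # marker for a free or invalidated slot


def delete_and_pull_back(buckets, home_bucket, b, key):
    """Delete `key` from bucket b, then refill the slot from bucket b+1 if possible."""
    nxt = (b + 1) % len(buckets)
    for i, rec in enumerate(buckets[b]):
        if rec is not INVALID and rec[0] == key:
            buckets[b][i] = INVALID
            # Look in the next bucket for a record that could also live in bucket b.
            for j, other in enumerate(buckets[nxt]):
                if other is not INVALID and home_bucket(other[0]) == b:
                    buckets[b][i] = other        # copy x2 over x1
                    buckets[nxt][j] = INVALID    # mark x2's old slot invalid
                    break
            return True
    return False


buckets = [[("x1", "a")], [("x2", "b")]]
delete_and_pull_back(buckets, lambda k: 0, 0, "x1")
print(buckets)
```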
With the Adjusted-Cuckoo algorithm, ACEH also achieves significantly higher memory utilization than conventional extendible hashing. In a refresh operation, conventional extendible hashing inserts the data directly, so duplicate key-value pairs waste space and reads are likely to return wrong data. ACEH instead refreshes in place, which eliminates duplicate key-value pairs.
2. ACEH operation:
As shown in FIG. 3, assume that a given hash key is 00100110..11110 and that the global depth G is 2. The two most significant bits are used as the Segment index, and the least significant byte is used as the Bucket index. L denotes the local depth, $L = G - \log_2 k$, where k is the number of pointers to the data bucket. The local depth L is used because, after the extendible hash table expands, directory entries and segments are not necessarily in one-to-one correspondence: the number of directory entries after expansion exceeds the number of existing segments, and a new segment is created only when a key-value pair must be moved to a data bucket that does not yet exist. FIG. 4 shows the directory doubling operation: the directory is expanded according to the most significant bits, where white indicates directory entries that existed before expansion and gray indicates directory entries added by the expansion.
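The bit slicing in this paragraph can be sketched as follows, taking the text literally (most significant G bits for the segment, least significant byte for the bucket). The 64-bit key width and the concrete key value are assumptions: the middle bits of the patent's example key are elided, so they are set to zero here, and the low byte is borrowed from the key used in the insert example.

```python
KEY_BITS = 64  # assumed hash-key width


def segment_index(hash_key, global_depth):
    """Directory/segment index: the most significant `global_depth` bits."""
    return hash_key >> (KEY_BITS - global_depth)


def bucket_index(hash_key):
    """Bucket index inside the segment: the least significant byte."""
    return hash_key & 0xFF


# Hypothetical key: high byte 00100110, low byte 11111110, middle bits zeroed.
key = (0b00100110 << 56) | 0b11111110
print(f"{segment_index(key, 2):02b}", bucket_index(key))
```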
Insert operation: When hash key 00100110..11111110 is inserted, the most significant bits are first used as an index to locate directory entry 00, and the Segment that entry points to is found. Two candidate buckets are then found with the Adjusted-Cuckoo algorithm, and the data in both buckets is traversed sequentially. If a stored key is the same as the key being inserted, the value corresponding to that key is updated. If no stored key matches, then: if both buckets have free slots, the hash key is inserted into one of them at random; if only one bucket has a free slot, the hash key is inserted into that bucket; and if neither bucket has a free slot, the insertion fails and a Split operation is performed.
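A minimal sketch of the bucket-level part of this insert logic is shown below. The segment layout (a list of buckets, each a list of slots holding `None` or a `[key, value]` pair), the return strings, and the function name are assumptions; locating the segment and computing the two candidate bucket indices are taken as given.

```python
import random


def insert(segment, first, second, key, value, rng=random):
    """Insert or update `key` in one of the two candidate buckets of `segment`."""
    candidates = (segment[first], segment[second])
    # Pass 1: traverse both buckets; an existing key is updated in place.
    for bucket in candidates:
        for slot in bucket:
            if slot is not None and slot[0] == key:
                slot[1] = value
                return "updated"
    # Pass 2: choose among the candidate buckets that still have a free slot.
    free = [bucket for bucket in candidates if None in bucket]
    if not free:
        return "split-needed"      # both buckets full: the caller must split
    bucket = rng.choice(free)      # random choice when both have room
    bucket[bucket.index(None)] = [key, value]
    return "inserted"


segment = [[None, None], [None]]
print(insert(segment, 0, 1, "a", 1), insert(segment, 0, 1, "a", 2))
```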
Update operation: Because ACEH uses linear probing during insertion, a refresh can compare the key to be updated directly with the key stored in each slot during probing and, if they are the same, update the value.
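The difference between refreshing by re-insertion and refreshing in place can be illustrated on a single bucket. Both function names and the list-of-tuples bucket layout are assumptions of this sketch; the first function models the conventional behaviour described earlier in the text, not any particular implementation.

```python
def refresh_by_reinsert(bucket, key, value):
    """Conventional refresh: insert again into the first empty slot (bucket must have one)."""
    bucket[bucket.index(None)] = (key, value)


def refresh_in_place(bucket, key, value):
    """In-place refresh: overwrite the slot that already holds `key`."""
    for i, slot in enumerate(bucket):
        if slot is not None and slot[0] == key:
            bucket[i] = (key, value)
            return True
    return False


old_style = [("k", 1), None, None]
refresh_by_reinsert(old_style, "k", 2)   # leaves a stale duplicate of "k"
new_style = [("k", 1), None, None]
refresh_in_place(new_style, "k", 2)      # still one slot in use
print(old_style, new_style)
```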
Split operation: Assume that key 00100110..11110 is to be inserted and that the Adjusted-Cuckoo algorithm computes bucket 2 and bucket 254 as its candidate buckets, but neither bucket has free space for the data. ACEH then creates a new Segment 4, redirects the extra pointer to Segment 3 (that is, the pointer of directory entry 11) to Segment 4, and transfers the valid key-value pairs in Segment 3 whose hash keys have the most significant bits 11 to Segment 4.
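The pointer redirection and record transfer in this example can be sketched as follows. The 8-bit keys, the representation of the directory as a list of segment ids, the representation of a segment as a flat list of hash keys, and the choice to leave stale copies in the old segment (consistent with the lazy deletion described in this document) are assumptions of the sketch.

```python
KEY_BITS = 8  # small illustrative key width


def split_segment(directory, segments, old_id, global_depth):
    """Split `old_id`: redirect its extra directory pointers to a newly created segment."""
    entries = [i for i, seg in enumerate(directory) if seg == old_id]
    if len(entries) < 2:
        raise ValueError("only one pointer: the directory must be doubled first")
    moved_entries = set(entries[len(entries) // 2:])
    new_id = max(segments) + 1
    # Copy the records addressed by the redirected entries; the old copies stay behind.
    segments[new_id] = [k for k in segments[old_id]
                        if (k >> (KEY_BITS - global_depth)) in moved_entries]
    for i in moved_entries:
        directory[i] = new_id
    return new_id


directory = [1, 2, 3, 3]   # entries 10 and 11 both point to Segment 3
segments = {1: [], 2: [], 3: [0b10010010, 0b11110010]}
print(split_segment(directory, segments, 3, 2), directory)
```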
Delete operation: Deletion is lazy, so migrated records are not deleted immediately. After the directory entries are updated, queries for migrated records access the new segment and queries for non-migrated records access the old segment. Because the split segment (that is, the old segment) still contains all keys, a query always finds the record it is searching for, although the old segment holds some unnecessary duplicates. During lazy expansion, when some key-value pairs are migrated, their copies in the original data bucket are not deleted; when new data is later inserted into the original data bucket, it directly replaces the migrated data. This reduces the overhead of expanding the hash table.
For example, consider inserting a record with hash key 1010..11111110. Segment 3 is accessed, and suppose the Adjusted-Cuckoo algorithm computes bucket 2 and bucket 254 as the candidate buckets. Bucket 254 is found to be full. In bucket 2, the first record, with hash key 1001..0010, is valid, but the second record, with hash key 1111..0010, is invalid, so the insert operation replaces the second record with the new record. Because the validity of each record is determined by the local depth, the order in which directory entries are updated must be preserved to maintain consistency and limit the scope of failures.
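The validity test in this example can be sketched as a prefix comparison: a stored record is valid if its most significant L bits still match the prefix served by the segment. The 8-bit keys (shortened stand-ins for the elided keys in the text), the prefix `10` with local depth 2 for Segment 3 after the split, and both function names are assumptions of this sketch.

```python
KEY_BITS = 8  # small illustrative key width


def is_valid(hash_key, segment_prefix, local_depth):
    """A record is valid if its top `local_depth` bits match the segment's prefix."""
    return hash_key >> (KEY_BITS - local_depth) == segment_prefix


def insert_lazy(bucket, hash_key, segment_prefix, local_depth):
    """Place `hash_key` in the first empty or invalid (already migrated) slot."""
    for i, stored in enumerate(bucket):
        if stored is None or not is_valid(stored, segment_prefix, local_depth):
            bucket[i] = hash_key
            return i
    return None  # bucket full of valid records


bucket2 = [0b10010010, 0b11110010]   # second record was migrated to the new segment
slot = insert_lazy(bucket2, 0b10101110, segment_prefix=0b10, local_depth=2)
print(slot, [f"{k:08b}" for k in bucket2])
```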
The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced with equivalents; such modifications and substitutions do not depart from the spirit of the technical solutions according to the embodiments of the present invention.

Claims (5)

1. An indexing method for ACEH, an extendible hash storage structure for hotspot data in memory, characterized in that directory entries index segments by the global depth G, each segment corresponds to a group of data buckets, and segments index data buckets by the local depth L, where
$L = G - \log_2 k$
and k represents the number of pointers to the data bucket; the data bucket index uses an Adjusted-Cuckoo algorithm to locate the data bucket into which a hash key is inserted; the Adjusted-Cuckoo algorithm includes two hash functions and yields two candidate data buckets, from which a data bucket with free space is selected for insertion; the Adjusted-Cuckoo algorithm determines the first data bucket, and the second data bucket is taken directly to be the data bucket that follows it;
when data 1, as computed by the algorithm, can be inserted into the first data bucket or the second data bucket and, after sequential traversal, is inserted into the first data bucket, and data 2, as computed by the algorithm, can be inserted into the first data bucket or the second data bucket and, after sequential traversal, is inserted into the second data bucket, then upon deletion of data 1, data 2 is copied directly over data 1 and the position previously occupied by data 2 is marked invalid.
2. The indexing method for the extendible hash storage structure ACEH for hotspot data in memory according to claim 1, wherein the Adjusted-Cuckoo algorithm determines the first data bucket by a multiply-shift function.
3. The indexing method for the extendible hash storage structure ACEH for hotspot data in memory according to claim 1 or 2, wherein the method comprises a key-value pair insert operation: the directory entry is located using the most significant bits and the segment it points to is found; two candidate data buckets are found with the Adjusted-Cuckoo algorithm and the data in both is traversed sequentially; if a stored key is the same as the inserted key, the value corresponding to that key is updated; if no stored key matches, the hash key is inserted at random when both data buckets have free slots, is inserted into the data bucket with a free slot when only one has one, and, when neither data bucket has a free slot, the insertion fails and a key-value split operation is performed.
4. The indexing method for the extendible hash storage structure ACEH for hotspot data in memory according to claim 1 or 2, wherein the method comprises a key-value pair refresh operation that refreshes in place: during probing, the key to be updated is compared directly with the key stored in each slot, and if they are the same, the value is updated.
5. The indexing method for the extendible hash storage structure ACEH for hotspot data in memory according to claim 1 or 2, wherein the method comprises a key-value split operation: a new segment is created, the extra directory entry pointer that pointed to the original segment is redirected to the new segment, and the valid key-value pairs corresponding to that directory entry pointer are transferred to the new segment.
CN202011296272.2A 2020-11-18 2020-11-18 ACEH index structure and method based on memory hot spot data Active CN112395213B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011296272.2A CN112395213B (en) 2020-11-18 2020-11-18 ACEH index structure and method based on memory hot spot data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011296272.2A CN112395213B (en) 2020-11-18 2020-11-18 ACEH index structure and method based on memory hot spot data

Publications (2)

Publication Number Publication Date
CN112395213A CN112395213A (en) 2021-02-23
CN112395213B CN112395213B (en) 2023-05-30

Family

ID=74606497

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011296272.2A Active CN112395213B (en) 2020-11-18 2020-11-18 ACEH index structure and method based on memory hot spot data

Country Status (1)

Country Link
CN (1) CN112395213B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113342706A (en) * 2021-05-13 2021-09-03 武汉大学 Write-optimized extensible hash index structure based on nonvolatile memory and inserting, refreshing and deleting methods
CN113505130B (en) * 2021-07-09 2023-07-21 中国科学院计算技术研究所 Hash table processing method

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110069496A (en) * 2019-03-20 2019-07-30 韶关学院 A kind of Novel chain type Hash table construction method and device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06231023A (en) * 1993-10-19 1994-08-19 Olympus Optical Co Ltd Information recorder
CN103106144B (en) * 2011-11-15 2015-10-28 北京新媒传信科技有限公司 A kind of internal memory index compression method and apparatus
CN111459846B (en) * 2020-03-12 2022-03-18 华中科技大学 Dynamic hash table operation method based on hybrid DRAM-NVM

Also Published As

Publication number Publication date
CN112395213A (en) 2021-02-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant