CN111552693A - Tag cuckoo filter - Google Patents
- Publication number
- CN111552693A (CN202010360757.7A)
- Authority
- CN
- China
- Prior art keywords
- candidate
- fingerprint
- bucket
- tag
- label
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2255—Hash tables
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
- G06F16/2443—Stored procedures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Health & Medical Sciences (AREA)
- Collating Specific Patterns (AREA)
Abstract
The invention discloses a tag cuckoo filter comprising a cuckoo hash table. The cuckoo hash table comprises a plurality of storage buckets; each data member corresponds to two tag fingerprints, which are stored in two storage buckets respectively. When the tag cuckoo filter receives a data member management operation, it determines, based on an exclusive-or (XOR) operation, the two candidate buckets and the two tag fingerprints corresponding to a preset data member, and executes the data member management operation based on the two candidate buckets and the two tag fingerprints so determined. By assigning two tag fingerprints and two storage buckets to each data member and determining the candidate buckets by an XOR operation based on the tag fingerprints, the invention does not require the number of storage buckets to be a power of 2, thereby reducing the storage space overhead of each data member.
Description
Technical Field
The invention relates to the technical field of computer information representation and information retrieval, in particular to a tag cuckoo filter.
Background
Membership query is one of the key methods for many network applications and distributed systems (e.g., cooperative caching, packet processing, key-value storage, and deduplication), and it must satisfy three key requirements: low storage space overhead, fast query, and incremental update. Currently, membership queries generally use a Bloom filter, such as the standard Bloom filter or the counting Bloom filter, or a cuckoo filter, but the Bloom filter and its variants can hardly satisfy all three requirements at once. For example, a standard Bloom filter supports element insertion and query operations but does not support element deletion. A counting Bloom filter is a Bloom filter that supports deletion, but its storage space overhead is high. The cuckoo filter is a space-efficient filter that supports deletion; its storage space overhead is significantly lower than that of the counting Bloom filter and even lower than that of the standard Bloom filter. However, the existing cuckoo filter has the problem that the storage space overhead per data member varies with the number of elements: the XOR operation of the cuckoo filter requires the number of storage buckets to be a power of 2 (namely $2^b$, where b is the exponent), so that in the worst case the storage space overhead per data member is doubled.
Thus, the prior art has yet to be improved and enhanced.
Disclosure of Invention
The invention aims to solve the technical problem of providing a tag cuckoo filter aiming at the defects of the prior art.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
a tag cuckoo filter comprises a cuckoo hash table, wherein the cuckoo hash table comprises a plurality of storage buckets, each data member corresponds to two tag fingerprints, and the two tag fingerprints are stored in the two storage buckets respectively; when the tag cuckoo filter receives a data member management operation, determining two candidate buckets and two tag fingerprints corresponding to a preset data member based on an exclusive-or operation, and executing the data member management operation based on the two candidate buckets and the two tag fingerprints, wherein the preset data member is a data member corresponding to the data member management operation.
The tag cuckoo filter is characterized in that the tag fingerprints comprise tags and tag fingerprint remainders, and the tag fingerprint remainders of the two tag fingerprints corresponding to the preset data members are the same.
The tag cuckoo filter, wherein, the determining of two candidate buckets and two tag fingerprints corresponding to preset data members based on an exclusive or operation specifically includes:
determining a tag fingerprint remainder and a first candidate bucket corresponding to the preset data member;
determining a second candidate bucket corresponding to the preset data member by an exclusive-or operation based on the tag fingerprint remainder and the first candidate bucket;
determining a first tag corresponding to the preset data member according to the first candidate bucket, and determining a second tag corresponding to the preset data member according to the second candidate bucket;
determining a first tag fingerprint corresponding to the preset data member according to the first tag and the tag fingerprint remainder, and determining a second tag fingerprint corresponding to the preset data member according to the second tag and the tag fingerprint remainder;
and correcting the first candidate bucket and the second candidate bucket respectively according to the number of storage buckets, and taking the corrected first candidate bucket and the corrected second candidate bucket as the first candidate bucket and the second candidate bucket corresponding to the preset data member.
The tag cuckoo filter, wherein, when the data member management operation is an insert operation, the performing the data member management operation based on the two candidate buckets and the two tag fingerprints specifically includes:
detecting whether there are free storage locations in the first candidate bucket and the second candidate bucket;
and if a free storage location exists, storing the candidate tag fingerprint of the preset data member in the free storage location, wherein the candidate tag fingerprint is the tag fingerprint corresponding to the candidate bucket to which the free storage location belongs.
The tag cuckoo filter, wherein, when the data member management operation is an insert operation, the performing the data member management operation based on the two candidate buckets and the two tag fingerprints includes:
if no free storage location exists, selecting a target candidate bucket from the first candidate bucket and the second candidate bucket, and taking the tag fingerprint corresponding to the target candidate bucket as the tag fingerprint corresponding to the preset data member;
selecting a target tag fingerprint in the target candidate bucket, and storing the tag fingerprint corresponding to the preset data member in the storage location occupied by the target tag fingerprint;
determining a reference bucket and a reference tag fingerprint corresponding to the target tag fingerprint by an exclusive-or operation according to the target candidate bucket and the target tag fingerprint;
if the reference bucket has a free storage location, storing the reference tag fingerprint in the free storage location.
The tag cuckoo filter, wherein, when the data member management operation is an insert operation, the performing the data member management operation based on the two candidate buckets and the two tag fingerprints includes:
if the reference bucket does not have a free storage location, selecting a target storage location in the reference bucket, and storing the reference tag fingerprint in the target storage location;
taking the tag fingerprint previously stored at the target storage location as the target tag fingerprint, and taking the reference bucket as the target candidate bucket;
and continuing to execute the step of determining the reference bucket and the reference tag fingerprint corresponding to the target tag fingerprint by an exclusive-or operation according to the target candidate bucket and the target tag fingerprint, until the reference bucket has a free storage location or the number of executions reaches a preset threshold.
The tag cuckoo filter, wherein the determining, according to the target candidate bucket and the target tag fingerprint, a reference bucket and a reference tag fingerprint corresponding to the target tag fingerprint by using an exclusive or operation specifically includes:
obtaining the target tag of the target tag fingerprint and updating the target candidate bucket based on the target tag;
determining a comparison bucket corresponding to the target tag fingerprint by an exclusive-or operation based on the updated target candidate bucket and the tag fingerprint remainder of the target tag fingerprint;
and determining the reference tag fingerprint corresponding to the target tag fingerprint based on the comparison bucket and the tag fingerprint remainder of the target tag fingerprint, and correcting the comparison bucket based on the number of storage buckets to obtain the reference bucket corresponding to the target tag fingerprint.
The tag cuckoo filter, wherein, when the data member management operation is an inquiry operation or a deletion operation, the executing the data member management operation based on the two candidate buckets and the two tag fingerprints specifically includes:
searching the first candidate bucket and the second candidate bucket for the first tag fingerprint and the second tag fingerprint corresponding to the preset data member, respectively;
and if the first tag fingerprint or the second tag fingerprint corresponding to the preset data member is found, executing the data member management operation on the found tag fingerprint.
The tag cuckoo filter, wherein the performing the data member management operation on the fingerprint specifically includes:
when the data member management operation is a query operation, indicating that the query for the data member succeeded;
and when the data member management operation is a delete operation, deleting the found tag fingerprint corresponding to the preset data member, wherein the tag fingerprint is the first tag fingerprint or the second tag fingerprint.
The tag cuckoo filter, wherein, when the data member management operation is an inquiry operation or a deletion operation, the executing the data member management operation based on the two candidate buckets and the two tag fingerprints specifically includes:
and if neither the first tag fingerprint nor the second tag fingerprint corresponding to the preset data member is found, indicating that the data member management operation has failed.
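The query and delete operations described above can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the SHA-256-based hashes, the 8-bit remainder width `R_BITS`, the tag polarity (0 when the uncorrected index is below m), the tag placed above the remainder bits, and the correction by subtracting m are all assumptions made for the example.

```python
import hashlib

R_BITS = 8  # assumed width r of the tag fingerprint remainder


def _h(data: bytes) -> int:
    """Stand-in hash (assumption): first 8 bytes of SHA-256 as an integer."""
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def candidates(x: str, m: int):
    """Return [(bucket index, tag fingerprint)] for both candidates of x."""
    M = 1 << m.bit_length()                    # smallest power of 2 greater than m
    g = _h(x.encode())
    r_x = g & ((1 << R_BITS) - 1)              # remainder: low-order bits
    h0 = (g >> R_BITS) % M                     # first uncorrected index in [0, M-1]
    h1 = h0 ^ (_h(r_x.to_bytes(4, "big")) % M) # second uncorrected index via XOR
    out = []
    for raw in (h0, h1):
        tag = 0 if raw < m else 1              # assumed tag polarity
        idx = raw if raw < m else raw - m      # assumed correction into [0, m-1]
        out.append((idx, (tag << R_BITS) | r_x))
    return out


def query(buckets, x: str) -> bool:
    """True if either tag fingerprint of x is present in its candidate bucket."""
    return any(tf in buckets[idx] for idx, tf in candidates(x, len(buckets)))


def delete(buckets, x: str) -> bool:
    """Remove one matching tag fingerprint of x; False if none is found."""
    for idx, tf in candidates(x, len(buckets)):
        if tf in buckets[idx]:
            buckets[idx].remove(tf)
            return True
    return False
```

Here `buckets` is simply a list of m lists. Because a lookup compares the whole tag fingerprint (tag and remainder), two members that share a corrected bucket index but differ in tag are still told apart.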
Beneficial effects: compared with the prior art, the invention provides a tag cuckoo filter comprising a cuckoo hash table, wherein the cuckoo hash table comprises a plurality of storage buckets, each data member corresponds to two tag fingerprints, and the two tag fingerprints are stored in two storage buckets respectively. When the tag cuckoo filter receives a data member management operation, it determines, based on an exclusive-or operation, the two candidate buckets and the two tag fingerprints corresponding to a preset data member, and executes the data member management operation based on the two candidate buckets and the two tag fingerprints so determined. Because two tag fingerprints and two storage buckets are assigned to each data member and the candidate buckets are determined by an XOR operation based on the tag fingerprints, the number of storage buckets is not required to be a power of 2, and the storage space overhead of each data member is therefore reduced.
Drawings
Fig. 1 is an example of a tag cuckoo filter provided by the present invention.
Fig. 2 is a format diagram of the tag fingerprint provided by the present invention.
Fig. 3 is a flow chart of inserting data members in the tag cuckoo filter provided by the present invention when the number of storage buckets is not a power of 2.
Fig. 4 is an example of inserting data members in the tag cuckoo filter provided by the present invention.
Fig. 5 is another example of inserting data members in the tag cuckoo filter provided by the present invention.
Fig. 6 is a flow chart of inserting data members in the tag cuckoo filter provided by the present invention when the number of storage buckets is a power of 2.
Fig. 7 is a flowchart of searching for data members in the tag cuckoo filter provided by the present invention.
Fig. 8 is an example of searching for data members in a tag cuckoo filter provided by the present invention.
Fig. 9 is a flow chart of deleting data members in the tag cuckoo filter provided by the present invention.
Fig. 10 is an example of deleting data members in a tag cuckoo filter provided by the present invention.
Detailed Description
The invention provides a tag cuckoo filter. To make the purpose, technical scheme, and effect of the invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes any one of, and all combinations of, one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The inventors found that membership query is one of the key methods for many network applications and distributed systems (such as cooperative caching, packet processing, key-value storage, and data deduplication), and that it must satisfy three key requirements: low storage space overhead, fast query, and incremental update. Currently, Bloom filters, such as the standard Bloom filter and the counting Bloom filter, and cuckoo filters are commonly used for membership query.
A standard Bloom filter represents a set of m elements (items) using n bits (i.e., a bitmap): each inserted element is mapped to k bits of the bitmap by k hash functions, and those k bits are set to 1. To query an element, the same k hash functions map it to k bits of the bitmap, and the filter checks whether all k bits are 1; if so, the element is reported as being in the set; otherwise, it is reported as not being in the set. The standard Bloom filter is a space-efficient randomized data structure whose queries have a low false positive rate (a false positive means that the query result indicates an element is in the set although it actually is not), and it produces no false negatives (if the query result indicates that an element is not in the set, the element is definitely not in the set). Standard Bloom filters support element insertion and query operations, but not element deletion.
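As a minimal sketch of the standard Bloom filter just described: the class below uses n bits and k hash positions per element. The text does not specify the hash functions, so SHA-256 salted with the function index is an assumption made for illustration, as are the class and method names.

```python
import hashlib


class BloomFilter:
    """Standard Bloom filter: n bits, k hash functions, no deletion."""

    def __init__(self, n_bits: int, k: int) -> None:
        self.n_bits = n_bits
        self.k = k
        self.bits = bytearray(n_bits)  # one byte per bit, for readability

    def _positions(self, item: str):
        # Assumption: the k hash functions are SHA-256 salted with index i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def insert(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p] = 1

    def query(self, item: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.bits[p] for p in self._positions(item))
```

Since bits are only ever set and never cleared, an inserted element can never be missed, which is why false negatives cannot occur and why deletion is not supported.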
A counting Bloom filter is a Bloom filter that supports deletion: n counters are used to represent the m elements of a set. When an element is inserted, k hash functions map it to k counters, each of which is incremented by 1; when an element is deleted, those k counters are each decremented by 1. When an element is queried, the same k hash functions map it to k counters, and the filter checks whether all k counter values are at least 1; if so, the element is reported as being in the set; otherwise, it is reported as not being in the set. In practice, the counter size is set to 4 bits so that counter overflow is avoided. Counting Bloom filters therefore support fast incremental updates, but their storage overhead is high: 4 times that of a standard Bloom filter.
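The counting variant can be sketched in the same way, replacing each bit with a small counter. In this illustrative sketch a query reports an element as present when all k counters are nonzero. The salted SHA-256 hashing and the choice to saturate counters at 15 (the largest 4-bit value) instead of overflowing are assumptions made for the example.

```python
import hashlib


class CountingBloomFilter:
    """Counting Bloom filter: n counters of 4 bits each, k hash functions."""

    MAX_COUNT = 15  # largest value of a 4-bit counter

    def __init__(self, n_counters: int, k: int) -> None:
        self.n = n_counters
        self.k = k
        self.counters = [0] * n_counters

    def _positions(self, item: str):
        # Assumption: salted SHA-256 stands in for the k hash functions.
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % self.n
            for i in range(self.k)
        ]

    def insert(self, item: str) -> None:
        for p in self._positions(item):
            # Saturate rather than wrap around (an assumed overflow policy).
            self.counters[p] = min(self.counters[p] + 1, self.MAX_COUNT)

    def delete(self, item: str) -> None:
        # Only meaningful for items that were actually inserted.
        for p in self._positions(item):
            if self.counters[p] > 0:
                self.counters[p] -= 1

    def query(self, item: str) -> bool:
        return all(self.counters[p] > 0 for p in self._positions(item))
```

Each position costs 4 bits instead of 1, which is where the fourfold storage overhead relative to the standard Bloom filter comes from.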
The cuckoo filter is a space-efficient filter that supports deletion; its storage space overhead is significantly lower than that of the counting Bloom filter and even lower than that of the standard Bloom filter. The cuckoo filter uses a cuckoo hash table and computes candidate bucket indexes with an exclusive-or (XOR) operation; that is, a fingerprint of each element, rather than the element itself, is inserted into, deleted from, or queried in the two candidate buckets to which the element is hash-mapped. However, the cuckoo filter has the problem that the storage space overhead per data member varies with the number of elements: the XOR operation of the cuckoo filter requires the number of storage buckets to be a power of 2 (i.e., $2^b$, where b is the exponent), so that in the worst case the storage space overhead per data member is doubled.
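The power-of-2 requirement can be checked with a small experiment. The sketch below is illustrative only, and the function names are hypothetical. It computes the alternate bucket as the XOR of the current index with a fingerprint hash, reduced modulo the table size, and tests whether applying the mapping twice returns to the starting bucket, which relocation during eviction depends on. It also shows how rounding the bucket count up to a power of 2 can nearly double the table.

```python
def alternate_bucket(i: int, fp_hash: int, num_buckets: int) -> int:
    """XOR-based alternate index, reduced into the table."""
    return (i ^ fp_hash) % num_buckets


def is_involution(num_buckets: int, fp_hash: int) -> bool:
    """True if mapping twice always returns to the original bucket."""
    return all(
        alternate_bucket(alternate_bucket(i, fp_hash, num_buckets), fp_hash, num_buckets) == i
        for i in range(num_buckets)
    )


def round_up_to_power_of_two(needed: int) -> int:
    """Smallest power of 2 that is >= needed (for needed >= 1)."""
    return 1 << (needed - 1).bit_length()


print(is_involution(8, 5))             # power-of-2 table: round trip works
print(is_involution(6, 5))             # 6 buckets: bucket 2 -> 1 -> 4, round trip fails
print(round_up_to_power_of_two(1025))  # 2048 buckets allocated for 1025 needed
```

With 6 buckets and fingerprint hash 5, bucket 2 maps to (2 XOR 5) mod 6 = 1, and bucket 1 maps to 4 rather than back to 2, so an evicted fingerprint could no longer be found from its original bucket.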
From the above, the filters currently used for membership query cannot satisfy all of its requirements. Therefore, in the embodiment of the invention, the tag cuckoo filter comprises a cuckoo hash table, wherein the cuckoo hash table comprises a plurality of storage buckets, each data member corresponds to two tag fingerprints, and the two tag fingerprints are stored in two storage buckets respectively. When the tag cuckoo filter receives a data member management operation, it determines, based on an exclusive-or operation, the two candidate buckets and the two tag fingerprints corresponding to a preset data member, and executes the data member management operation based on the two candidate buckets and the two tag fingerprints so determined. By assigning two tag fingerprints and two storage buckets to each data member and determining the candidate buckets by an XOR operation based on the tag fingerprints, the invention does not require the number of storage buckets to be a power of 2, thereby reducing the storage space overhead of each data member.
The invention will be further explained by the description of the embodiments with reference to the drawings.
This embodiment provides a tag cuckoo filter. As shown in Fig. 1, the tag cuckoo filter includes a cuckoo hash table, which is a compact cuckoo hash table. The tag cuckoo filter allocates two tag fingerprints to a preset data member and maps them to two candidate buckets respectively; once the index of one of the two candidate buckets is determined, the index of the other can be determined by an exclusive-or operation. In other words, when the preset data member is obtained, the index of candidate bucket A can be computed from the preset data member, and the index of candidate bucket B can then be determined by an exclusive-or operation, which yields both candidate buckets corresponding to the preset data member. In addition, the number of storage buckets of the cuckoo hash table is not a power of 2, i.e., it is not of the form $2^b$, where the exponent b is a positive integer. For example, the number of buckets may be a value other than 4, 8, 16, and so on. It is worth noting that when the number m of storage buckets is a power of 2, the tag cuckoo filter of this embodiment degenerates to the existing cuckoo filter: the tag fingerprint contains no tag, and each preset data member has only one fingerprint.
Further, each bucket in the cuckoo hash table includes a specified number of storage locations, wherein a storage location is used to store a tag fingerprint of a data member, and each storage location stores a tag fingerprint of a data member. In addition, the specified number can be determined according to actual needs. For example, the specified number is 4, etc. When the specified number is 4, it means that each bucket in the cuckoo hash table contains four storage locations, that is, each bucket can store tag fingerprints of four data members.
Further, the tag fingerprint includes a tag (Tag) and a tag fingerprint remainder (Remainder). The tag represents the relation between the candidate bucket index corresponding to the tag fingerprint and the bucket number (i.e., the number of storage buckets), and the tag fingerprint remainder represents the fingerprint of the data member. In one specific implementation, as shown in Fig. 2, the tag contains 1 bit, which indicates whether the index i of the candidate bucket storing the tag fingerprint is smaller than the total bucket number m: if i < m, the tag is set to 0; otherwise, it is set to 1. The tag fingerprint remainder comprises the remaining r bits of the tag fingerprint. The two tag fingerprints corresponding to a data member have the same tag fingerprint remainder, while their tags may be the same or different.
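A tag fingerprint of this shape can be packed into a single integer of r + 1 bits. The sketch below is an illustration under stated assumptions: the remainder width `R_BITS = 8` is arbitrary, placing the tag bit above the remainder is a guess at the layout of Fig. 2, and the tag polarity follows the rule stated above (0 when i < m, otherwise 1). The function names are hypothetical.

```python
R_BITS = 8  # assumed width r of the tag fingerprint remainder


def make_tag(raw_index: int, m: int) -> int:
    """1-bit tag: 0 if the candidate index is below m, else 1."""
    return 0 if raw_index < m else 1


def pack(tag: int, remainder: int, r_bits: int = R_BITS) -> int:
    """Combine tag and remainder; the tag sits above the remainder (assumed layout)."""
    return (tag << r_bits) | remainder


def unpack(tag_fingerprint: int, r_bits: int = R_BITS):
    """Split a tag fingerprint back into (tag, remainder)."""
    return tag_fingerprint >> r_bits, tag_fingerprint & ((1 << r_bits) - 1)


# With m = 6 buckets, index 5 gets tag 0 and index 7 gets tag 1;
# both tag fingerprints share the same remainder 0xAB.
first = pack(make_tag(5, 6), 0xAB)
second = pack(make_tag(7, 6), 0xAB)
print(hex(first), hex(second))  # 0xab 0x1ab
```

The two tag fingerprints of a member differ in at most the tag bit, so the extra cost over a plain fingerprint is one bit per stored entry.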
The data member management operation comprises one or more of an insert operation, a query operation, and a delete operation; that is, the tag cuckoo filter supports insertion, query, and deletion. Each of the three operations is described in detail below.
In one implementation of this embodiment, the data member management operation is an insert operation and the number of buckets of the tag cuckoo filter is not a power of 2. In this case, as shown in Fig. 3, determining the two candidate buckets and the two tag fingerprints corresponding to the preset data member based on an exclusive-or operation when the tag cuckoo filter receives the data member management operation, and executing the data member management operation based on the two candidate buckets and the two tag fingerprints so determined, specifically includes:
A10, determining a tag fingerprint remainder and a first candidate bucket corresponding to the preset data member;
A20, determining a second candidate bucket corresponding to the preset data member by an exclusive-or operation based on the tag fingerprint remainder and the first candidate bucket;
A30, determining a first tag corresponding to the preset data member according to the first candidate bucket, and determining a second tag corresponding to the preset data member according to the second candidate bucket;
A40, determining a first tag fingerprint corresponding to the preset data member according to the first tag and the tag fingerprint remainder, and determining a second tag fingerprint corresponding to the preset data member according to the second tag and the tag fingerprint remainder;
A50, correcting the first candidate bucket and the second candidate bucket respectively according to the number of storage buckets, and taking the corrected first candidate bucket and the corrected second candidate bucket as the first candidate bucket and the second candidate bucket corresponding to the preset data member;
A60, detecting whether a free storage location exists in the first candidate bucket or the second candidate bucket; if a free storage location exists, executing step A70; if no free storage location exists, executing step A80;
A70, storing the candidate tag fingerprint of the preset data member in the free storage location, wherein the candidate tag fingerprint is the tag fingerprint corresponding to the candidate bucket to which the free storage location belongs;
A80, selecting a target candidate bucket from the first candidate bucket and the second candidate bucket, and taking the tag fingerprint corresponding to the target candidate bucket as the tag fingerprint corresponding to the preset data member;
A90, selecting a target tag fingerprint in the target candidate bucket, and storing the tag fingerprint corresponding to the preset data member in the storage location occupied by the target tag fingerprint;
A100, determining a reference bucket and a reference tag fingerprint corresponding to the target tag fingerprint by an exclusive-or operation according to the target candidate bucket and the target tag fingerprint; if the reference bucket has a free storage location, executing step A110; if the reference bucket does not have a free storage location, executing step A120;
A110, storing the reference tag fingerprint in the free storage location;
A120, selecting a target storage location in the reference bucket, and storing the reference tag fingerprint in the target storage location;
A130, taking the tag fingerprint previously stored at the target storage location as the target tag fingerprint, and taking the reference bucket as the target candidate bucket; continuing to execute step A100 until the reference bucket has a free storage location or the number of executions reaches a preset threshold.
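Steps A10 to A130 can be sketched end to end as follows. This is a minimal illustration under several assumptions, not the patent's implementation: SHA-256 stands in for the hash functions, the remainder width `R_BITS`, the bucket capacity `SLOTS`, and the eviction threshold `MAX_KICKS` are arbitrary, the tag is 0 when the uncorrected index is below m and 1 otherwise, the tag sits above the remainder bits, an index at or above m is corrected by subtracting m, and victims are chosen at random.

```python
import hashlib
import random

R_BITS = 8       # assumed remainder width r
SLOTS = 4        # storage locations per bucket
MAX_KICKS = 500  # assumed preset eviction threshold


def _h(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


class TaggedCuckooSketch:
    def __init__(self, m: int, seed: int = 0) -> None:
        self.m = m
        self.M = 1 << m.bit_length()           # smallest power of 2 greater than m
        self.buckets = [[] for _ in range(m)]  # each holds up to SLOTS tag fingerprints
        self.rng = random.Random(seed)

    def _H(self, r_x: int) -> int:
        return _h(r_x.to_bytes(4, "big")) % self.M

    def _entry(self, raw: int, r_x: int):
        """A30-A50: tag, tag fingerprint, and corrected index for one raw index."""
        tag = 0 if raw < self.m else 1               # assumed polarity
        idx = raw if raw < self.m else raw - self.m  # assumed correction into [0, m-1]
        return idx, (tag << R_BITS) | r_x

    def candidates(self, x: str):
        g = _h(x.encode())
        r_x = g & ((1 << R_BITS) - 1)          # A10: remainder from low-order bits
        h0 = (g >> R_BITS) % self.M            # A10: first index in [0, M-1]
        h1 = h0 ^ self._H(r_x)                 # A20: second index via XOR
        return self._entry(h0, r_x), self._entry(h1, r_x)

    def insert(self, x: str) -> bool:
        cands = self.candidates(x)
        for idx, tf in cands:                  # A60-A70: use a free location if any
            if len(self.buckets[idx]) < SLOTS:
                self.buckets[idx].append(tf)
                return True
        idx, tf = self.rng.choice(cands)       # A80: pick a target candidate bucket
        for _ in range(MAX_KICKS):
            slot = self.rng.randrange(SLOTS)   # A90 / A120: pick a victim location
            victim, self.buckets[idx][slot] = self.buckets[idx][slot], tf
            # A100: undo the correction using the victim's tag, then XOR
            tag, r_v = victim >> R_BITS, victim & ((1 << R_BITS) - 1)
            raw = idx + self.m if tag else idx
            idx, tf = self._entry(raw ^ self._H(r_v), r_v)
            if len(self.buckets[idx]) < SLOTS: # A110: reference bucket has room
                self.buckets[idx].append(tf)
                return True
        return False  # threshold reached; the last displaced fingerprint is dropped
```

The point of the tag shows up in step A100: the stored bucket index alone is ambiguous after correction, and the tag bit is what allows the uncorrected index to be recovered so that the XOR lands on the victim's other candidate bucket.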
Specifically, in step A10, the preset data member is the data member corresponding to the insert operation, and it is obtained when the insert operation is received. After the preset data member is obtained, the tag fingerprint remainder $r_x$ corresponding to the preset data member and the index $h_0(x)$ of the first candidate bucket are calculated based on a hash function $G(x)$. The tag fingerprint remainder and the index of the first candidate bucket may be calculated as:
$$h_0(x) : r_x = G(x) \qquad (1)$$
where $r_x$ is the tag fingerprint remainder corresponding to the preset data member, $h_0(x)$ is the index of the first candidate bucket, $x$ is the preset data member, and $G(x)$ is the hash value corresponding to the preset data member $x$, whose low-order bits are $r_x$ and whose high-order bits are $h_0(x)$; ":" is the concatenation operator. The value range of $h_0(x)$ is $[0, \ldots, M-1]$, where $M$ is the smallest power of 2 larger than $m$, and $m$ is the number of storage buckets of the cuckoo hash table. For example, if $m$ is 6, $M$ is 8; if $m$ is 14, $M$ is 16.
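Equation (1) amounts to splitting one hash value into a high part and a low part. The sketch below illustrates this under assumptions: `G` is built from SHA-256 truncated to exactly log2(M) + r bits, and the remainder width is passed in as `r_bits`. Neither choice comes from the text, and the function names are hypothetical.

```python
import hashlib


def smallest_power_of_two_above(m: int) -> int:
    """M: the smallest power of 2 strictly larger than m."""
    return 1 << m.bit_length()


def G(x: str, m: int, r_bits: int) -> int:
    """Assumed hash: SHA-256 truncated to log2(M) + r_bits bits."""
    M = smallest_power_of_two_above(m)
    index_bits = M.bit_length() - 1  # log2(M)
    digest = int.from_bytes(hashlib.sha256(x.encode()).digest(), "big")
    return digest & ((1 << (index_bits + r_bits)) - 1)


def split_hash(g: int, r_bits: int):
    """Return (h0, r_x): high-order bits and low-order r_bits of g."""
    return g >> r_bits, g & ((1 << r_bits) - 1)


print(smallest_power_of_two_above(6), smallest_power_of_two_above(14))  # 8 16
print(split_hash(0x50F, 8))  # (5, 15): h0 = 5, r_x = 0x0F
```

With m = 6 the index part is 3 bits wide, so $h_0(x)$ ranges over 0 to 7 and may exceed the largest valid bucket index 5, which is what the later correction step addresses.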
Further, in step A20, after the tag fingerprint remainder corresponding to the preset data member and the index of the first candidate bucket have been determined, an exclusive-or operation may be used to determine the second candidate bucket corresponding to the preset data member. The index $h_1(x)$ of the second candidate bucket is calculated from $h_0(x)$ and $H(r_x)$ using the XOR operator $\oplus$ and the modulo operator mod,
where $r_x$ is the tag fingerprint remainder corresponding to the preset data member, $h_0(x)$ is the index of the first candidate bucket, $x$ is the preset data member, and $H(r_x)$ is the hash value corresponding to the tag fingerprint remainder, with value range $[0, \ldots, M-1]$; $h_1(x)$ is the index of the second candidate bucket, with value range $[0, \ldots, M-1]$; $M$ is the smallest power of 2 larger than $m$, and $m$ is the number of storage buckets of the cuckoo hash table.
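One form consistent with the symbol definitions above is $h_1(x) = (h_0(x) \oplus H(r_x)) \bmod M$. This is offered as an assumption for illustration: where the modulo is applied is a guess, and `H` below is a stand-in built from SHA-256. Because M is a power of 2 and both operands lie in $[0, M-1]$, the XOR result already lies in that range, and applying the same mapping to $h_1(x)$ returns $h_0(x)$.

```python
import hashlib


def H(r_x: int, M: int) -> int:
    """Assumed hash of the remainder, reduced into [0, M-1]."""
    digest = hashlib.sha256(r_x.to_bytes(4, "big")).digest()
    return int.from_bytes(digest[:8], "big") % M


def second_candidate(h0: int, r_x: int, M: int) -> int:
    """Assumed form: (h0 XOR H(r_x)) mod M."""
    return (h0 ^ H(r_x, M)) % M


M, r_x = 8, 0xAB
for h0 in range(M):
    h1 = second_candidate(h0, r_x, M)
    # The mapping is symmetric: the partner of h1 is h0 again.
    assert second_candidate(h1, r_x, M) == h0
    assert 0 <= h1 < M
```

This symmetry over $[0, M-1]$ is what the scheme preserves by working with uncorrected indexes; reducing directly into $[0, m-1]$ for a non-power-of-2 m would break it.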
Further, in step A30, after the index of the first candidate bucket and the index of the second candidate bucket are obtained, a first tag and a second tag corresponding to the preset data member may be determined, where the first tag is determined according to the first candidate bucket and the second tag is determined according to the second candidate bucket. Because a tag represents the relationship between the index of the candidate bucket corresponding to a tag fingerprint and the number of buckets, the first tag may be determined from the relationship between the index of the first candidate bucket and the number of buckets $m$, and the second tag from the relationship between the index of the second candidate bucket and $m$.
In a specific implementation of this embodiment, the index $h_0(x)$ of the first candidate bucket and the index $h_1(x)$ of the second candidate bucket may each be compared with the number of buckets $m$. If $h_0(x)$ is less than $m$, the first tag is 0; otherwise, if $h_0(x)$ is greater than or equal to $m$, the first tag is 1. Similarly, if $h_1(x)$ is less than $m$, the second tag is 0; otherwise, the second tag is 1. The first tag and the second tag may thus be calculated as:
$$t_0(x) = \begin{cases} 0, & h_0(x) < m \\ 1, & h_0(x) \ge m \end{cases} \qquad (3)$$
$$t_1(x) = \begin{cases} 0, & h_1(x) < m \\ 1, & h_1(x) \ge m \end{cases} \qquad (4)$$
where $t_0(x)$ is the first tag, $t_1(x)$ is the second tag, $h_0(x)$ is the index of the first candidate bucket, $h_1(x)$ is the index of the second candidate bucket, and $r_x$ is the tag fingerprint remainder, which is combined with each tag to form the corresponding tag fingerprint.
Further, in step A50, after the first tag and the second tag are determined, the index of the first candidate bucket and the index of the second candidate bucket may be corrected based on the number of buckets, and the corrected indices are used as the indices of the first and second candidate buckets corresponding to the preset data member; both corrected indices lie in $[0, \ldots, m-1]$. In one implementation of this embodiment, the correction formulas for the first candidate bucket and the second candidate bucket are respectively:
$$h_0(x) = h_0(x) \bmod m, \qquad h_1(x) = h_1(x) \bmod m \qquad (5)$$
where $h_0(x)$ is the index of the first candidate bucket, $h_1(x)$ is the index of the second candidate bucket, $m$ is the number of buckets, and "mod" is the modulo operator.
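The tag assignment and index correction can be sketched together. The sketch follows the convention used by the relocation step and the worked examples (tag 0 when the index is below $m$, tag 1 otherwise). The packing of tag and remainder into one integer, the width `R_BITS`, and the function names are assumptions made for illustration only.

```python
R_BITS = 8  # assumed width of the tag fingerprint remainder


def tag_and_correct(h: int, m: int) -> tuple[int, int]:
    """Return (tag, corrected index) for an index h in [0, M-1], with M < 2*m."""
    tag = 0 if h < m else 1   # records whether the index overflowed the table
    return tag, h % m         # corrected index lies in [0, m-1]


def tag_fingerprint(tag: int, r: int) -> int:
    """Pack the tag above the remainder (hypothetical layout)."""
    return (tag << R_BITS) | r


# Values from Example 1 in the text: m = 5, h0(x) = 6, h1(x) = 4.
first = tag_and_correct(6, 5)    # index 6 overflows, so it wraps to bucket 1
second = tag_and_correct(4, 5)   # index 4 is already a valid bucket
```

Keeping the tag next to the remainder is what later allows the original, uncorrected index to be recovered from a stored entry alone.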
Further, in step A60, after the corrected indices of the first candidate bucket and the second candidate bucket are obtained, the two candidate buckets may be determined from their indices. Once the first candidate bucket $h_0(x)$ and the second candidate bucket $h_1(x)$ are determined, it may be detected whether either of them contains a free storage location. The detection may proceed as follows: the storage locations of the first candidate bucket $h_0(x)$ and the second candidate bucket $h_1(x)$ are searched in turn, and a storage location whose stored value equals a preset value is judged to be a free storage location. The preset value, for example 0, is fixed in advance and indicates that the storage location is free and holds no element fingerprint.
Further, in step A70, when free storage locations exist, each free storage location is obtained, one of them is selected, the candidate bucket to which it belongs is determined, and the tag fingerprint corresponding to that candidate bucket is stored in the free storage location. The free storage location may be in the first candidate bucket $h_0(x)$ or in the second candidate bucket $h_1(x)$. For example, when the free storage location is in the first candidate bucket $h_0(x)$, the first tag fingerprint is stored in it; when it is in the second candidate bucket $h_1(x)$, the second tag fingerprint is stored in it. Once the tag fingerprint corresponding to the preset data member has been stored in the free storage location, the insert operation for the preset data member is complete.
Further, in step A80, when no free storage location exists, a target candidate bucket (i.e., the current bucket) is selected from the first candidate bucket and the second candidate bucket, and the tag fingerprint corresponding to the target candidate bucket is used as the selected tag fingerprint. For example, when the first candidate bucket is taken as the target candidate bucket, the first tag fingerprint is the selected tag fingerprint (i.e., the new tag fingerprint f); when the second candidate bucket is taken as the target candidate bucket, the second tag fingerprint is the selected tag fingerprint. In addition, before the target candidate bucket is selected, the number of data member moves is set to 0.
Further, in step A90, a storage location is randomly selected from the target candidate bucket, the tag fingerprint stored in that location is taken as the target tag fingerprint (i.e., the old tag fingerprint), and the selected tag fingerprint corresponding to the preset data member is stored in the storage location of the target tag fingerprint. In addition, before a storage location is randomly selected from the target candidate bucket, the move count is incremented by 1 and compared with a preset threshold: if the move count exceeds the threshold, insertion of the preset data member is judged to have failed; otherwise, step A100 is executed. The preset threshold may be chosen according to the actual situation, for example 500.
Further, in step A100, after the target tag fingerprint is selected from the target candidate bucket, an exclusive-or operation is used to determine the reference bucket (which becomes the current bucket h) and the reference tag fingerprint (which becomes the old fingerprint g') corresponding to the target tag fingerprint. Before the reference bucket is determined, the index of the target candidate bucket must be adjusted according to the target tag of the target tag fingerprint: if the tag value is 0, the index remains unchanged; if the tag value is 1, the index is increased by $m$. For example, if the target tag of the target tag fingerprint g is 0 and the index of the target candidate bucket is $i$, the adjusted index is still $i$; if the target tag is 1 and the index is $i$, the adjusted index is $i + m$. After the adjusted index is determined, the index of the reference bucket is calculated from the target tag fingerprint remainder and the adjusted index, where the reference bucket is the other candidate bucket of the data member to which the target tag fingerprint belongs. The index $j$ of the reference bucket may be calculated as:
$$j = \big(i \oplus H(r)\big) \bmod M \qquad (6)$$
where $i$ is the adjusted index of the target candidate bucket, $r$ is the target tag fingerprint remainder, $H(r)$ is the hash value of the target tag fingerprint remainder, with values in $[0, \ldots, M-1]$, $\oplus$ is the exclusive-or operator, and "mod" is the modulo operator.
Further, after the reference bucket is determined, the reference tag fingerprint corresponding to the reference bucket is calculated from the target tag fingerprint remainder and the index of the reference bucket. The reference tag fingerprint may be calculated as follows: check whether the index $j$ of the reference bucket is less than $m$; if $j < m$, the tag value of the reference tag fingerprint $g'$ is set to 0; otherwise, it is set to 1. The calculation formula is:
$$g' = \begin{cases} 0 : r, & j < m \\ 1 : r, & j \ge m \end{cases} \qquad (7)$$
where $g'$ is the reference tag fingerprint, $r$ is the target tag fingerprint remainder, $j$ is the index of the reference bucket, $m$ is the number of buckets, and ":" is the value connector joining the tag value and the remainder.
After the reference tag of the reference tag fingerprint is obtained, the index $j$ of the reference bucket is adjusted so that it lies in the range $[0, \ldots, m-1]$. The adjustment formula is:
$$j = j \bmod m \qquad (8)$$
where j is the index of the reference bucket, m is the number of buckets, and "mod" is the modulo operator.
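The relocation arithmetic of step A100 can be sketched in a few lines. The function name `relocate` and the choice to pass $H(r)$ in as a plain integer `h_r` are assumptions for illustration; in Example 2 the text gives $m = 5$, a victim stored in bucket 1 with tag value 1, and a resulting reference bucket 4, which corresponds to a hypothetical $H(r_a) = 2$ with $M = 8$.

```python
def relocate(bucket: int, tag: int, m: int, M: int, h_r: int) -> tuple[int, int]:
    """Return (corrected reference-bucket index, reference tag) for an evicted entry.

    bucket: corrected index in [0, m-1] where the entry was stored
    tag:    tag value stored with the entry (0 or 1)
    h_r:    H(r), the hash of the entry's remainder, in [0, M-1]
    """
    i = bucket + m if tag == 1 else bucket   # undo the mod-m correction
    j = (i ^ h_r) % M                        # equation (6): the other candidate
    new_tag = 0 if j < m else 1              # tag part of equation (7)
    return j % m, new_tag                    # equation (8)


# Example 2: victim in bucket 1 with tag 1, m = 5, M = 8, assumed H(r_a) = 2.
ref_bucket, ref_tag = relocate(1, 1, 5, 8, 2)
```

Applying `relocate` to the result returns the entry to its starting bucket and tag, which is why an entry can move back and forth between its two candidates without access to the original element.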
Further, after the reference bucket is obtained, it is detected whether the reference bucket has a free storage location. If it does, step A110 is executed; if it does not, step A120 is executed.
Further, in step A110, the reference tag fingerprint is stored in the free storage location, and the insert operation for the preset data member is complete.
Further, in step A120, a target storage location is selected from the reference bucket and the reference tag fingerprint is stored there; the tag fingerprint previously held in the target storage location becomes the target tag fingerprint, the reference bucket becomes the target candidate bucket, and step A100 is repeated until the reference bucket has a free storage location or the move count exceeds the preset threshold. In addition, before the reference tag fingerprint is stored in the target storage location, the move count is incremented by 1, so that the number of moves caused by the insert operation is tracked; this determines when the insert operation ends and prevents it from entering an infinite loop.
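Steps A10 to A120 can be put together in one compact sketch. It is not the patented implementation: the class name `TagCuckooSketch`, the slot count, the remainder width, the use of `None` instead of 0 as the free-slot marker (so that a tag fingerprint with value 0 stays distinguishable), the first-free-slot policy, and `hashlib.blake2b` as both $G$ and $H$ are all assumptions made for illustration.

```python
import hashlib
import random

R_BITS = 8        # assumed remainder width
SLOTS = 4         # assumed storage locations per bucket
MAX_MOVES = 500   # move threshold mentioned in the text


def _hash(data: bytes) -> int:
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


class TagCuckooSketch:
    def __init__(self, m: int, seed: int = 0):
        self.m = m
        self.M = 1 << (m - 1).bit_length()     # smallest power of two >= m
        self.buckets = [[None] * SLOTS for _ in range(m)]
        self.rng = random.Random(seed)

    def _H(self, r: int) -> int:
        return _hash(r.to_bytes(2, "big")) & (self.M - 1)

    def _entry(self, h: int, r: int) -> tuple[int, int]:
        """(corrected bucket index, tag fingerprint) for full index h."""
        return h % self.m, (int(h >= self.m) << R_BITS) | r

    def _candidates(self, x: bytes):
        g = _hash(x)
        r = g & ((1 << R_BITS) - 1)
        h0 = (g >> R_BITS) & (self.M - 1)
        h1 = h0 ^ self._H(r)
        return [self._entry(h0, r), self._entry(h1, r)]

    def _put(self, b: int, fp: int) -> bool:
        for s, v in enumerate(self.buckets[b]):
            if v is None:                      # free storage location
                self.buckets[b][s] = fp
                return True
        return False

    def insert(self, x: bytes) -> bool:
        cands = self._candidates(x)
        for b, fp in cands:                    # steps A60/A70
            if self._put(b, fp):
                return True
        b, fp = self.rng.choice(cands)         # step A80
        for _ in range(MAX_MOVES):             # steps A90 to A120
            s = self.rng.randrange(SLOTS)
            fp, self.buckets[b][s] = self.buckets[b][s], fp  # evict old, store new
            tag, r = fp >> R_BITS, fp & ((1 << R_BITS) - 1)
            i = b + tag * self.m               # undo the mod-m correction
            b, fp = self._entry(i ^ self._H(r), r)
            if self._put(b, fp):
                return True
        return False                           # the last evicted entry is dropped
```

In this sketch a failed insertion leaves the new element stored and drops the last evicted entry, which is one common way to end the loop; the text itself only states that the insertion is judged to have failed.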
The element insertion method of the plus-minus cuckoo filter is illustrated as follows:
Example 1: As shown in FIG. 4, when an element x is inserted, first, the tag fingerprint remainder $r_x$ of element x and the two candidate bucket indices $h_0(x) = 6$ and $h_1(x) = 4$ are calculated using equations (1) and (2); next, the two tag fingerprints of element x are calculated using equations (3), (4), and (5), and the two candidate bucket indices $h_0(x)$ and $h_1(x)$ are adjusted into the range $[0, \ldots, 4]$, so that the adjusted candidate bucket indices are $h_0(x) = 1$ and $h_1(x) = 4$; next, the two candidate buckets $h_0(x) = 1$ and $h_1(x) = 4$ are searched, and both are found to contain free storage locations; finally, one free candidate bucket, $h_0(x) = 1$, is randomly selected, and the corresponding first tag fingerprint is stored in that bucket.
Example 2: As shown in FIG. 5, when an element y is inserted, first, the tag fingerprint remainder $r_y$ of element y and the two candidate bucket indices $h_0(y) = 3$ and $h_1(y) = 6$ are calculated using equations (1) and (2); next, the two tag fingerprints of element y are calculated using equations (3), (4), and (5), and the two candidate bucket indices $h_0(y)$ and $h_1(y)$ are adjusted into the range $[0, \ldots, 4]$, so that the adjusted candidate bucket indices are $h_0(y) = 3$ and $h_1(y) = 1$; next, the two candidate buckets $h_0(y) = 3$ and $h_1(y) = 1$ are searched, and neither is found to contain a free storage location, so one candidate bucket, $h_1(y) = 1$, is randomly selected, an old fingerprint (belonging to an element a) is removed from that bucket, and the new fingerprint of y is stored in its place; then, based on the old fingerprint and the current bucket index $h_1(a) = 1$, and assuming that the old tag fingerprint has a tag value of 1 and a remainder of $r_a$, the other candidate bucket index $h_0(a) = 4$ and its corresponding tag fingerprint are calculated using equations (6), (7), and (8) (the tag value is set to 0 because $h_0(a) = 4 < m = 5$); finally, candidate bucket $h_0(a) = 4$ is searched and found to contain a free storage location, and the corresponding tag fingerprint is stored in that bucket.
Further, in one implementation of this embodiment, upon receiving an insert operation, it may first be determined whether the number of buckets $m$ is a power of 2. If $m$ is a power of 2, the tag cuckoo filter degenerates to an existing cuckoo filter; if $m$ is not a power of 2, the insert operation is performed according to the tag cuckoo filter procedure. When the tag cuckoo filter degenerates to the existing cuckoo filter (i.e., when the number of buckets is a power of 2), as shown in FIG. 6, the insert operation may be implemented as follows:
B10, calculating the fingerprint $f_x$ of element x and the first candidate bucket index $h_0(x)$. Both are computed using the hash function $G(x)$:
$$h_0(x) : f_x = G(x) \qquad (9)$$
where the low-order bits of the hash value $G(x)$ are $f_x$, the high-order bits are $h_0(x)$, and ":" is the value connector. The first candidate bucket index $h_0(x)$ lies in $[0, \ldots, M-1]$, where $M$ is a power of 2 and $M$ is equal to $m$.
B20, calculating the second candidate bucket index $h_1(x)$ of element x. Based on the fingerprint $f_x$ of element x and the first candidate bucket index $h_0(x)$, the second candidate bucket index $h_1(x)$ is computed using an exclusive-or operation:
$$h_1(x) = \big(h_0(x) \oplus H(f_x)\big) \bmod M \qquad (10)$$
where the hash value $H(f_x)$ lies in $[0, \ldots, M-1]$, $\oplus$ is the exclusive-or operator, "mod" is the modulo operator, and the second candidate bucket index $h_1(x)$ lies in $[0, \ldots, M-1]$, where $M$ is a power of 2 and $M$ is equal to $m$.
B30, judging whether at least one of the two candidate buckets $h_0(x)$ and $h_1(x)$ contains a free storage location. Each candidate bucket $h_0(x)$ and $h_1(x)$ is searched for a free storage location; if at least one candidate bucket contains a free storage location, proceed to step B40; otherwise, proceed to step B50.
B40, storing the fingerprint $f_x$ of element x in one of the free candidate buckets $h_0(x)$ or $h_1(x)$; the insertion of element x ends.
B50, randomly selecting one candidate bucket $h_0(x)$ or $h_1(x)$, and setting the new fingerprint $f$ to the fingerprint $f_x$ (i.e., $f = f_x$). The target candidate bucket $h$ is set to the selected bucket $h_0(x)$ or $h_1(x)$, and the number of element moves is set to 0.
B60, increasing the number of element moves by 1, and judging whether it exceeds 500; if it exceeds 500, proceed to step B70; otherwise, proceed to step B80.
B70, the insertion of element x fails, and element insertion ends.
B80, randomly selecting a storage location of the target candidate bucket $h$, removing the target fingerprint $g$ from that storage location, and storing the new fingerprint $f$ in it.
B90, calculating the other candidate bucket index of the target fingerprint $g$, and setting it as the target candidate bucket $h$. Based on the target fingerprint $g$ and the target candidate bucket index $i$ (i.e., $i = h$), the reference bucket index $j$ is calculated using an exclusive-or operation:
$$j = \big(i \oplus H(g)\big) \bmod M \qquad (11)$$
where the hash value $H(g)$ lies in $[0, \ldots, M-1]$, $\oplus$ is the exclusive-or operator, and "mod" is the modulo operator; the index $j$ of the reference bucket is then set as the target candidate bucket index $h$ (i.e., $h = j$).
B100, judging whether the target candidate bucket h comprises at least one free storage position. If at least one free storage position is included, entering step B110; otherwise, step B120 is entered.
B110, storing the target fingerprint g in the target candidate bucket h, and ending element insertion.
B120, jumping to step B60, recursively removing other old fingerprints and inserting new fingerprints until the number of movements exceeds 500.
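The check that selects between the two procedures can be sketched as below. The function names are hypothetical, and taking $M$ as the smallest power of two not less than $m$ is an assumption chosen so that $M = m$ in the degenerate case. The sketch also shows why the degenerate case needs no tags: when $m$ is a power of 2, every index in $[0, M-1]$ is already a valid bucket, so every tag would be 0.

```python
def is_power_of_two(m: int) -> bool:
    """True when m is 1, 2, 4, 8, ... (bit trick: a power of two has one set bit)."""
    return m > 0 and (m & (m - 1)) == 0


def tag_of(h: int, m: int) -> int:
    """Tag for a full index h: 0 if it is a valid bucket index, 1 if it overflows."""
    return 0 if h < m else 1


m = 8
M = 1 << (m - 1).bit_length()   # equals m here
tags = {tag_of(h, m) for h in range(M)}  # only the value 0 appears
```

For a non-power-of-two bucket count such as `m = 6`, the same set contains both 0 and 1, which is the case the tag fingerprints are designed for.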
In one implementation of this embodiment, the data member management operation is a query operation. As shown in FIG. 7, when the tag cuckoo filter receives a data member management operation, determining two candidate buckets and two tag fingerprints corresponding to the preset data member based on an exclusive-or operation, and executing the data member management operation based on the two candidate buckets and the two tag fingerprints, specifically includes:
C10, determining the tag fingerprint remainder corresponding to the preset data member and a first candidate bucket;
C20, determining a second candidate bucket corresponding to the preset data member by an exclusive-or operation based on the tag fingerprint remainder and the first candidate bucket;
C30, determining a first tag corresponding to the preset data member according to the first candidate bucket, and determining a second tag corresponding to the preset data member according to the second candidate bucket;
C40, determining a first tag fingerprint corresponding to the preset data member according to the first tag and the tag fingerprint remainder, and determining a second tag fingerprint corresponding to the preset data member according to the second tag and the tag fingerprint remainder;
C50, correcting the first candidate bucket and the second candidate bucket according to the number of buckets, and taking the corrected first candidate bucket and the corrected second candidate bucket as the first candidate bucket and the second candidate bucket corresponding to the preset data member;
C60, searching the first candidate bucket for the first tag fingerprint and the second candidate bucket for the second tag fingerprint corresponding to the preset data member;
C70, if the first tag fingerprint or the second tag fingerprint corresponding to the preset data member is found, reporting that the query for the data member succeeded;
C80, if neither the first tag fingerprint nor the second tag fingerprint corresponding to the preset data member is found, reporting that the query for the data member failed.
Specifically, steps C10-C50 are executed in the same way as steps A10-A50 and are not described again here; refer to the description of steps A10-A50.
Further, after the first candidate bucket $h_0(x)$ and the second candidate bucket $h_1(x)$ are obtained, the first candidate bucket $h_0(x)$ is searched for a match with the first tag fingerprint of the preset data member x, and the second candidate bucket $h_1(x)$ is searched for a match with the second tag fingerprint of x. If the first tag fingerprint or the second tag fingerprint is matched, the preset data member x is in the set, the query result True is returned, and the query ends; if neither candidate bucket stores a matching tag fingerprint, x is not in the set, the query result False is returned, and the query ends.
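The query steps can be sketched as a short function over a list of buckets. The function name `lookup`, the 8-bit remainder, the integer packing of tag and remainder, and the concrete remainder value `0x2A` are assumptions for illustration; the indices follow the FIG. 8 example in the text ($m = 5$, $h_0(x) = 6$, $h_1(x) = 4$).

```python
def lookup(buckets, h0: int, h1: int, r: int, m: int, r_bits: int = 8) -> bool:
    """Return True if either candidate bucket holds the matching tag fingerprint."""
    for h in (h0, h1):
        fp = (int(h >= m) << r_bits) | r   # tag fingerprint for this candidate
        if fp in buckets[h % m]:           # search the corrected bucket
            return True
    return False


m = 5
buckets = [[] for _ in range(m)]
buckets[1].append((1 << 8) | 0x2A)   # x stored via h0(x) = 6: bucket 1, tag 1

found = lookup(buckets, h0=6, h1=4, r=0x2A, m=m)
# An element with the same remainder but full index 1 (tag 0) does not match,
# even though it maps to the same bucket.
other = lookup(buckets, h0=1, h1=3, r=0x2A, m=m)
```

The second call illustrates the role of the tag: two elements that share a bucket and a remainder are still told apart when their uncorrected indices differ.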
The element query method of the plus-minus cuckoo filter is illustrated as follows:
As shown in FIG. 8, when element x is queried, first, the two candidate bucket indices $h_0(x) = 6$ and $h_1(x) = 4$ of element x are calculated using equations (1) and (2); second, the two tag fingerprints corresponding to the candidate buckets $h_0(x)$ and $h_1(x)$ are calculated using equations (3), (4), and (5), and the two candidate bucket indices are adjusted to $h_0(x) = 1$ and $h_1(x) = 4$; next, the two candidate buckets $h_0(x) = 1$ and $h_1(x) = 4$ are searched for their corresponding tag fingerprints, and a matching tag fingerprint is found in candidate bucket $h_0(x) = 1$; finally, the query indicates that element x is in the set and returns true.
In one implementation of this embodiment, upon receiving a query operation, it may first be determined whether the number of buckets $m$ is a power of 2. If $m$ is a power of 2, the tag cuckoo filter degenerates to an existing cuckoo filter; if $m$ is not a power of 2, the query operation is performed according to the tag cuckoo filter procedure. When the tag cuckoo filter degenerates to the existing cuckoo filter, as shown in FIG. 7, the query operation may be implemented as follows:
D10, calculating the fingerprint $f_x$ of element x and the first candidate bucket index $h_0(x)$. Both are calculated using equation (9), where $h_0(x)$ lies in $[0, \ldots, M-1]$, $M$ is a power of 2, and $M$ is equal to $m$.
D20, calculating the second candidate bucket index $h_1(x)$ of element x. It is calculated using equation (10), where $h_1(x)$ lies in $[0, \ldots, M-1]$, $M$ is a power of 2, and $M$ is equal to $m$.
D30, searching the two candidate buckets $h_0(x)$ and $h_1(x)$ for a match with the fingerprint $f_x$; if the fingerprint $f_x$ is matched, proceed to step D40; otherwise, proceed to step D50.
D40, indicating that the element x is in the set, returning the query result as true, and ending the element query.
D50, indicating that the element x is not in the set, returning the query result as false, and ending the element query.
In one implementation of this embodiment, the data member management operation is a delete operation. As shown in FIG. 9, when the tag cuckoo filter receives a data member management operation, determining two candidate buckets and two tag fingerprints corresponding to the preset data member based on an exclusive-or operation, and executing the data member management operation based on the two candidate buckets and the two tag fingerprints, specifically includes:
E10, determining the tag fingerprint remainder corresponding to the preset data member and a first candidate bucket;
E20, determining a second candidate bucket corresponding to the preset data member by an exclusive-or operation based on the tag fingerprint remainder and the first candidate bucket;
E30, determining a first tag corresponding to the preset data member according to the first candidate bucket, and determining a second tag corresponding to the preset data member according to the second candidate bucket;
E40, determining a first tag fingerprint corresponding to the preset data member according to the first tag and the tag fingerprint remainder, and determining a second tag fingerprint corresponding to the preset data member according to the second tag and the tag fingerprint remainder;
E50, correcting the first candidate bucket and the second candidate bucket according to the number of buckets, and taking the corrected first candidate bucket and the corrected second candidate bucket as the first candidate bucket and the second candidate bucket corresponding to the preset data member;
E60, searching the first candidate bucket for the first tag fingerprint and the second candidate bucket for the second tag fingerprint corresponding to the preset data member;
E70, if the first tag fingerprint or the second tag fingerprint corresponding to the preset data member is found, deleting the tag fingerprint that was found, which is either the first tag fingerprint or the second tag fingerprint;
E80, if neither the first tag fingerprint nor the second tag fingerprint corresponding to the preset data member is found, reporting that deletion of the data member failed.
Specifically, steps E10-E50 are executed in the same way as steps A10-A50 and are not described again here; refer to the description of steps A10-A50.
Further, after the first candidate bucket $h_0(x)$ and the second candidate bucket $h_1(x)$ are obtained, the first candidate bucket $h_0(x)$ is searched for a match with the first tag fingerprint of the preset data member x, and the second candidate bucket $h_1(x)$ is searched for a match with the second tag fingerprint of x. If the first tag fingerprint or the second tag fingerprint is matched, the matching tag fingerprint is deleted, and deletion of the data member is complete; if neither candidate bucket stores a matching tag fingerprint, deletion of the data member has failed, and the element deletion ends.
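A deletion sketch in the same style follows. The function name `delete`, the `None` free-slot marker, the 8-bit remainder and the remainder value are assumptions for illustration; when both candidate buckets hold a match, one is chosen at random, as in the FIG. 10 example. The setup mirrors that example with $m = 5$, $h_0(y) = 3$ and $h_1(y) = 6$, and two stored copies standing for two elements with equal tag fingerprints.

```python
import random


def delete(buckets, h0: int, h1: int, r: int, m: int,
           r_bits: int = 8, rng=random) -> bool:
    """Remove one matching tag fingerprint; return False if none is stored."""
    matches = []
    for h in (h0, h1):
        fp = (int(h >= m) << r_bits) | r
        b = h % m
        if fp in buckets[b]:
            matches.append((b, buckets[b].index(fp)))
    if not matches:
        return False
    b, s = rng.choice(matches)   # pick one matching location at random
    buckets[b][s] = None         # mark the storage location as free
    return True


m, r = 5, 0x2A
buckets = [[None] * 4 for _ in range(m)]
buckets[3][0] = (0 << 8) | r     # copy stored via index 3 (tag 0)
buckets[1][2] = (1 << 8) | r     # copy stored via index 6 (tag 1, bucket 1)

first = delete(buckets, 3, 6, r, m)    # removes exactly one copy
remaining = sum(v is not None for b in buckets for v in b)
```

Only one copy is removed per call, so an element that shares both tag fingerprints with the deleted one is still found afterwards, which avoids false negatives.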
The element deletion method of the plus-minus cuckoo filter is illustrated as follows:
As shown in FIG. 10, when an element y is deleted, first, the two candidate bucket indices $h_0(y) = 3$ and $h_1(y) = 6$ of element y are calculated using equations (1) and (2); second, the two tag fingerprints corresponding to the candidate buckets $h_0(y)$ and $h_1(y)$ are calculated using equations (3), (4), and (5), and the two candidate bucket indices are adjusted to $h_0(y) = 3$ and $h_1(y) = 1$; next, the two candidate buckets $h_0(y) = 3$ and $h_1(y) = 1$ are searched for their corresponding tag fingerprints; assuming both tag fingerprints are matched, matching tag fingerprints are found in candidate buckets $h_0(y) = 3$ and $h_1(y) = 1$; finally, one candidate bucket, $h_1(y) = 1$, is randomly selected, a matching tag fingerprint is deleted from that bucket, and the deletion of element y succeeds.
Further, the element deletion method of the tag cuckoo filter ensures that elements are deleted correctly and that no false negative errors occur. If two inserted elements have the same fingerprints, the plus-minus cuckoo filter stores a fingerprint for each of them. If one of the two elements is then deleted, the other remains in the filter, so no false negative error occurs, although a false positive error may. For example, in FIG. 10, when element g is queried after element y has been deleted, because the two tag fingerprints of element g are equal to the two tag fingerprints of element y, a matching tag fingerprint is found in candidate bucket $h_0(g) = 3$, and the query indicates that element g is in the set and returns true. When element y is queried, because a matching tag fingerprint is still in candidate bucket $h_0(y) = 3$, the query indicates that element y is in the set and returns true; since element y was in fact successfully deleted, this result is a false positive error. Nonetheless, the element deletion method of the tag cuckoo filter does not increase the false positive error rate, which remains low and is the same as that of a standard Bloom filter, a counting Bloom filter, a cuckoo filter, and the like.
In one implementation of this embodiment, upon receiving a delete operation, it may first be determined whether the number of buckets $m$ is a power of 2. If $m$ is a power of 2, the tag cuckoo filter degenerates to an existing cuckoo filter; if $m$ is not a power of 2, the delete operation is performed according to the tag cuckoo filter procedure. When the tag cuckoo filter degenerates to the existing cuckoo filter, as shown in FIG. 9, the delete operation may be implemented as follows:
F10, calculating the fingerprint $f_x$ of element x and the first candidate bucket index $h_0(x)$. Both are calculated using equation (9), where $h_0(x)$ lies in $[0, \ldots, M-1]$, $M$ is a power of 2, and $M$ is equal to $m$.
F20, calculating the second candidate bucket index $h_1(x)$ of element x. It is calculated using equation (10), where $h_1(x)$ lies in $[0, \ldots, M-1]$, $M$ is a power of 2, and $M$ is equal to $m$.
F30, searching the two candidate buckets $h_0(x)$ and $h_1(x)$ for a match with the fingerprint $f_x$. If the fingerprint $f_x$ is matched, proceed to step F40; otherwise, proceed to step F50.
F40, indicating that element x is in the set, deleting the matching fingerprint $f_x$ corresponding to the preset data member, and ending the element deletion.
F50, indicating that the element x is not in the set, returning the deletion result as false, and ending the element deletion.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A tag cuckoo filter is characterized by comprising a cuckoo hash table, wherein the cuckoo hash table comprises a plurality of storage buckets, each data member corresponds to two tag fingerprints, and the two tag fingerprints are stored in the two storage buckets respectively; when the tag cuckoo filter receives a data member management operation, determining two candidate buckets and two tag fingerprints corresponding to a preset data member based on an exclusive-or operation, and executing the data member management operation based on the two candidate buckets and the two tag fingerprints, wherein the preset data member is a data member corresponding to the data member management operation.
2. The tag cuckoo filter of claim 1, wherein each tag fingerprint includes a tag and a tag fingerprint remainder, and the two tag fingerprints corresponding to the preset data member have the same tag fingerprint remainder.
3. The tag cuckoo filter of claim 2, wherein the determining two candidate buckets and two tag fingerprints corresponding to preset data members based on an exclusive or operation specifically comprises:
determining a tag fingerprint remainder corresponding to the preset data member and a first candidate bucket;
determining a second candidate bucket corresponding to the preset data member by an exclusive-or operation based on the tag fingerprint remainder and the first candidate bucket;
determining a first tag corresponding to the preset data member according to the first candidate bucket, and determining a second tag corresponding to the preset data member according to the second candidate bucket;
determining a first tag fingerprint corresponding to the preset data member according to the first tag and the tag fingerprint remainder, and determining a second tag fingerprint corresponding to the preset data member according to the second tag and the tag fingerprint remainder;
and respectively correcting the first candidate bucket and the second candidate bucket according to the number of the storage buckets, and taking the corrected first candidate bucket and the corrected second candidate bucket as the first candidate bucket and the second candidate bucket corresponding to the preset data member.
4. The tag cuckoo filter of claim 3, wherein, when the data member management operation is an insert operation, the performing the data member management operation based on the two candidate buckets and the two tag fingerprints comprises:
detecting whether there is a free storage location in the first candidate bucket or the second candidate bucket;
and if a free storage location exists, storing a candidate tag fingerprint of the preset data member in the free storage location, wherein the candidate tag fingerprint is the tag fingerprint corresponding to the candidate bucket to which the free storage location belongs.
5. The tag cuckoo filter of claim 4, wherein, when the data member management operation is an insert operation, the performing the data member management operation based on the determined two candidate buckets and the two tag fingerprints comprises:
if no free storage location exists, selecting a target candidate bucket from the first candidate bucket and the second candidate bucket, and taking the tag fingerprint corresponding to the target candidate bucket as the tag fingerprint corresponding to the preset data member;
selecting a target tag fingerprint in the target candidate bucket, and storing the tag fingerprint corresponding to the preset data member in the storage location of the target tag fingerprint;
determining, by an exclusive-or operation according to the target candidate bucket and the target tag fingerprint, a reference bucket and a reference tag fingerprint corresponding to the target tag fingerprint;
and if the reference bucket has a free storage location, storing the reference tag fingerprint in the free storage location.
6. The tag cuckoo filter of claim 5, wherein, when the data member management operation is an insert operation, the performing the data member management operation based on the determined two candidate buckets and the two tag fingerprints comprises:
if the reference bucket has no free storage location, selecting a target storage location in the reference bucket, and storing the reference tag fingerprint in the target storage location;
taking the tag fingerprint that occupied the target storage location as the target tag fingerprint, and taking the reference bucket as the target candidate bucket;
and continuing to execute the step of determining, by an exclusive-or operation according to the target candidate bucket and the target tag fingerprint, the reference bucket and the reference tag fingerprint corresponding to the target tag fingerprint, until the reference bucket has a free storage location or the number of executions reaches a preset threshold.
7. The tag cuckoo filter of claim 5 or 6, wherein the determining, by an exclusive-or operation according to the target candidate bucket and the target tag fingerprint, the reference bucket and the reference tag fingerprint corresponding to the target tag fingerprint specifically comprises:
obtaining a target tag of the target tag fingerprint, and updating the target candidate bucket based on the target tag;
determining a comparison bucket corresponding to the target tag fingerprint by an exclusive-or operation based on the updated target candidate bucket and the tag fingerprint remainder of the target tag fingerprint;
and determining the reference tag fingerprint corresponding to the target tag fingerprint based on the comparison bucket and the tag fingerprint remainder of the target tag fingerprint, and correcting the comparison bucket based on the number of storage buckets to obtain the reference bucket corresponding to the target tag fingerprint.
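The insertion flow of claims 4 to 7 can be sketched as below. This is an illustration only. The claims do not fix the tag encoding, the hash, or the eviction policy, so everything here is an assumption: the names `alternate`, `insert`, `slots` and `max_kicks`, the one-bit tag marking an uncorrected index that lies beyond the table, the SHA-256 helper, and the random choice of the bucket and the evicted entry. Tag fingerprints are modelled as `(tag, remainder)` tuples and the table as a list of lists.

```python
import hashlib
import random


def _hash(value: int, bits: int) -> int:
    """Hypothetical helper: the top `bits` bits of SHA-256 of a small integer."""
    digest = hashlib.sha256(value.to_bytes(4, "big")).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


def alternate(bucket, tag_fp, num_buckets):
    """Claim 7 sketch: reference bucket and reference tag fingerprint."""
    tag, rem = tag_fp
    k = max(1, (num_buckets - 1).bit_length())
    raw = bucket + tag * num_buckets        # update the bucket using the tag
    other = raw ^ _hash(rem, k)             # comparison bucket via XOR
    other_tag = 1 if other >= num_buckets else 0
    # Correct the comparison bucket by the number of buckets.
    return other - other_tag * num_buckets, (other_tag, rem)


def insert(table, slots, b1, tf1, b2, tf2, max_kicks=500, rng=random):
    """Store one of the two tag fingerprints; return True on success."""
    num_buckets = len(table)
    # Claim 4: use a free storage location in either candidate bucket.
    for bucket, tag_fp in ((b1, tf1), (b2, tf2)):
        if len(table[bucket]) < slots:
            table[bucket].append(tag_fp)
            return True
    # Claim 5: pick a target candidate bucket and evict a stored fingerprint.
    bucket, tag_fp = rng.choice(((b1, tf1), (b2, tf2)))
    for _ in range(max_kicks):              # claim 6: bounded relocation loop
        pos = rng.randrange(slots)
        victim = table[bucket][pos]
        table[bucket][pos] = tag_fp
        bucket, tag_fp = alternate(bucket, victim, num_buckets)
        if len(table[bucket]) < slots:
            table[bucket].append(tag_fp)
            return True
    return False                            # threshold reached; the carried fingerprint is dropped


if __name__ == "__main__":
    n, slots = 6, 2
    table = [[] for _ in range(n)]
    b2, tf2 = alternate(1, (0, 37), n)
    print(insert(table, slots, 1, (0, 37), b2, tf2), table)
```

In this sketch `alternate` is its own inverse: applying it to the reference bucket and reference tag fingerprint returns the original pair, which is what lets a relocated entry still be found from either candidate bucket.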
8. The tag cuckoo filter of claim 3, wherein, when the data member management operation is a query operation or a delete operation, the performing the data member management operation based on the two candidate buckets and the two tag fingerprints specifically comprises:
searching the first candidate bucket and the second candidate bucket for the first tag fingerprint and the second tag fingerprint corresponding to the preset data member, respectively;
and if the first tag fingerprint or the second tag fingerprint corresponding to the preset data member is found, executing the data member management operation on the found tag fingerprint.
9. The tag cuckoo filter of claim 8, wherein the executing the data member management operation on the found tag fingerprint specifically comprises:
when the data member management operation is a query operation, indicating that the query for the data member succeeded;
and when the data member management operation is a delete operation, deleting the found tag fingerprint corresponding to the preset data member, wherein the found tag fingerprint is the first tag fingerprint or the second tag fingerprint.
10. The tag cuckoo filter of claim 8, wherein, when the data member management operation is a query operation or a delete operation, the performing the data member management operation based on the two candidate buckets and the two tag fingerprints comprises:
and if neither the first tag fingerprint nor the second tag fingerprint corresponding to the preset data member is found, indicating that the data member management operation has failed.
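The query and delete behaviour of claims 8 to 10 can be sketched as follows. The function names `find`, `query` and `delete`, the list-of-lists table, and the `(tag, remainder)` tuple representation of a tag fingerprint are assumptions made for illustration; the two candidate buckets and tag fingerprints are taken as already computed.

```python
def find(table, b1, tf1, b2, tf2):
    """Return the (bucket, tag fingerprint) pair that is stored, or None."""
    # The first tag fingerprint is searched only in the first candidate
    # bucket, and the second only in the second candidate bucket.
    for bucket, tag_fp in ((b1, tf1), (b2, tf2)):
        if tag_fp in table[bucket]:
            return bucket, tag_fp
    return None


def query(table, b1, tf1, b2, tf2) -> bool:
    """Claim 9 sketch: succeed if either tag fingerprint is found."""
    return find(table, b1, tf1, b2, tf2) is not None


def delete(table, b1, tf1, b2, tf2) -> bool:
    """Claims 9 and 10 sketch: remove one stored copy, or report failure."""
    hit = find(table, b1, tf1, b2, tf2)
    if hit is None:
        return False
    bucket, tag_fp = hit
    table[bucket].remove(tag_fp)  # removes a single matching entry
    return True


if __name__ == "__main__":
    table = [[], [(0, 37)], []]
    print(query(table, 1, (0, 37), 0, (1, 37)))   # stored in bucket 1
    print(delete(table, 1, (0, 37), 0, (1, 37)))
    print(query(table, 1, (0, 37), 0, (1, 37)))
```

Because the whole tag fingerprint is compared, an entry with the same remainder but a different tag does not match, which is the property the tag is meant to add over comparing remainders alone.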
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010360757.7A CN111552693B (en) | 2020-04-30 | 2020-04-30 | Tag cuckoo filter |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111552693A (en) | 2020-08-18 |
CN111552693B (en) | 2023-04-07 |
Family
ID=72003379
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010360757.7A Active CN111552693B (en) | 2020-04-30 | 2020-04-30 | Tag cuckoo filter |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111552693B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101655861A (en) * | 2009-09-08 | 2010-02-24 | 中国科学院计算技术研究所 | Hashing method based on double-counting bloom filter and hashing device |
CN105630955A (en) * | 2015-12-24 | 2016-06-01 | 华中科技大学 | Method for efficiently managing members of dynamic data set |
CN107515901A (en) * | 2017-07-24 | 2017-12-26 | 中国科学院信息工程研究所 | A kind of chain type daily record storage organization and its Hash Index Structure, data manipulation method and server, medium |
CN109815234A (en) * | 2018-12-29 | 2019-05-28 | 杭州中科先进技术研究院有限公司 | A kind of multiple cuckoo filter under streaming computing model |
CN110046164A (en) * | 2019-04-16 | 2019-07-23 | 中国人民解放军国防科技大学 | Index independent grain distribution filter, consistency grain distribution filter and operation method |
CN110222088A (en) * | 2019-05-20 | 2019-09-10 | 华中科技大学 | Data approximation set representation method and system based on insertion position selection |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112148928A (en) * | 2020-09-18 | 2020-12-29 | 鹏城实验室 | Cuckoo filter based on fingerprint family |
CN112148928B (en) * | 2020-09-18 | 2024-02-20 | 鹏城实验室 | Cuckoo filter based on fingerprint family |
CN112632337A (en) * | 2020-12-28 | 2021-04-09 | 南方科技大学 | Element management method applied to firework filter and firework filter |
CN112632337B (en) * | 2020-12-28 | 2023-12-22 | 南方科技大学 | Element management method applied to firework filter and firework filter |
CN113535706A (en) * | 2021-08-03 | 2021-10-22 | 重庆赛渝深科技有限公司 | Two-stage cuckoo filter and repeated data deleting method based on two-stage cuckoo filter |
CN113535706B (en) * | 2021-08-03 | 2023-05-23 | 佛山赛思禅科技有限公司 | Two-stage cuckoo filter and repeated data deleting method based on two-stage cuckoo filter |
CN113360516A (en) * | 2021-08-11 | 2021-09-07 | 成都信息工程大学 | Set member management method based on first-in first-out and minimum active number strategy |
CN113360516B (en) * | 2021-08-11 | 2021-11-26 | 成都信息工程大学 | Collection member management method |
CN113641681A (en) * | 2021-10-13 | 2021-11-12 | 南京大数据集团有限公司 | Space self-adaptive mass data query method |
CN116467307A (en) * | 2023-03-29 | 2023-07-21 | 济南大学 | Design method and system for cuckoo filter for reducing false positive rate |
CN116467307B (en) * | 2023-03-29 | 2024-02-23 | 济南大学 | Design method and system for cuckoo filter for reducing false positive rate |
Also Published As
Publication number | Publication date |
---|---|
CN111552693B (en) | 2023-04-07 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN111552693B (en) | Tag cuckoo filter | |
CN111552692B (en) | Plus-minus cuckoo filter | |
CN112148928B (en) | Cuckoo filter based on fingerprint family | |
CN108446407B (en) | Database auditing method and device based on block chain | |
US10013312B2 (en) | Method and system for a safe archiving of data | |
WO2010135082A1 (en) | Localized weak bit assignment | |
US8190591B2 (en) | Bit string searching apparatus, searching method, and program | |
US20130151562A1 (en) | Method of calculating feature-amount of digital sequence, and apparatus for calculating feature-amount of digital sequence | |
CN111247518A (en) | Database sharding | |
US11777983B2 (en) | Systems and methods for rapidly generating security ratings | |
CN110245028B (en) | Message storage method, device, computer equipment and storage medium of IoT-MQ | |
CN109189759B (en) | Data reading method, data query method, device and equipment in KV storage system | |
CN113867627B (en) | Storage system performance optimization method and system | |
EP3522040A1 (en) | Method and device for file storage | |
CN111291002A (en) | File account checking method and device, computer equipment and storage medium | |
CN113392089B (en) | Database index optimization method and readable storage medium | |
CN105183383A (en) | Recombination method for irrelevant mirror images of file system | |
CN109344163B (en) | Data verification method and device and computer readable medium | |
WO2011137684A1 (en) | Search method and device based on information records of embedded system | |
WO2011073680A1 (en) | Improvements relating to hash tables | |
CN111858606A (en) | Data processing method and device and electronic equipment | |
CN115391355A (en) | Data processing method, device, equipment and storage medium | |
CN112632337B (en) | Element management method applied to firework filter and firework filter | |
CN102591941B (en) | Analysis method and analysis device for SQLite idle struct nodes | |
CN110413617B (en) | Method for dynamically adjusting hash table group according to size of data volume |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |