CN111552693A - Tag cuckoo filter - Google Patents

Info

Publication number
CN111552693A
Authority
CN
China
Prior art keywords
candidate
fingerprint
bucket
tag
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010360757.7A
Other languages
Chinese (zh)
Other versions
CN111552693B (en)
Inventor
黄昆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest University of Science and Technology
Peng Cheng Laboratory
Original Assignee
Southwest University of Science and Technology
Peng Cheng Laboratory
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest University of Science and Technology, Peng Cheng Laboratory
Priority to CN202010360757.7A
Publication of CN111552693A
Application granted
Publication of CN111552693B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2255 Hash tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G06F16/2443 Stored procedures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Collating Specific Patterns (AREA)

Abstract

The invention discloses a tag cuckoo filter comprising a cuckoo hash table. The cuckoo hash table comprises a plurality of storage buckets; each data member corresponds to two tag fingerprints, which are stored in two storage buckets respectively. When the tag cuckoo filter receives a data member management operation, it determines the two candidate buckets and the two tag fingerprints of the preset data member based on an exclusive-or (XOR) operation, and executes the data member management operation based on the determined candidate buckets and tag fingerprints. Because the invention configures two tag fingerprints and two storage buckets for each data member and determines the candidate buckets by an XOR operation based on the tag fingerprints, the number of storage buckets need not be a power of 2, which reduces the storage space overhead per data member.

Description

Tag cuckoo filter
Technical Field
The invention relates to the technical field of computer information representation and information retrieval, and in particular to a tag cuckoo filter.
Background
Membership query is a key method in many network applications and distributed systems (e.g., cooperative caching, packet processing, key-value storage, and deduplication), and it must meet three key requirements: low storage space overhead, fast query, and incremental update. Membership queries are currently implemented mainly with the Bloom filter and its variants, such as the standard Bloom filter and the counting Bloom filter, and with the cuckoo filter; however, the Bloom filter and its variants have difficulty satisfying all three requirements at once. For example, the standard Bloom filter supports element insertion and query, but not element deletion. The counting Bloom filter is a Bloom filter variant that supports deletion, but its storage space overhead is high. The cuckoo filter is a space-efficient filter that also supports deletion; its storage space overhead is markedly lower than that of the counting Bloom filter and can even be lower than that of the standard Bloom filter. However, in the existing cuckoo filter the storage space overhead per data member varies with the number of elements: its exclusive-or operation requires the number of storage buckets to be a power of 2 (i.e., 2^b, where b is an integer exponent), which in the worst case doubles the storage space overhead per data member.
The prior art therefore still needs to be improved.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a tag cuckoo filter that overcomes the above defects of the prior art.
To solve this technical problem, the invention adopts the following technical solution:
A tag cuckoo filter comprises a cuckoo hash table, wherein the cuckoo hash table comprises a plurality of storage buckets, each data member corresponds to two tag fingerprints, and the two tag fingerprints are stored in two storage buckets respectively; when the tag cuckoo filter receives a data member management operation, two candidate buckets and two tag fingerprints corresponding to a preset data member are determined based on an exclusive-or operation, and the data member management operation is executed based on the two candidate buckets and the two tag fingerprints, wherein the preset data member is the data member to which the data member management operation applies.
The tag cuckoo filter is characterized in that each tag fingerprint comprises a tag and a tag fingerprint remainder, and the two tag fingerprints corresponding to the preset data member have the same tag fingerprint remainder.
The tag cuckoo filter, wherein determining the two candidate buckets and the two tag fingerprints corresponding to the preset data member based on an exclusive-or operation specifically includes:
determining a tag fingerprint remainder and a first candidate bucket corresponding to the preset data member;
determining a second candidate bucket corresponding to the preset data member by an exclusive-or operation based on the tag fingerprint remainder and the first candidate bucket;
determining a first tag corresponding to the preset data member according to the first candidate bucket, and a second tag according to the second candidate bucket;
determining a first tag fingerprint corresponding to the preset data member according to the first tag and the tag fingerprint remainder, and a second tag fingerprint according to the second tag and the tag fingerprint remainder;
and correcting the first candidate bucket and the second candidate bucket respectively according to the number of storage buckets, and taking the corrected buckets as the first candidate bucket and the second candidate bucket corresponding to the preset data member.
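The candidate-bucket steps above can be illustrated with a short Python sketch. It is a hedged reading of the claim, not the patented implementation: blake2b stands in for the unspecified hash functions G and H, the remainder is assumed to be the low r bits of G(x) and the first raw index the next bits, M is taken as the smallest power of 2 not less than m, and "correcting" a raw index i >= m is assumed to mean i - m, since the claim does not spell the correction out.

```python
import hashlib

def _hash64(data: bytes) -> int:
    # Illustrative 64-bit hash; the patent does not name its hash functions.
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

def candidates(x: str, m: int, r: int):
    """Return ((bucket1, tag_fp1), (bucket2, tag_fp2)) for data member x.

    m: number of buckets (need not be a power of 2); r: remainder bits.
    """
    M = 1 << (m - 1).bit_length()               # smallest power of 2 >= m
    g = _hash64(x.encode())                     # G(x)
    rem = g & ((1 << r) - 1)                    # tag fingerprint remainder (low r bits)
    i1 = (g >> r) & (M - 1)                     # first raw index in [0, M-1]
    i2 = i1 ^ (_hash64(rem.to_bytes(8, "big")) & (M - 1))  # second raw index via XOR
    out = []
    for i in (i1, i2):
        tag = 0 if i < m else 1                 # tag: is the raw index >= m?
        bucket = i - m if tag else i            # assumed correction into [0, m-1]
        out.append((bucket, (tag << r) | rem))  # tag sits above the r remainder bits
    return tuple(out)

print(candidates("alice", m=6, r=8))
```

Both returned buckets lie in [0, m-1], and the two tag fingerprints share the same remainder, differing at most in the tag bit.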
The tag cuckoo filter, wherein, when the data member management operation is an insert operation, executing the data member management operation based on the two candidate buckets and the two tag fingerprints specifically includes:
detecting whether a free storage location exists in the first candidate bucket or the second candidate bucket;
and if a free storage location exists, storing a candidate tag fingerprint of the preset data member in the free storage location, wherein the candidate tag fingerprint is the tag fingerprint corresponding to the candidate bucket to which the free storage location belongs.
The tag cuckoo filter, wherein, when the data member management operation is an insert operation, executing the data member management operation based on the two candidate buckets and the two tag fingerprints further includes:
if no free storage location exists, selecting a target candidate bucket from the first candidate bucket and the second candidate bucket, and taking the tag fingerprint corresponding to the target candidate bucket as the tag fingerprint of the preset data member;
selecting a target tag fingerprint in the target candidate bucket, and storing the tag fingerprint of the preset data member in the storage location occupied by the target tag fingerprint;
determining, by an exclusive-or operation according to the target candidate bucket and the target tag fingerprint, a reference bucket and a reference tag fingerprint corresponding to the target tag fingerprint;
and if the reference bucket has a free storage location, storing the reference tag fingerprint in the free storage location.
The tag cuckoo filter, wherein, when the data member management operation is an insert operation, executing the data member management operation based on the two candidate buckets and the two tag fingerprints further includes:
if the reference bucket has no free storage location, selecting a target storage location in the reference bucket, and storing the reference tag fingerprint in the target storage location;
taking the tag fingerprint originally stored in the target storage location as the target tag fingerprint, and taking the reference bucket as the target candidate bucket;
and repeating the step of determining, by an exclusive-or operation according to the target candidate bucket and the target tag fingerprint, the reference bucket and the reference tag fingerprint corresponding to the target tag fingerprint, until the reference bucket has a free storage location or the number of repetitions reaches a preset threshold.
The tag cuckoo filter, wherein determining, by an exclusive-or operation according to the target candidate bucket and the target tag fingerprint, the reference bucket and the reference tag fingerprint corresponding to the target tag fingerprint specifically includes:
obtaining the target tag of the target tag fingerprint, and updating the target candidate bucket based on the target tag;
determining a comparison bucket corresponding to the target tag fingerprint by an exclusive-or operation based on the updated target candidate bucket and the tag fingerprint remainder of the target tag fingerprint;
and determining the reference tag fingerprint corresponding to the target tag fingerprint based on the comparison bucket and the tag fingerprint remainder of the target tag fingerprint, and correcting the comparison bucket based on the number of storage buckets to obtain the reference bucket corresponding to the target tag fingerprint.
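A minimal sketch of the three sub-steps above, under hypothetical choices: blake2b as H, M as the smallest power of 2 not less than m, the tag stored directly above the r remainder bits, and a tag of 1 meaning that m was subtracted during correction. Reading the tag restores the raw index ("updating the target candidate bucket"), XOR with H of the remainder gives the comparison bucket, and the new tag plus the correction give the reference bucket.

```python
import hashlib

def H(rem: int, M: int) -> int:
    """Hypothetical hash of a tag fingerprint remainder into [0, M-1]."""
    digest = hashlib.blake2b(rem.to_bytes(8, "big"), digest_size=8).digest()
    return int.from_bytes(digest, "big") & (M - 1)

def reference_location(bucket: int, tag_fp: int, m: int, r: int):
    """Given a stored (bucket, tag fingerprint), return the (reference bucket,
    reference tag fingerprint) for the data member's other candidate."""
    M = 1 << (m - 1).bit_length()               # smallest power of 2 >= m
    tag, rem = tag_fp >> r, tag_fp & ((1 << r) - 1)
    raw = bucket + tag * m                      # updated target bucket, in [0, M-1]
    comparison = raw ^ H(rem, M)                # comparison bucket via XOR
    new_tag = 0 if comparison < m else 1        # tag of the reference fingerprint
    reference_bucket = comparison - m if new_tag else comparison  # correction
    return reference_bucket, (new_tag << r) | rem
```

Because the raw indexes stay inside [0, M-1] and M is a power of 2, applying the function twice returns the starting location, so a displaced fingerprint can be relocated without access to the original data member.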
The tag cuckoo filter, wherein, when the data member management operation is a query operation or a delete operation, executing the data member management operation based on the two candidate buckets and the two tag fingerprints specifically includes:
searching the first candidate bucket for the first tag fingerprint and the second candidate bucket for the second tag fingerprint corresponding to the preset data member;
and if the first tag fingerprint or the second tag fingerprint corresponding to the preset data member is found, executing the data member management operation on the found tag fingerprint.
The tag cuckoo filter, wherein executing the data member management operation on the found tag fingerprint specifically includes:
when the data member management operation is a query operation, reporting that the data member query succeeded;
and when the data member management operation is a delete operation, deleting the found tag fingerprint of the preset data member, the found tag fingerprint being the first tag fingerprint or the second tag fingerprint.
The tag cuckoo filter, wherein, when the data member management operation is a query operation or a delete operation, executing the data member management operation based on the two candidate buckets and the two tag fingerprints specifically includes:
if neither the first tag fingerprint nor the second tag fingerprint corresponding to the preset data member is found, reporting that the data member management operation has failed.
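The query and delete claims can be sketched as follows. The hashing and correction choices (blake2b, remainder from the low r bits, subtracting m from raw indexes >= m) and the list-of-lists table are illustrative assumptions; only the search-both-buckets logic comes from the claims.

```python
import hashlib

def _hash64(data: bytes) -> int:
    # Illustrative hash; the patent does not specify G or H.
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

def candidates(x: str, m: int, r: int):
    """Two (bucket, tag fingerprint) pairs for x, under the assumptions above."""
    M = 1 << (m - 1).bit_length()                 # smallest power of 2 >= m
    g = _hash64(x.encode())
    rem = g & ((1 << r) - 1)
    i1 = (g >> r) & (M - 1)
    i2 = i1 ^ (_hash64(rem.to_bytes(8, "big")) & (M - 1))
    return tuple((i - m if i >= m else i, ((1 if i >= m else 0) << r) | rem) for i in (i1, i2))

def query(table, x: str, m: int, r: int) -> bool:
    """True if the first tag fingerprint is in the first candidate bucket
    or the second tag fingerprint is in the second candidate bucket."""
    (b1, f1), (b2, f2) = candidates(x, m, r)
    return f1 in table[b1] or f2 in table[b2]

def delete(table, x: str, m: int, r: int) -> bool:
    """Remove one matching tag fingerprint; False means the operation failed."""
    for b, f in candidates(x, m, r):
        if f in table[b]:
            table[b].remove(f)                    # list.remove deletes only the first match
            return True
    return False
```

Deleting only one copy matters when two data members happen to share a tag fingerprint and a bucket: the other member's copy must survive.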
Beneficial effects: compared with the prior art, the invention provides a tag cuckoo filter comprising a cuckoo hash table, wherein the cuckoo hash table comprises a plurality of storage buckets, each data member corresponds to two tag fingerprints, and the two tag fingerprints are stored in two storage buckets respectively; when the tag cuckoo filter receives a data member management operation, the two candidate buckets and the two tag fingerprints of the preset data member are determined based on an exclusive-or operation, and the data member management operation is executed based on them. Because each data member is assigned two tag fingerprints and two storage buckets, and its candidate buckets are determined by an XOR operation based on the tag fingerprints, the number of storage buckets need not be a power of 2, which reduces the storage space overhead per data member.
Drawings
Fig. 1 is an example of the tag cuckoo filter provided by the present invention.
Fig. 2 is a format diagram of the tag fingerprint provided by the present invention.
Fig. 3 is a flowchart of inserting a data member into the tag cuckoo filter provided by the present invention when the number of storage buckets is not a power of 2.
Fig. 4 is an example of inserting a data member into the tag cuckoo filter provided by the present invention.
Fig. 5 is another example of inserting a data member into the tag cuckoo filter provided by the present invention.
Fig. 6 is a flowchart of inserting a data member into the tag cuckoo filter provided by the present invention when the number of storage buckets is a power of 2.
Fig. 7 is a flowchart of looking up a data member in the tag cuckoo filter provided by the present invention.
Fig. 8 is an example of looking up a data member in the tag cuckoo filter provided by the present invention.
Fig. 9 is a flowchart of deleting a data member from the tag cuckoo filter provided by the present invention.
Fig. 10 is an example of deleting a data member from the tag cuckoo filter provided by the present invention.
Detailed Description
To make the objectives, technical solutions, and effects of the invention clearer, the tag cuckoo filter provided by the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit it.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The inventor has found that membership query is a key method in many network applications and distributed systems (such as cooperative caching, packet processing, key-value storage, and data deduplication), and it must meet three key requirements: low storage space overhead, fast query, and incremental update. Bloom filters, including the standard Bloom filter and the counting Bloom filter, and the cuckoo filter are currently the structures most commonly used for membership queries.
A standard Bloom filter represents a set of m elements (items) with n bits (i.e., a bitmap): each inserted element is mapped by k hash functions to k bits of the bitmap, and those k bits are set to 1. To query an element, the same k hash functions map it to k bits of the bitmap, and the filter checks whether all k bits are 1; if so, the element is reported to be in the set; otherwise, it is reported not to be in the set. The standard Bloom filter is a space-efficient randomized data structure whose queries have a low false positive rate (a false positive occurs when the query reports that an element is in the set although it is not) and never produce false negatives (if the query reports that an element is not in the set, the element is definitely not in the set). The standard Bloom filter supports element insertion and query, but not element deletion.
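For reference, a standard Bloom filter as described above fits in a few lines. The k hash functions are derived here by prefixing the function index to the item before SHA-256, which is one common illustrative choice rather than anything the text prescribes.

```python
import hashlib

class BloomFilter:
    """Standard Bloom filter: an n-bit bitmap and k hash functions."""

    def __init__(self, n: int, k: int):
        self.n, self.k = n, k
        self.bits = bytearray(n)          # one byte per bit, for readability

    def _positions(self, item: str):
        # Derive k hash functions by prefixing the function index (illustrative choice).
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n

    def insert(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p] = 1

    def query(self, item: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.bits[p] for p in self._positions(item))
```

There is no safe delete: clearing a bit could also clear a bit that another inserted element depends on, which would create a false negative.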
A counting Bloom filter is a Bloom filter that supports deletion: it represents the m elements of a set with n counters. When an element is inserted, k hash functions map it to k counters, each of which is incremented by 1; when an element is deleted, the same k counters are decremented by 1. To query an element, the same k hash functions map it to k counters, and the filter checks whether all k counters are greater than 0; if so, the element is reported to be in the set; otherwise, it is reported not to be in the set. In practice, 4-bit counters are used, which is generally sufficient to avoid counter overflow. The counting Bloom filter therefore supports fast incremental updates, but its storage space overhead is high: four times that of a standard Bloom filter.
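A counting Bloom filter replaces each bit with a counter. In this sketch the counters saturate at 15, the largest 4-bit value, and saturated counters are never decremented; that saturation policy is a common convention assumed here, not something the text specifies. Hash functions are derived by prefixing the function index to the item before SHA-256, again an illustrative choice.

```python
import hashlib

class CountingBloomFilter:
    """Counting Bloom filter with n saturating 4-bit counters and k hash functions."""
    MAX = 15  # largest value a 4-bit counter can hold

    def __init__(self, n: int, k: int):
        self.n, self.k = n, k
        self.counters = [0] * n

    def _positions(self, item: str):
        for i in range(self.k):  # k hash functions via an index prefix (illustrative)
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n

    def insert(self, item: str) -> None:
        for p in self._positions(item):
            if self.counters[p] < self.MAX:      # saturate instead of overflowing
                self.counters[p] += 1

    def delete(self, item: str) -> None:
        # Only delete items that were inserted; saturated counters stay put.
        for p in self._positions(item):
            if 0 < self.counters[p] < self.MAX:
                self.counters[p] -= 1

    def query(self, item: str) -> bool:
        # Member only if every mapped counter is nonzero.
        return all(self.counters[p] > 0 for p in self._positions(item))
```

The four-fold overhead mentioned above is visible directly: each of the n positions costs 4 bits instead of 1.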
The cuckoo filter is a space-efficient filter that also supports deletion; its storage space overhead is markedly lower than that of the counting Bloom filter and can even be lower than that of the standard Bloom filter. The cuckoo filter uses a cuckoo hash table and computes candidate bucket indexes with an exclusive-or (XOR) operation: it inserts, deletes, or queries a fingerprint of each element, rather than the element itself, in the two candidate buckets to which the element is hashed. However, the storage space overhead per data member of the cuckoo filter varies with the number of elements, because its XOR operation requires the number of storage buckets to be a power of 2 (i.e., 2^b, where b is an integer exponent), which in the worst case doubles the storage space overhead per data member.
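Why the power-of-2 requirement arises can be checked directly. The sketch below, which is an illustration and not code from the patent, applies the standard cuckoo filter rule j = (i XOR h) mod m, with h standing for the fingerprint's hash reduced to [0, m-1], and searches for a bucket i and hash h for which applying the rule twice does not lead back to i.

```python
def first_failure(m: int):
    """Return the first (i, h) for which j = (i ^ h) % m is not reversible, or None."""
    for i in range(m):            # starting bucket index
        for h in range(m):        # hash of the fingerprint, reduced to [0, m-1]
            j = (i ^ h) % m       # alternate bucket
            if (j ^ h) % m != i:  # applying the rule again should give back i
                return i, h
    return None

for m in (6, 8, 14, 16):
    print(m, first_failure(m))
```

For m = 8 and m = 16 no such pair exists, because the XOR of two indexes below a power of 2 stays below it. For m = 6 the search returns (2, 4): 2 XOR 4 = 6 wraps to bucket 0, and 0 XOR 4 = 4 rather than 2, so an evicted fingerprint would be moved to a bucket that is not one of its two candidates.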
As the above shows, none of the filters currently used for membership query satisfies all the requirements of membership query. Therefore, in the embodiments of the invention, the tag cuckoo filter comprises a cuckoo hash table, wherein the cuckoo hash table comprises a plurality of storage buckets, each data member corresponds to two tag fingerprints, and the two tag fingerprints are stored in two storage buckets respectively; when the tag cuckoo filter receives a data member management operation, the two candidate buckets and the two tag fingerprints of the preset data member are determined based on an exclusive-or operation, and the data member management operation is executed based on them. Because the invention configures two tag fingerprints and two storage buckets for each data member and determines the candidate buckets by an XOR operation based on the tag fingerprints, the number of storage buckets need not be a power of 2, which reduces the storage space overhead per data member.
The invention is further explained below through the description of embodiments with reference to the drawings.
This embodiment provides a tag cuckoo filter. As shown in Fig. 1, the tag cuckoo filter includes a cuckoo hash table, which is a compact cuckoo hash table. The tag cuckoo filter assigns two tag fingerprints to a preset data member and maps them to two candidate buckets respectively; once the index of one of the two candidate buckets is known, the index of the other can be determined by an exclusive-or operation. That is, when a preset data member is obtained, the index of its candidate bucket A can be computed from the data member itself, and the index of its candidate bucket B can then be determined by an exclusive-or operation, which yields the two candidate buckets of the preset data member. In addition, the number of storage buckets of the cuckoo hash table is not a power of 2; that is, it is not equal to 2^b for any positive integer b. For example, the number of buckets may be a value other than 4, 8, 16, and so on. Note that when the number m of storage buckets is a power of 2, the tag cuckoo filter of this embodiment degenerates into the existing cuckoo filter: the tag fingerprint contains no tag, and each preset data member has only one fingerprint.
Further, each bucket in the cuckoo hash table includes a specified number of storage locations, each of which stores the tag fingerprint of one data member. The specified number can be chosen according to actual needs; for example, when it is 4, each bucket in the cuckoo hash table contains four storage locations and can therefore store the tag fingerprints of four data members.
Further, each tag fingerprint includes a tag and a tag fingerprint remainder. The tag represents the relation between the index of the candidate bucket corresponding to the tag fingerprint and the number of buckets, and the tag fingerprint remainder represents the fingerprint of the data member. In a specific implementation, as shown in Fig. 2, the tag is 1 bit and indicates whether the index i of the candidate bucket storing the tag fingerprint is smaller than the total number of buckets m: if i < m, the tag is set to 0; otherwise, it is set to 1. The tag fingerprint remainder consists of the remaining r bits of the tag fingerprint. The two tag fingerprints of each data member have the same remainder, while their tags may be the same or different.
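The tag fingerprint layout can be sketched as two small helpers. Placing the tag bit directly above the r remainder bits, and correcting a raw index i >= m to i - m, are assumptions made for illustration; the text fixes only the 1-bit tag rule (0 if i < m, 1 otherwise) and the r-bit remainder.

```python
def to_tag_fp(i: int, rem: int, m: int, r: int):
    """Map a raw candidate index i (0 <= i < 2m) and an r-bit remainder
    to (bucket, tag fingerprint), with the 1-bit tag above the remainder."""
    if not 0 <= rem < (1 << r):
        raise ValueError("remainder must fit in r bits")
    tag = 0 if i < m else 1              # tag rule: 0 if i < m, else 1
    bucket = i if tag == 0 else i - m    # hypothetical correction into [0, m-1]
    return bucket, (tag << r) | rem

def from_tag_fp(bucket: int, tag_fp: int, m: int, r: int):
    """Inverse of to_tag_fp: recover the raw index and the remainder."""
    tag, rem = tag_fp >> r, tag_fp & ((1 << r) - 1)
    return bucket + tag * m, rem

print(to_tag_fp(7, 5, m=6, r=4))    # (1, 21)
print(from_tag_fp(1, 21, m=6, r=4))  # (7, 5)
```

With m = 6, raw index 7 is stored in bucket 1 with tag 1, and the tag is exactly the information needed to recover the raw index 7 later.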
The data member management operation includes one or more of an insert operation, a query operation, and a delete operation. In this embodiment, the data member management operation may be an insert operation, a query operation, or a delete operation; that is, the tag cuckoo filter supports all three. These operations are described in turn below.
In one implementation of this embodiment, the data member management operation is an insert operation. When the number of buckets of the tag cuckoo filter is not a power of 2, as shown in Fig. 3, determining, upon receipt of a data member management operation, the two candidate buckets and the two tag fingerprints of the preset data member based on an exclusive-or operation, and executing the operation based on them, specifically includes:
A10, determining a tag fingerprint remainder and a first candidate bucket corresponding to the preset data member;
A20, determining a second candidate bucket corresponding to the preset data member by an exclusive-or operation based on the tag fingerprint remainder and the first candidate bucket;
A30, determining a first tag of the preset data member according to the first candidate bucket, and a second tag according to the second candidate bucket;
A40, determining a first tag fingerprint of the preset data member according to the first tag and the tag fingerprint remainder, and a second tag fingerprint according to the second tag and the tag fingerprint remainder;
A50, correcting the first candidate bucket and the second candidate bucket respectively according to the number of storage buckets, and taking the corrected buckets as the first candidate bucket and the second candidate bucket of the preset data member;
A60, detecting whether a free storage location exists in the first candidate bucket or the second candidate bucket; if one exists, executing step A70; otherwise, executing step A80;
A70, storing a candidate tag fingerprint of the preset data member in the free storage location, wherein the candidate tag fingerprint is the tag fingerprint corresponding to the candidate bucket to which the free storage location belongs;
A80, selecting a target candidate bucket from the first candidate bucket and the second candidate bucket, and taking the tag fingerprint corresponding to the target candidate bucket as the tag fingerprint of the preset data member;
A90, selecting a target tag fingerprint in the target candidate bucket, and storing the tag fingerprint of the preset data member in the storage location occupied by the target tag fingerprint;
A100, determining, by an exclusive-or operation according to the target candidate bucket and the target tag fingerprint, a reference bucket and a reference tag fingerprint corresponding to the target tag fingerprint; if the reference bucket has a free storage location, executing step A110; otherwise, executing step A120;
A110, storing the reference tag fingerprint in the free storage location;
A120, selecting a target storage location in the reference bucket, and storing the reference tag fingerprint in the target storage location;
A130, taking the tag fingerprint originally stored in the target storage location as the target tag fingerprint, and taking the reference bucket as the target candidate bucket; then returning to step A100 until the reference bucket has a free storage location or the number of iterations reaches a preset threshold.
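Steps A10 to A130 can be put together in a compact Python class. Everything beyond the step structure is an assumption made for illustration: the class and parameter names (`TagCuckooFilter`, `slots`, `max_kicks`, `seed`) are hypothetical, blake2b stands in for G and H, the bucket correction subtracts m from raw indexes >= m, the target candidate bucket and victims are chosen at random with a seeded generator, and M is the smallest power of 2 not less than m.

```python
import hashlib
import random

class TagCuckooFilter:
    """Sketch of insertion steps A10-A130 (illustrative, not the patented code)."""

    def __init__(self, m: int, slots: int = 4, r: int = 8, max_kicks: int = 500, seed: int = 0):
        self.m, self.slots, self.r, self.max_kicks = m, slots, r, max_kicks
        self.M = 1 << (m - 1).bit_length()        # smallest power of 2 >= m
        self.table = [[] for _ in range(m)]       # each bucket holds up to `slots` tag fingerprints
        self.rng = random.Random(seed)            # seeded so runs are reproducible

    @staticmethod
    def _hash64(data: bytes) -> int:
        return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

    def _H(self, rem: int) -> int:
        return self._hash64(rem.to_bytes(8, "big")) & (self.M - 1)

    def _locate(self, raw: int, rem: int):
        """Tag a raw index in [0, M-1] and correct it into [0, m-1]."""
        tag = 0 if raw < self.m else 1
        return (raw - self.m if tag else raw), (tag << self.r) | rem

    def candidates(self, x: str):
        g = self._hash64(x.encode())                       # A10: G(x)
        rem = g & ((1 << self.r) - 1)
        raw0 = (g >> self.r) & (self.M - 1)
        raw1 = raw0 ^ self._H(rem)                         # A20: XOR
        return self._locate(raw0, rem), self._locate(raw1, rem)  # A30-A50

    def _reference(self, bucket: int, fp: int):
        """A100: restore the raw index from the tag, XOR, then re-tag and correct."""
        tag, rem = fp >> self.r, fp & ((1 << self.r) - 1)
        return self._locate((bucket + tag * self.m) ^ self._H(rem), rem)

    def insert(self, x: str) -> bool:
        cands = self.candidates(x)
        for bucket, fp in cands:                           # A60/A70
            if len(self.table[bucket]) < self.slots:
                self.table[bucket].append(fp)
                return True
        bucket, fp = self.rng.choice(cands)                # A80
        for _ in range(self.max_kicks):
            k = self.rng.randrange(len(self.table[bucket]))  # A90/A120: victim slot
            fp, self.table[bucket][k] = self.table[bucket][k], fp
            bucket, fp = self._reference(bucket, fp)       # A100 (A130 loops back)
            if len(self.table[bucket]) < self.slots:       # A110
                self.table[bucket].append(fp)
                return True
        return False   # threshold reached; the last displaced fingerprint is dropped

    def contains(self, x: str) -> bool:
        return any(fp in self.table[b] for b, fp in self.candidates(x))
```

The text does not say what happens when the threshold is reached. This sketch simply returns False, which drops the last displaced fingerprint; a production filter would need to keep it, for example in a small victim slot, to avoid false negatives.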
Specifically, in step A10, the preset data member is the data member to which the insert operation applies; it is acquired when the insert operation is received. The tag fingerprint remainder $r_x$ of the preset data member and the index $h_0(x)$ of the first candidate bucket are then computed from a hash function G(x) as follows:
$$h_0(x) : r_x = G(x) \qquad (1)$$
where $r_x$ is the tag fingerprint remainder of the preset data member, $h_0(x)$ is the index of the first candidate bucket, x is the preset data member, and G(x) is the hash value of x, whose low-order bits form $r_x$ and whose high-order bits form $h_0(x)$; ":" denotes concatenation. $h_0(x)$ takes values in [0, M-1], where M is the smallest power of 2 greater than m and m is the number of storage buckets of the cuckoo hash table. For example, if m = 6, then M = 8; if m = 14, then M = 16.
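Equation (1) amounts to splitting one hash value into two bit fields. The sketch below assumes that $h_0(x)$ is taken from the log2(M) bits immediately above the r remainder bits; the text says only that the low part is $r_x$ and the high part is $h_0(x)$. It computes M as the smallest power of 2 not less than m, which matches the text's definition whenever m is not itself a power of 2.

```python
def smallest_pow2_at_least(m: int) -> int:
    """M for m buckets; equals 'the smallest power of 2 greater than m'
    whenever m is not itself a power of 2."""
    return 1 << (m - 1).bit_length()

def split_hash(g: int, r: int, M: int):
    """Equation (1), h0(x):r_x = G(x): the low r bits give r_x and the
    next log2(M) bits give h0(x)."""
    r_x = g & ((1 << r) - 1)
    h0 = (g >> r) & (M - 1)
    return h0, r_x

M = smallest_pow2_at_least(6)
print(M, split_hash(0b11011010, r=4, M=M))  # 8 (5, 10)
```

In the worked value, the low four bits 1010 give $r_x$ = 10, and the next three bits 101 give $h_0(x)$ = 5, a valid index in [0, 7].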
Further, in step A20, once the tag fingerprint remainder and the index of the first candidate bucket of the preset data member are obtained, an exclusive-OR operation is used to determine the second candidate bucket of the preset data member. The index $h_1(x)$ of the second candidate bucket may be calculated as:
$$h_1(x) = h_0(x) \oplus \big(H(r_x) \bmod M\big) \tag{2}$$

where $r_x$ is the tag fingerprint remainder of the preset data member, $h_0(x)$ is the index of the first candidate bucket, $x$ is the preset data member, $\oplus$ is the XOR operator, "mod" is the modulo operator, and $H(r_x)$ is the hash value of the tag fingerprint remainder, with range $[0, \dots, M-1]$; $h_1(x)$ is the index of the second candidate bucket, with range $[0, \dots, M-1]$, where $M$ is the smallest power of 2 not less than $m$ and $m$ is the number of buckets in the cuckoo hash table.
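Because equation (2) uses XOR, either candidate index can be recomputed from the other and the remainder alone, which is what makes relocation possible later. The sketch below checks this property, assuming a BLAKE2b-based stand-in for $H$ (hypothetical) and $M$ a power of 2:

```python
import hashlib


def H(r: int, M: int) -> int:
    """Stand-in for H(r) mod M: a BLAKE2b hash of the remainder (an assumption)."""
    digest = hashlib.blake2b(r.to_bytes(4, "big"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % M


def second_bucket(h0: int, r_x: int, M: int) -> int:
    """Equation (2): h1(x) = h0(x) XOR (H(r_x) mod M); stays in [0, M-1] since M is a power of 2."""
    return h0 ^ H(r_x, M)


M = 8
for h0 in range(M):
    for r in range(256):
        h1 = second_bucket(h0, r, M)
        assert 0 <= h1 < M
        assert second_bucket(h1, r, M) == h0  # XOR is its own inverse
```

The same function therefore maps $h_1(x)$ back to $h_0(x)$, which is the property the relocation in step A100 depends on.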
Further, in step A30, after the indices of the first and second candidate buckets are obtained, a first tag and a second tag of the preset data member are determined: the first tag from the first candidate bucket and the second tag from the second candidate bucket. Because a tag encodes the relationship between the index of the candidate bucket associated with a tag fingerprint and the number of buckets, the first tag is determined by comparing the index of the first candidate bucket with the number of buckets $m$, and the second tag by comparing the index of the second candidate bucket with $m$.
In a specific implementation of this embodiment, the index $h_0(x)$ of the first candidate bucket and the index $h_1(x)$ of the second candidate bucket are each compared with the number of buckets $m$. If $h_0(x)$ is greater than or equal to $m$, the first tag $t_0(x)$ is 1; otherwise ($h_0(x) < m$), the first tag $t_0(x)$ is 0. Similarly, if $h_1(x)$ is greater than or equal to $m$, the second tag $t_1(x)$ is 1; otherwise ($h_1(x) < m$), the second tag $t_1(x)$ is 0. (This assignment matches step A100 and Example 2, where a tag of 1 means that $m$ is added back to the corrected index.) The first tag fingerprint and the second tag fingerprint are then calculated as:

$$\hat f_0(x) = t_0(x) : r_x, \qquad t_0(x) = \begin{cases} 0, & h_0(x) < m \\ 1, & h_0(x) \ge m \end{cases} \tag{3}$$

$$\hat f_1(x) = t_1(x) : r_x, \qquad t_1(x) = \begin{cases} 0, & h_1(x) < m \\ 1, & h_1(x) \ge m \end{cases} \tag{4}$$

where $\hat f_0(x)$ is the first tag fingerprint, $\hat f_1(x)$ is the second tag fingerprint, $t_0(x)$ and $t_1(x)$ are the first and second tags, $h_0(x)$ is the index of the first candidate bucket, $h_1(x)$ is the index of the second candidate bucket, $r_x$ is the tag fingerprint remainder, and ":" denotes concatenation.
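Equations (3) and (4) can be sketched by packing the one-bit tag directly above the remainder bits. The 8-bit remainder width, this bit layout, and the helper names are assumptions of the sketch; the inputs reuse $m = 5$ and the raw indices 6 and 4 from Example 1 in this section.

```python
R_BITS = 8  # hypothetical remainder width; the tag occupies bit R_BITS


def tag_of(h: int, m: int) -> int:
    """Tag of a raw candidate index h in [0, M-1]: 0 if h < m, otherwise 1."""
    return 0 if h < m else 1


def tag_fingerprint(h: int, r: int, m: int) -> int:
    """Equations (3)/(4): the tag fingerprint is tag:r (tag bit above the remainder)."""
    return (tag_of(h, m) << R_BITS) | r


def unpack(fp: int) -> tuple[int, int]:
    """Split a tag fingerprint into (tag, remainder)."""
    return fp >> R_BITS, fp & ((1 << R_BITS) - 1)


m, r_x = 5, 0x2A
f0 = tag_fingerprint(6, r_x, m)  # h0(x) = 6 >= m, so tag 1
f1 = tag_fingerprint(4, r_x, m)  # h1(x) = 4 <  m, so tag 0
print(unpack(f0), unpack(f1))    # (1, 42) (0, 42)
```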
Further, in step A50, after the first and second tags are determined, the indices of the first and second candidate buckets are corrected based on the number of buckets, and the corrected indices, both in $[0, \dots, m-1]$, are used as the indices of the first and second candidate buckets of the preset data member. In one implementation of this embodiment, the correction formulas are, respectively:

$$h_0(x) = h_0(x) \bmod m, \qquad h_1(x) = h_1(x) \bmod m \tag{5}$$

where $h_0(x)$ is the index of the first candidate bucket, $h_1(x)$ is the index of the second candidate bucket, $m$ is the number of buckets, and "mod" is the modulo operator.
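Equation (5) loses no information. Since $M < 2m$ by construction, any raw index $h \ge m$ satisfies $h \bmod m = h - m$, and the tag records which case applied; step A100 relies on this to restore the raw index. A short check with the hypothetical helpers `correct` and `restore`:

```python
def correct(h: int, m: int) -> int:
    """Equation (5): fold a raw index h in [0, M-1] into [0, m-1]."""
    return h % m


def restore(i: int, tag: int, m: int) -> int:
    """Undo equation (5) using the tag, as in step A100: add m back when the tag is 1."""
    return i + m if tag == 1 else i


m, M = 5, 8
for h in range(M):
    tag = 0 if h < m else 1
    assert restore(correct(h, m), tag, m) == h  # the raw index is always recoverable

print(correct(6, m), correct(4, m))  # 1 4
```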
Further, in step A60, once the indices of the first and second candidate buckets are obtained, the two candidate buckets are located from these indices, and the first candidate bucket $h_0(x)$ and the second candidate bucket $h_1(x)$ are checked for free storage locations. The check may proceed as follows: the storage locations of $h_0(x)$ and $h_1(x)$ are searched in order, and any storage location holding the preset value is judged to be free. The preset value, for example 0, marks a storage location that holds no element fingerprint.
Further, in step A70, when free storage locations exist, all of them are collected, one is selected, the candidate bucket containing it is identified, and the tag fingerprint corresponding to that candidate bucket is stored in the selected location. The free location may lie in the first candidate bucket $h_0(x)$ or in the second candidate bucket $h_1(x)$: if it lies in $h_0(x)$, the first tag fingerprint is stored there; if it lies in $h_1(x)$, the second tag fingerprint is stored there. Once the tag fingerprint of the preset data member has been stored in the free location, the insert operation for that member is complete.
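Steps A60 and A70 can be sketched as below, assuming the table is a list of buckets, each a list of slots. The sketch marks free slots with `None` rather than the preset value 0, so that a tag fingerprint whose bits are all zero cannot be mistaken for an empty slot; that substitution is a choice of this sketch, not of the patent.

```python
import random
from typing import List, Optional

Table = List[List[Optional[int]]]  # table[bucket][slot]; None marks a free slot


def try_store(table: Table, i0: int, i1: int, f0: int, f1: int) -> bool:
    """Steps A60-A70 (sketch): store f0 in bucket i0 or f1 in bucket i1 if a slot is free."""
    free = [(i0, s, f0) for s, v in enumerate(table[i0]) if v is None]
    free += [(i1, s, f1) for s, v in enumerate(table[i1]) if v is None]
    if not free:
        return False  # no free location: the caller continues with step A80
    bucket, slot, fp = random.choice(free)  # pick one free location at random
    table[bucket][slot] = fp  # each bucket receives its own tag fingerprint
    return True
```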
Further, in step A80, when there is no free storage location, a target candidate bucket (i.e., the current bucket) is selected from the first and second candidate buckets, and the tag fingerprint corresponding to it is taken as the selected tag fingerprint. For example, if the first candidate bucket is the target candidate bucket, the first tag fingerprint is the selected tag fingerprint (i.e., the new tag fingerprint f); if the second candidate bucket is the target candidate bucket, the second tag fingerprint is the selected tag fingerprint. In addition, the data member move counter is set to 0 before the target candidate bucket is selected.
Further, in step A90, a storage location is randomly selected from the target candidate bucket, the tag fingerprint stored there is taken as the target tag fingerprint (i.e., the old tag fingerprint), and the selected tag fingerprint of the preset data member is stored in that location. Before the storage location is selected, the move counter is incremented by 1 and compared with a preset threshold: if it exceeds the threshold, insertion of the preset data member is judged to have failed; otherwise, the swap is performed and step A100 is executed. The preset threshold may be chosen according to the actual situation, for example 500.
Further, in step A100, after the target tag fingerprint has been taken from the target candidate bucket, an exclusive-OR operation is used to determine the reference bucket (which then serves as the current bucket h) and the reference tag fingerprint (which then serves as the old fingerprint g') corresponding to the target tag fingerprint. Before the reference bucket is determined, the index of the target candidate bucket is adjusted according to the target tag carried by the target tag fingerprint: if the target tag is 0, the index is left unchanged; if the target tag is 1, m is added to it. For example, with target candidate bucket index i, the adjusted index is i when the target tag g is 0, and i + m when the target tag g is 1. The index of the reference bucket is then calculated from the target tag fingerprint remainder and the adjusted index; the reference bucket is the other candidate bucket of the data member to which the target tag fingerprint belongs. The index j of the reference bucket may be calculated as:

$$j = i \oplus \big(H(r) \bmod M\big) \tag{6}$$

where $i$ is the adjusted index of the target candidate bucket, $r$ is the target tag fingerprint remainder, $H(r)$ is the hash value of the target tag fingerprint remainder, with range $[0, \dots, M-1]$, $\oplus$ is the XOR operator, and "mod" is the modulo operator.
Further, once the reference bucket is determined, the reference tag fingerprint is calculated from the target tag fingerprint remainder and the index of the reference bucket: if the index j of the reference bucket is less than m, the tag of the reference tag fingerprint g' is set to 0; otherwise, it is set to 1. The calculation formula is:

$$g' = \begin{cases} 0 : r, & j < m \\ 1 : r, & j \ge m \end{cases} \tag{7}$$

where $g'$ is the reference tag fingerprint, $r$ is the target tag fingerprint remainder, $j$ is the index of the reference bucket, and $m$ is the number of buckets.
After the tag of the reference tag fingerprint is obtained, the index j of the reference bucket is adjusted into the range $[0, \dots, m-1]$ by:

$$j = j \bmod m \tag{8}$$

where j is the index of the reference bucket, m is the number of buckets, and "mod" is the modulo operator.
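Equations (6) to (8) combine into one relocation helper. The sketch keeps the hypothetical conventions used above (8-bit remainder, tag bit directly above it) and takes the stand-in hash $H$ as a parameter. The check uses $H(r_a) = 2$, the value implied by Example 2 in this section (restored index 6 XOR 4 = 2), and an arbitrary remainder 0x2A.

```python
from typing import Callable

R_BITS = 8  # hypothetical remainder width


def relocate(i: int, fp: int, m: int, M: int, H: Callable[[int], int]) -> tuple[int, int]:
    """From a tag fingerprint fp stored in corrected bucket i, return (j, g') per eqs. (6)-(8)."""
    tag, r = fp >> R_BITS, fp & ((1 << R_BITS) - 1)
    i_raw = i + m if tag == 1 else i              # undo equation (5) using the tag
    j = i_raw ^ (H(r) % M)                        # equation (6)
    g_new = ((0 if j < m else 1) << R_BITS) | r   # equation (7): new tag, same remainder
    return j % m, g_new                           # equation (8)


# Example 2 numbers: the old fingerprint of a has tag 1 and sits in bucket 1; m = 5, M = 8
j, g_new = relocate(1, (1 << R_BITS) | 0x2A, 5, 8, lambda r: 2)  # H(r_a) = 2 assumed
print(j, g_new >> R_BITS)  # 4 0
```

Applying the helper to the result sends the fingerprint back to bucket 1 with tag 1, so repeated relocations stay consistent.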
Further, after the reference bucket is obtained, it is checked for a free storage location: if it has one, step A110 is executed; otherwise, step A120 is executed.
Further, in step A110, the reference tag fingerprint is stored in the free storage location, and the insert operation of the preset data member is complete.
Further, in step A120, a target storage location is selected in the reference bucket and the reference tag fingerprint is stored there; the tag fingerprint previously held in that location is taken as the target tag fingerprint, the reference bucket is taken as the target candidate bucket, and step A100 is executed again until the reference bucket has a free storage location or the move counter exceeds the preset threshold. Before the reference tag fingerprint is stored in the target storage location, the move counter is incremented by 1. Counting the member moves of the insert operation in this way determines when the insert operation must stop and prevents it from entering an infinite loop.
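Putting steps A10 to A130 together, a compact insert might look like the following. It is a sketch under stated assumptions, not the patented implementation: BLAKE2b stands in for $G$ and $H$, the remainder is 8 bits, free slots are `None`, the first free candidate bucket is used instead of a random one, and a failed insert simply drops the last evicted fingerprint (a practical filter would usually keep it in a victim slot).

```python
import hashlib
import random

R_BITS = 8       # hypothetical remainder width; the tag bit sits just above it
MAX_KICKS = 500  # preset move threshold (the example value from step A90)


def _h64(data: bytes) -> int:
    """Stand-in hash (64-bit BLAKE2b); the patent does not specify G or H."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def _H(r: int, M: int) -> int:
    """Stand-in for H(r) mod M."""
    return _h64(r.to_bytes(2, "big")) % M


def candidates(x: str, m: int) -> tuple[int, int, int, int]:
    """Steps A10-A50: corrected indices (i0, i1) and tag fingerprints (f0, f1)."""
    M = 1 << (m - 1).bit_length()
    g = _h64(x.encode())
    r = g & ((1 << R_BITS) - 1)                  # eq. (1): remainder
    h0 = (g >> R_BITS) & (M - 1)                 # eq. (1): first raw index
    h1 = h0 ^ _H(r, M)                           # eq. (2)
    f0 = ((0 if h0 < m else 1) << R_BITS) | r    # eq. (3)
    f1 = ((0 if h1 < m else 1) << R_BITS) | r    # eq. (4)
    return h0 % m, h1 % m, f0, f1                # eq. (5)


def relocate(i: int, fp: int, m: int) -> tuple[int, int]:
    """Step A100: the other bucket and the re-tagged fingerprint, eqs. (6)-(8)."""
    M = 1 << (m - 1).bit_length()
    tag, r = fp >> R_BITS, fp & ((1 << R_BITS) - 1)
    j = (i + m if tag else i) ^ _H(r, M)               # eq. (6) on the restored raw index
    return j % m, ((0 if j < m else 1) << R_BITS) | r  # eqs. (8) and (7)


def insert(table: list, x: str) -> bool:
    """Steps A10-A130 (sketch); table[bucket][slot] holds a tag fingerprint or None."""
    m = len(table)
    i0, i1, f0, f1 = candidates(x, m)
    for i, f in ((i0, f0), (i1, f1)):            # A60-A70: first free bucket wins here
        if None in table[i]:
            table[i][table[i].index(None)] = f
            return True
    i, f = random.choice(((i0, f0), (i1, f1)))   # A80: pick the target candidate bucket
    for _ in range(MAX_KICKS):                   # move counter (A90, A120)
        slot = random.randrange(len(table[i]))
        f, table[i][slot] = table[i][slot], f    # store f, evict the old fingerprint into f
        i, f = relocate(i, f, m)                 # A100: reference bucket and fingerprint
        if None in table[i]:                     # A110
            table[i][table[i].index(None)] = f
            return True
    return False  # threshold exceeded; this sketch drops the last evicted fingerprint
```

Because `relocate` restores the raw index from the tag before applying the XOR, every relocated fingerprint stays reachable from its own element's candidates, so a lookup of the form `f0 in table[i0] or f1 in table[i1]` still finds each inserted element.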
The element insertion method of the tag cuckoo filter is illustrated as follows:
Example 1: as shown in FIG. 4, when element x is inserted, the tag fingerprint remainder $r_x$ of x and the two candidate bucket indices $h_0(x) = 6$ and $h_1(x) = 4$ are first calculated with equations (1) and (2). Next, the two tag fingerprints of x, $\hat f_0(x) = 1 : r_x$ and $\hat f_1(x) = 0 : r_x$, are calculated with equations (3), (4), and (5), and the two candidate bucket indices are simultaneously adjusted into the range $[0, \dots, 4]$, giving $h_0(x) = 1$ and $h_1(x) = 4$. The two candidate buckets $h_0(x) = 1$ and $h_1(x) = 4$ are then searched, and both are found to contain free storage locations. Finally, the free candidate bucket $h_0(x) = 1$ is randomly selected, and the corresponding tag fingerprint $\hat f_0(x) = 1 : r_x$ is stored in it.
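The index arithmetic of Example 1 can be replayed in a few lines. Only $m = 5$ and the raw indices 6 and 4 come from the example; the value $H(r_x) = 2$ is not stated there but follows from equation (2), since 6 XOR 4 = 2.

```python
m = 5
M = 1 << (m - 1).bit_length()                  # smallest power of 2 not less than m: 8
h0, h1 = 6, 4                                  # raw candidate indices from equations (1) and (2)
H_rx = h0 ^ h1                                 # implied by equation (2): 6 XOR 4 = 2
tags = [0 if h < m else 1 for h in (h0, h1)]   # equations (3) and (4)
corrected = [h % m for h in (h0, h1)]          # equation (5)
print(M, H_rx, tags, corrected)                # 8 2 [1, 0] [1, 4]
```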
Example 2: as shown in FIG. 5, when element y is inserted, the tag fingerprint remainder $r_y$ of y and the two candidate bucket indices $h_0(y) = 3$ and $h_1(y) = 6$ are first calculated with equations (1) and (2). Next, the two tag fingerprints of y, $\hat f_0(y) = 0 : r_y$ and $\hat f_1(y) = 1 : r_y$, are calculated with equations (3), (4), and (5), and the two candidate bucket indices are simultaneously adjusted into the range $[0, \dots, 4]$, giving $h_0(y) = 3$ and $h_1(y) = 1$. The candidate buckets $h_0(y) = 3$ and $h_1(y) = 1$ are then searched, and neither contains a free storage location, so candidate bucket $h_1(y) = 1$ is randomly selected, the old fingerprint $\hat f_1(a)$ is removed from it, and the new fingerprint $\hat f_1(y)$ is stored in it. Then, assuming that the old tag fingerprint $\hat f_1(a)$ has tag value 1 and remainder $r_a$, the other candidate bucket index $h_0(a) = 4$ and its tag fingerprint $\hat f_0(a) = 0 : r_a$ are calculated with equations (6), (7), and (8) from the old fingerprint and the current bucket index $h_1(a) = 1$ (the tag is 0 because $h_0(a) = 4 < m = 5$). Finally, candidate bucket $h_0(a) = 4$ is searched, found to contain a free storage location, and the tag fingerprint $\hat f_0(a)$ is stored in it.
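Example 2 can be replayed the same way. The indices come from the example; $H(r_a) = 2$ is an assumed value, chosen because it is the one that makes equation (6) yield $h_0(a) = 4$ from the restored index 6.

```python
m, M = 5, 8
h_y = (3, 6)                                   # raw h0(y), h1(y)
tags_y = [0 if h < m else 1 for h in h_y]      # equations (3)/(4): [0, 1]
idx_y = [h % m for h in h_y]                   # equation (5): [3, 1]

# Element a's old fingerprint is evicted from bucket 1; its tag is 1 (as assumed in the text)
i, tag_a = 1, 1
i_raw = i + m if tag_a == 1 else i             # restored raw index: 6
H_ra = 2                                       # hypothetical H(r_a), consistent with the example
j = i_raw ^ (H_ra % M)                         # equation (6): 4
tag_j = 0 if j < m else 1                      # equation (7): 0, since 4 < 5
j = j % m                                      # equation (8): 4
print(tags_y, idx_y, i_raw, j, tag_j)          # [0, 1] [3, 1] 6 4 0
```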
Further, in one implementation of this embodiment, when an insert operation is received, it may be determined whether the number of buckets m is a power of 2. If it is, the tag cuckoo filter degenerates to an existing cuckoo filter; if it is not, the insert operation is performed according to the tag cuckoo filter procedure above. When the tag cuckoo filter degenerates to an existing cuckoo filter (i.e., when the number of buckets is a power of 2), as shown in FIG. 6, the insert operation may be implemented as follows:
B10, calculating the fingerprint $f_x$ and the first candidate bucket index $h_0(x)$ of element x with the hash function $G(x)$:

$$h_0(x) : f_x = G(x) \tag{9}$$

where the low-order bits of the hash value $G(x)$ form $f_x$, the high-order bits form $h_0(x)$, and ":" denotes concatenation. The first candidate bucket index $h_0(x)$ ranges over $[0, \dots, M-1]$, where m is a power of 2 and M = m.
B20, calculating the second candidate bucket index $h_1(x)$ of element x: based on the fingerprint $f_x$ and the first candidate bucket index $h_0(x)$, an exclusive-OR operation gives

$$h_1(x) = h_0(x) \oplus \big(H(f_x) \bmod M\big) \tag{10}$$

where the hash value $H(f_x)$ ranges over $[0, \dots, M-1]$, $\oplus$ is the XOR operator, "mod" is the modulo operator, and the second candidate bucket index $h_1(x)$ ranges over $[0, \dots, M-1]$, where m is a power of 2 and M = m.
B30, judging whether at least one of the two candidate buckets $h_0(x)$ and $h_1(x)$ contains a free storage location: each candidate bucket is searched for a free storage location; if at least one candidate bucket contains one, proceeding to step B40; otherwise, proceeding to step B50.
B40, storing the fingerprint $f_x$ of element x in a free candidate bucket, $h_0(x)$ or $h_1(x)$; insertion of element x ends.
B50, randomly selecting a candidate bucket, $h_0(x)$ or $h_1(x)$, and setting the new fingerprint f to $f_x$ (i.e., $f = f_x$); the target candidate bucket h is set to the selected bucket, and the element move counter is set to 0.
B60, incrementing the element move counter by 1 and judging whether it exceeds 500; if it does, proceeding to step B70; otherwise, proceeding to step B80.
B70, element x insertion fails, and element insertion ends.
B80, randomly selecting a storage location in the target candidate bucket h, removing the target fingerprint g stored there, and storing the new fingerprint f in that location.
B90, calculating the other candidate bucket index of the target fingerprint g (the reference candidate bucket index) and setting it as the target candidate bucket h. Based on the target fingerprint g and the target candidate bucket index i (i.e., i = h), the reference candidate bucket index j is calculated with an exclusive-OR operation:

$$j = i \oplus \big(H(g) \bmod M\big) \tag{11}$$

where the hash value $H(g)$ ranges over $[0, \dots, M-1]$, $\oplus$ is the XOR operator, and "mod" is the modulo operator; the reference bucket index j is then set as the target candidate bucket index h (i.e., h = j).
B100, judging whether the target candidate bucket h comprises at least one free storage position. If at least one free storage position is included, entering step B110; otherwise, step B120 is entered.
B110, storing the target fingerprint g in the target candidate bucket h, and ending element insertion.
B120, taking the target fingerprint g as the new fingerprint f and returning to step B60, so that old fingerprints continue to be evicted and new fingerprints inserted until a free storage location is found or the move counter exceeds 500.
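In the degenerate case, steps B10 to B120 are the standard cuckoo filter insert. A minimal Python sketch, assuming BLAKE2b as a stand-in for $G$ and $H$, an 8-bit fingerprint, `None` for free slots, and a table whose length is a power of 2 (all choices of this sketch, not of the patent):

```python
import hashlib
import random

F_BITS = 8       # hypothetical fingerprint width
MAX_KICKS = 500  # move threshold from step B60


def _h64(data: bytes) -> int:
    """Stand-in hash (64-bit BLAKE2b); the patent does not specify G or H."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def candidates(x: str, M: int) -> tuple[int, int, int]:
    """Equations (9) and (10): fingerprint f_x and candidate indices h0(x), h1(x)."""
    g = _h64(x.encode())
    f = g & ((1 << F_BITS) - 1)
    h0 = (g >> F_BITS) & (M - 1)
    h1 = h0 ^ (_h64(f.to_bytes(2, "big")) % M)
    return f, h0, h1


def insert(table: list, x: str) -> bool:
    """Steps B10-B120 (sketch); len(table) must be a power of 2."""
    M = len(table)
    f, h0, h1 = candidates(x, M)
    for h in (h0, h1):                                  # B30/B40
        if None in table[h]:
            table[h][table[h].index(None)] = f
            return True
    h = random.choice((h0, h1))                         # B50
    for _ in range(MAX_KICKS):                          # B60
        slot = random.randrange(len(table[h]))          # B80
        f, table[h][slot] = table[h][slot], f           # evict g; f now holds g
        h = h ^ (_h64(f.to_bytes(2, "big")) % M)        # B90, equation (11)
        if None in table[h]:                            # B100/B110
            table[h][table[h].index(None)] = f
            return True
    return False                                        # B70: insertion fails
```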
In one implementation of this embodiment, the data member management operation is a query operation. As shown in FIG. 7, when the tag cuckoo filter receives the data member management operation, determining the two candidate buckets and two tag fingerprints of the preset data member based on an exclusive-OR operation, and executing the operation based on them, specifically includes:
C10, determining the tag fingerprint remainder and the first candidate bucket of the preset data member;
C20, determining the second candidate bucket of the preset data member by an exclusive-OR operation based on the tag fingerprint remainder and the first candidate bucket;
C30, determining the first tag of the preset data member from the first candidate bucket, and the second tag from the second candidate bucket;
C40, determining the first tag fingerprint of the preset data member from the first tag and the tag fingerprint remainder, and the second tag fingerprint from the second tag and the tag fingerprint remainder;
C50, correcting the first and second candidate buckets according to the number of buckets, and taking the corrected buckets as the first and second candidate buckets of the preset data member;
C60, searching the first candidate bucket for the first tag fingerprint and the second candidate bucket for the second tag fingerprint of the preset data member;
C70, if the first or the second tag fingerprint of the preset data member is found, reporting that the data member query succeeded;
C80, if neither the first nor the second tag fingerprint of the preset data member is found, reporting that the data member query failed.
Specifically, steps C10-C50 are executed in the same way as steps A10-A50; refer to the description of steps A10-A50, which is not repeated here.
Further, after the first candidate bucket $h_0(x)$ and the second candidate bucket $h_1(x)$ are obtained, the first candidate bucket $h_0(x)$ is searched for the first tag fingerprint $\hat f_0(x)$ of the preset data member x, and the second candidate bucket $h_1(x)$ for the second tag fingerprint $\hat f_1(x)$. If $\hat f_0(x)$ or $\hat f_1(x)$ is matched, the preset data member x is in the set: the query returns True and the query of the preset data member ends. If neither a matching $\hat f_0(x)$ nor a matching $\hat f_1(x)$ is stored, the preset data member x is not in the set: the query returns False and the element query ends.
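Once the candidates are known, the query (steps C10 to C80) reduces to two membership tests. The sketch repeats the hypothetical conventions used for insertion (BLAKE2b stand-in hash, 8-bit remainder, tag bit above it, `None` for free slots); `candidates` and `contains` are illustrative names.

```python
import hashlib

R_BITS = 8  # hypothetical remainder width; the tag bit sits just above it


def _h64(data: bytes) -> int:
    """Stand-in hash (64-bit BLAKE2b); the patent does not specify G or H."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def candidates(x: str, m: int) -> tuple[int, int, int, int]:
    """Steps C10-C50 (same as A10-A50): corrected indices and tag fingerprints."""
    M = 1 << (m - 1).bit_length()
    g = _h64(x.encode())
    r = g & ((1 << R_BITS) - 1)
    h0 = (g >> R_BITS) & (M - 1)
    h1 = h0 ^ (_h64(r.to_bytes(2, "big")) % M)
    f0 = ((0 if h0 < m else 1) << R_BITS) | r
    f1 = ((0 if h1 < m else 1) << R_BITS) | r
    return h0 % m, h1 % m, f0, f1


def contains(table: list, x: str) -> bool:
    """Steps C60-C80: True if f0 is in bucket i0 or f1 is in bucket i1."""
    i0, i1, f0, f1 = candidates(x, len(table))
    return f0 in table[i0] or f1 in table[i1]


table = [[None] * 4 for _ in range(5)]
i0, i1, f0, f1 = candidates("x", 5)
table[i0][0] = f0            # place x's first tag fingerprint by hand
print(contains(table, "x"))  # True
```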
The element query method of the tag cuckoo filter is illustrated as follows:
As shown in FIG. 8, when element x is queried, the two candidate bucket indices $h_0(x) = 6$ and $h_1(x) = 4$ of x are first calculated with equations (1) and (2). Next, the two tag fingerprints $\hat f_0(x) = 1 : r_x$ and $\hat f_1(x) = 0 : r_x$ corresponding to the candidate buckets $h_0(x)$ and $h_1(x)$ are calculated with equations (3), (4), and (5), and the two candidate bucket indices are simultaneously adjusted to $h_0(x) = 1$ and $h_1(x) = 4$. The candidate buckets $h_0(x) = 1$ and $h_1(x) = 4$ are then searched for the corresponding tag fingerprints $\hat f_0(x)$ and $\hat f_1(x)$, and candidate bucket $h_0(x) = 1$ is found to contain the matching tag fingerprint $\hat f_0(x) = 1 : r_x$. Finally, the query indicates that element x is in the set and returns true.
In one implementation of this embodiment, when a query operation is received, it may be determined whether the number of buckets m is a power of 2. If it is, the tag cuckoo filter degenerates to an existing cuckoo filter; if it is not, the query operation is performed according to the tag cuckoo filter procedure above. When the tag cuckoo filter degenerates to an existing cuckoo filter, as shown in FIG. 7, the query operation may be implemented as follows:
D10, calculating the fingerprint $f_x$ and the first candidate bucket index $h_0(x)$ of element x with equation (9), where $h_0(x)$ ranges over $[0, \dots, M-1]$, m is a power of 2, and M = m.
D20, calculating the second candidate bucket index $h_1(x)$ of element x with equation (10), where $h_1(x)$ ranges over $[0, \dots, M-1]$, m is a power of 2, and M = m.
D30, searching whether the two candidate buckets $h_0(x)$ and $h_1(x)$ contain the fingerprint $f_x$; if the fingerprint $f_x$ is matched, proceeding to step D40; otherwise, proceeding to step D50.
D40, indicating that the element x is in the set, returning the query result as true, and ending the element query.
D50, indicating that the element x is not in the set, returning the query result as false, and ending the element query.
In one implementation of this embodiment, the data member management operation is a delete operation. As shown in FIG. 9, when the tag cuckoo filter receives the data member management operation, determining the two candidate buckets and two tag fingerprints of the preset data member based on an exclusive-OR operation, and executing the operation based on them, specifically includes:
E10, determining the tag fingerprint remainder and the first candidate bucket of the preset data member;
E20, determining the second candidate bucket of the preset data member by an exclusive-OR operation based on the tag fingerprint remainder and the first candidate bucket;
E30, determining the first tag of the preset data member from the first candidate bucket, and the second tag from the second candidate bucket;
E40, determining the first tag fingerprint of the preset data member from the first tag and the tag fingerprint remainder, and the second tag fingerprint from the second tag and the tag fingerprint remainder;
E50, correcting the first and second candidate buckets according to the number of buckets, and taking the corrected buckets as the first and second candidate buckets of the preset data member;
E60, searching the first candidate bucket for the first tag fingerprint and the second candidate bucket for the second tag fingerprint of the preset data member;
E70, if the first or the second tag fingerprint of the preset data member is found, deleting the tag fingerprint that was found (the first or the second tag fingerprint);
E80, if neither the first nor the second tag fingerprint of the preset data member is found, reporting that the deletion of the data member failed.
Specifically, steps E10-E50 are executed in the same way as steps A10-A50; refer to the description of steps A10-A50, which is not repeated here.
Further, after the first candidate bucket $h_0(x)$ and the second candidate bucket $h_1(x)$ are obtained, the first candidate bucket $h_0(x)$ is searched for the first tag fingerprint $\hat f_0(x)$ of the preset data member x, and the second candidate bucket $h_1(x)$ for the second tag fingerprint $\hat f_1(x)$. If $\hat f_0(x)$ or $\hat f_1(x)$ is matched, the tag fingerprint that was found is deleted and the deletion of the data member is complete. If neither a matching $\hat f_0(x)$ nor a matching $\hat f_1(x)$ is stored, the deletion of the data member fails and element deletion ends.
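Deletion (steps E10 to E80) differs from the query only in freeing the matched slot. In this sketch, with the same hypothetical conventions as the insert and query sketches, the first match found is removed; the example that follows picks a matching bucket at random instead, and removing any one matching copy is sufficient.

```python
import hashlib

R_BITS = 8  # hypothetical remainder width; the tag bit sits just above it


def _h64(data: bytes) -> int:
    """Stand-in hash (64-bit BLAKE2b); the patent does not specify G or H."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def candidates(x: str, m: int) -> tuple[int, int, int, int]:
    """Steps E10-E50 (same as A10-A50): corrected indices and tag fingerprints."""
    M = 1 << (m - 1).bit_length()
    g = _h64(x.encode())
    r = g & ((1 << R_BITS) - 1)
    h0 = (g >> R_BITS) & (M - 1)
    h1 = h0 ^ (_h64(r.to_bytes(2, "big")) % M)
    f0 = ((0 if h0 < m else 1) << R_BITS) | r
    f1 = ((0 if h1 < m else 1) << R_BITS) | r
    return h0 % m, h1 % m, f0, f1


def delete(table: list, x: str) -> bool:
    """Steps E60-E80 (sketch): free one slot holding a matching tag fingerprint of x."""
    i0, i1, f0, f1 = candidates(x, len(table))
    for i, f in ((i0, f0), (i1, f1)):
        if f in table[i]:
            table[i][table[i].index(f)] = None  # E70: delete the matched tag fingerprint
            return True
    return False  # E80: no match, so the deletion fails
```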
The element deletion method of the tag cuckoo filter is illustrated as follows:
As shown in FIG. 10, when element y is deleted, the two candidate bucket indices $h_0(y) = 3$ and $h_1(y) = 6$ of y are first calculated with equations (1) and (2). Next, the two tag fingerprints $\hat f_0(y) = 0 : r_y$ and $\hat f_1(y) = 1 : r_y$ corresponding to the candidate buckets $h_0(y)$ and $h_1(y)$ are calculated with equations (3), (4), and (5), and the two candidate bucket indices are simultaneously adjusted to $h_0(y) = 3$ and $h_1(y) = 1$. The candidate buckets $h_0(y) = 3$ and $h_1(y) = 1$ are then searched for the corresponding tag fingerprints $\hat f_0(y)$ and $\hat f_1(y)$. Assume that the tag fingerprint $\hat f_0(y)$ equals the tag fingerprint $\hat f_0(g)$ of another inserted element g (i.e., $\hat f_0(y) = \hat f_0(g)$); buckets $h_0(y) = 3$ and $h_1(y) = 1$ are then found to hold the matching tag fingerprints $\hat f_0(g)$ and $\hat f_1(y)$, respectively. Finally, candidate bucket $h_1(y) = 1$ is randomly selected, the matching tag fingerprint $\hat f_1(y)$ is deleted from it, and the deletion of element y succeeds.
Further, the element deletion method of the tag cuckoo filter ensures that elements are deleted correctly and that no false negative errors occur. If two inserted elements have identical tag fingerprints, the tag cuckoo filter still inserts one tag fingerprint for each of the two elements. If one of the two elements is deleted, the other's tag fingerprint remains in the filter, so no false negative error can occur, although a false positive error may. For example, in FIG. 10, element g is queried after element y has been deleted. Because the tag fingerprints $\hat f_0(g)$ and $\hat f_1(g)$ of g equal the tag fingerprints $\hat f_0(y)$ and $\hat f_1(y)$ of y (i.e., $\hat f_0(g) = \hat f_0(y)$ and $\hat f_1(g) = \hat f_1(y)$), the tag fingerprint $\hat f_0(g)$ is found in candidate bucket $h_0(g) = 3$, and the query indicates that g is in the set and returns true. When element y is queried, the tag fingerprint $\hat f_0(g)$ (i.e., $\hat f_0(y)$) is still in candidate bucket $h_0(y) = 3$, so the query indicates that y is in the set and returns true; because y was successfully deleted, this result is a false positive error. Nonetheless, the element deletion method of the tag cuckoo filter does not increase the false positive error rate, which stays low and is the same as that of the standard Bloom filter, the counting Bloom filter, the cuckoo filter, and similar structures.
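The argument above can be checked on a toy table. The remainder, tags, and bucket positions below are hypothetical and only mirror FIG. 10, where y and g share both candidate buckets and both tag fingerprints; deleting y removes one copy and leaves g's copy in place.

```python
R_BITS = 8
m = 5
table = [[None] * 4 for _ in range(m)]

r = 0x2A                    # hypothetical remainder shared by y and g
f0 = (0 << R_BITS) | r      # shared first tag fingerprint (tag 0)
f1 = (1 << R_BITS) | r      # shared second tag fingerprint (tag 1)
i0, i1 = 3, 1               # shared corrected candidate buckets, as in FIG. 10

table[i0][0] = f0           # copy stored when g was inserted
table[i1][0] = f1           # copy stored when y was inserted


def query() -> bool:
    """Query for y or g: both compute the same buckets and tag fingerprints."""
    return f0 in table[i0] or f1 in table[i1]


def delete() -> bool:
    """Delete one matching copy, trying bucket i1 first as in the example."""
    for i, f in ((i1, f1), (i0, f0)):
        if f in table[i]:
            table[i][table[i].index(f)] = None
            return True
    return False


delete()        # delete y: the copy in bucket 1 is removed
print(query())  # True: g is still found, so y is now a false positive
```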
In one implementation of this embodiment, when a delete operation is received, it may be determined whether the number of buckets m is a power of 2. If it is, the tag cuckoo filter degenerates to an existing cuckoo filter; if it is not, the delete operation is performed according to the tag cuckoo filter procedure above. When the tag cuckoo filter degenerates to an existing cuckoo filter, as shown in FIG. 9, the delete operation may be implemented as follows:
f10 calculating the fingerprint F of the element xxAnd a first candidate bucket index h0(x) In that respect Calculating the fingerprint f of the element x using equation (9)xAnd candidate bucket index h0(x) Wherein h is0(x) In the range of h1(x) Is in the range of [ 0., M-1]]Where M is a power of 2 and M is equal to M
F20 calculating a second candidate bucket index h for element x1(x) In that respect Calculating a second candidate bucket index h using equation (10)1(x) Wherein h is1(x) In the range of h1(x) Is in the range of [ 0., M-1]]M is a power of 2 and M is equal to M.
F30 searching two candidate buckets h0(x) And h1(x) Whether or not to match a fingerprint fx. If matching the fingerprint fxGo to step DF 0; otherwise, step F50 is entered.
F40: Element x is in the set. Delete the fingerprint fx found for element x, and the element deletion ends.
F50: Element x is not in the set. Return false as the deletion result, and the element deletion ends.
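Equations (9) and (10) are not reproduced in this section. The sketch below therefore stands in the standard cuckoo filter's partial-key scheme, h1(x) = h0(x) XOR hash(fx), for the degenerate case in steps F10 to F50. The class name `CuckooFilter`, the BLAKE2b hash and the 8-bit fingerprint width are illustrative assumptions. The insert method is simplified (no relocation) and exists only to populate the example.

```python
import hashlib


def _hash64(data: bytes) -> int:
    # 64-bit hash; BLAKE2b is an arbitrary choice for this sketch
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def is_power_of_two(m: int) -> bool:
    # Decides whether the tag cuckoo filter degenerates to a standard cuckoo filter
    return m > 0 and (m & (m - 1)) == 0


class CuckooFilter:
    """Hypothetical standard cuckoo filter for the degenerate case m = 2^k."""

    def __init__(self, m: int, slots: int = 4, fp_bits: int = 8):
        assert is_power_of_two(m), "this path applies only when m is a power of 2"
        self.m, self.slots, self.fp_bits = m, slots, fp_bits
        self.buckets = [[] for _ in range(m)]

    def _fx_and_h0(self, x: bytes):
        # F10: non-zero fingerprint fx and first candidate bucket h0(x) in [0, m-1]
        h = _hash64(x)
        fx = (h >> 32) % ((1 << self.fp_bits) - 1) + 1
        return fx, (h & 0xFFFFFFFF) % self.m

    def _h1(self, h0: int, fx: int) -> int:
        # F20: h1(x) = h0(x) XOR hash(fx); stays in [0, m-1] because m is a power of 2
        return h0 ^ (_hash64(fx.to_bytes(4, "big")) % self.m)

    def insert(self, x: bytes) -> bool:
        # Simplified insert without relocation, only used to populate the example
        fx, h0 = self._fx_and_h0(x)
        for b in (h0, self._h1(h0, fx)):
            if len(self.buckets[b]) < self.slots:
                self.buckets[b].append(fx)
                return True
        return False

    def delete(self, x: bytes) -> bool:
        fx, h0 = self._fx_and_h0(x)
        for b in (h0, self._h1(h0, fx)):  # F30: search both candidate buckets
            if fx in self.buckets[b]:
                self.buckets[b].remove(fx)  # F40: x is in the set, remove one copy
                return True
        return False  # F50: x is not in the set
```

For example, after `cf = CuckooFilter(16)` and `cf.insert(b"apple")`, the call `cf.delete(b"apple")` returns True and a second delete returns False. Constructing it with m = 12 fails the power-of-2 check, which is the case where the tag cuckoo filter process applies instead.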
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A tag cuckoo filter, characterized in that it comprises a cuckoo hash table comprising a plurality of buckets. Each data member corresponds to two tag fingerprints, and the two tag fingerprints are associated with two respective buckets. When the tag cuckoo filter receives a data member management operation, it determines, based on an exclusive-or operation, two candidate buckets and two tag fingerprints corresponding to a preset data member, and executes the data member management operation based on the two candidate buckets and the two tag fingerprints. The preset data member is the data member targeted by the data member management operation.
2. The tag cuckoo filter of claim 1, wherein each tag fingerprint comprises a tag and a tag fingerprint remainder, and the two tag fingerprints corresponding to the preset data member have the same tag fingerprint remainder.
3. The tag cuckoo filter of claim 2, wherein determining, based on an exclusive-or operation, the two candidate buckets and the two tag fingerprints corresponding to the preset data member specifically comprises:
determining the tag fingerprint remainder and a first candidate bucket corresponding to the preset data member;
determining a second candidate bucket corresponding to the preset data member by an exclusive-or operation based on the tag fingerprint remainder and the first candidate bucket;
determining a first tag corresponding to the preset data member according to the first candidate bucket, and determining a second tag corresponding to the preset data member according to the second candidate bucket;
determining a first tag fingerprint corresponding to the preset data member according to the first tag and the tag fingerprint remainder, and determining a second tag fingerprint corresponding to the preset data member according to the second tag and the tag fingerprint remainder;
and correcting the first candidate bucket and the second candidate bucket respectively according to the number of buckets, and taking the corrected first and second candidate buckets as the first candidate bucket and the second candidate bucket corresponding to the preset data member.
4. The tag cuckoo filter of claim 3, wherein, when the data member management operation is an insert operation, performing the data member management operation based on the two candidate buckets and the two tag fingerprints comprises:
detecting whether there is a free storage location in the first candidate bucket or the second candidate bucket;
and if a free storage location exists, storing the candidate tag fingerprint of the preset data member in the free storage location, wherein the candidate tag fingerprint is the tag fingerprint corresponding to the candidate bucket to which the free storage location belongs.
5. The tag cuckoo filter of claim 4, wherein, when the data member management operation is an insert operation, performing the data member management operation based on the two determined candidate buckets and the two tag fingerprints comprises:
if no free storage location exists, selecting a target candidate bucket from the first candidate bucket and the second candidate bucket, and taking the tag fingerprint corresponding to the target candidate bucket as the tag fingerprint of the preset data member;
selecting a target tag fingerprint in the target candidate bucket, and storing the tag fingerprint of the preset data member in the storage location occupied by the target tag fingerprint;
determining, by an exclusive-or operation, a reference bucket and a reference tag fingerprint corresponding to the target tag fingerprint according to the target candidate bucket and the target tag fingerprint;
and if the reference bucket has a free storage location, storing the reference tag fingerprint in the free storage location.
6. The tag cuckoo filter of claim 5, wherein, when the data member management operation is an insert operation, performing the data member management operation based on the two determined candidate buckets and the two tag fingerprints comprises:
if the reference bucket has no free storage location, selecting a target storage location in the reference bucket and storing the reference tag fingerprint in the target storage location;
taking the tag fingerprint previously held in the target storage location as the target tag fingerprint, and taking the reference bucket as the target candidate bucket;
and repeating the step of determining, by an exclusive-or operation, the reference bucket and the reference tag fingerprint corresponding to the target tag fingerprint according to the target candidate bucket and the target tag fingerprint, until the reference bucket has a free storage location or the number of executions reaches a preset threshold.
7. The tag cuckoo filter of claim 5 or 6, wherein determining, by an exclusive-or operation, the reference bucket and the reference tag fingerprint corresponding to the target tag fingerprint according to the target candidate bucket and the target tag fingerprint specifically comprises:
obtaining the target tag of the target tag fingerprint, and updating the target candidate bucket based on the target tag;
determining a comparison bucket corresponding to the target tag fingerprint by an exclusive-or operation based on the updated target candidate bucket and the tag fingerprint remainder of the target tag fingerprint;
and determining the reference tag fingerprint corresponding to the target tag fingerprint based on the comparison bucket and the tag fingerprint remainder of the target tag fingerprint, and correcting the comparison bucket based on the number of buckets to obtain the reference bucket corresponding to the target tag fingerprint.
8. The tag cuckoo filter of claim 3, wherein, when the data member management operation is a query operation or a delete operation, performing the data member management operation based on the two candidate buckets and the two tag fingerprints specifically comprises:
searching the first candidate bucket for the first tag fingerprint and the second candidate bucket for the second tag fingerprint corresponding to the preset data member;
and if the first tag fingerprint or the second tag fingerprint corresponding to the preset data member is found, executing the data member management operation on the found tag fingerprint.
9. The tag cuckoo filter of claim 8, wherein executing the data member management operation on the found tag fingerprint specifically comprises:
when the data member management operation is a query operation, reporting that the data member query succeeds;
and when the data member management operation is a delete operation, deleting the found tag fingerprint corresponding to the preset data member, the tag fingerprint being the first tag fingerprint or the second tag fingerprint.
10. The tag cuckoo filter of claim 8, wherein, when the data member management operation is a query operation or a delete operation, performing the data member management operation based on the two candidate buckets and the two tag fingerprints comprises:
if neither the first tag fingerprint nor the second tag fingerprint corresponding to the preset data member is found, reporting that the data member management operation fails.
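The claims leave the concrete encoding of the tag and the correction step open. The sketch below is a hypothetical reading of claims 3 to 10 that rests on three assumptions:

- Bucket indices are first computed modulo M, the smallest power of 2 not less than m.
- "Correcting" an index reduces it modulo m.
- The tag stores the quotient dropped by that reduction, so the uncorrected index can be rebuilt before the exclusive-or of claim 7.

The class and method names, the BLAKE2b hash, the remainder width and the random victim choice are also illustrative assumptions, not the patented construction.

```python
import hashlib
import random


def _hash64(data: bytes) -> int:
    # 64-bit hash; BLAKE2b is an arbitrary choice for this sketch
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


class TagCuckooSketch:
    """Hypothetical sketch of claims 3-10; tag = uncorrected index // m (assumption)."""

    def __init__(self, m: int, slots: int = 4, max_kicks: int = 500, rem_bits: int = 8):
        self.m, self.slots, self.max_kicks, self.rem_bits = m, slots, max_kicks, rem_bits
        self.M = 1 << (m - 1).bit_length()  # smallest power of 2 >= m, so M < 2m
        self.buckets = [[] for _ in range(m)]
        self.rng = random.Random(0)  # fixed seed keeps the sketch deterministic

    def _alternate(self, bucket: int, fp):
        """Claim 7: reference bucket and reference tag fingerprint."""
        tag, r = fp
        updated = tag * self.m + bucket  # rebuild the uncorrected index from the tag
        comparison = updated ^ (_hash64(r.to_bytes(4, "big")) % self.M)  # XOR step
        return comparison % self.m, (comparison // self.m, r)  # correct by m, new tag

    def _candidates(self, x: bytes):
        """Claim 3: two (candidate bucket, tag fingerprint) pairs sharing one remainder."""
        h = _hash64(x)
        r = (h >> 32) % ((1 << self.rem_bits) - 1) + 1  # non-zero remainder
        raw0 = (h & 0xFFFFFFFF) % self.M  # uncorrected first candidate bucket
        first = (raw0 % self.m, (raw0 // self.m, r))
        return [first, self._alternate(*first)]

    def insert(self, x: bytes) -> bool:
        cands = self._candidates(x)
        for bucket, fp in cands:  # claim 4: use a free slot if there is one
            if len(self.buckets[bucket]) < self.slots:
                self.buckets[bucket].append(fp)
                return True
        bucket, fp = self.rng.choice(cands)  # claim 5: target candidate bucket
        for _ in range(self.max_kicks):  # claim 6: bounded relocation loop
            i = self.rng.randrange(self.slots)
            fp, self.buckets[bucket][i] = self.buckets[bucket][i], fp  # evict target
            bucket, fp = self._alternate(bucket, fp)  # claim 7 for the evicted one
            if len(self.buckets[bucket]) < self.slots:
                self.buckets[bucket].append(fp)
                return True
        return False  # threshold reached; this sketch drops the last evicted fingerprint

    def query(self, x: bytes) -> bool:
        """Claims 8-9: succeed if either bucket holds its own tag fingerprint."""
        return any(fp in self.buckets[b] for b, fp in self._candidates(x))

    def delete(self, x: bytes) -> bool:
        for b, fp in self._candidates(x):
            if fp in self.buckets[b]:
                self.buckets[b].remove(fp)  # claim 9: remove one matching copy
                return True
        return False  # claim 10: not found, the operation fails
```

Under these assumptions, M < 2m, so each tag needs only one bit. When m is a power of 2, every tag is 0 and the sketch reduces to an ordinary cuckoo filter, which matches the degenerate case described for the delete operation. When the relocation threshold is reached, the sketch simply loses the last evicted fingerprint. A practical filter would keep it aside, for example in a victim slot.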
CN202010360757.7A 2020-04-30 2020-04-30 Tag cuckoo filter Active CN111552693B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010360757.7A CN111552693B (en) 2020-04-30 2020-04-30 Tag cuckoo filter

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010360757.7A CN111552693B (en) 2020-04-30 2020-04-30 Tag cuckoo filter

Publications (2)

Publication Number Publication Date
CN111552693A true CN111552693A (en) 2020-08-18
CN111552693B CN111552693B (en) 2023-04-07

Family

ID=72003379

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010360757.7A Active CN111552693B (en) 2020-04-30 2020-04-30 Tag cuckoo filter

Country Status (1)

Country Link
CN (1) CN111552693B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112148928A (en) * 2020-09-18 2020-12-29 鹏城实验室 Cuckoo filter based on fingerprint family
CN112632337A (en) * 2020-12-28 2021-04-09 南方科技大学 Element management method applied to firework filter and firework filter
CN113360516A (en) * 2021-08-11 2021-09-07 成都信息工程大学 Set member management method based on first-in first-out and minimum active number strategy
CN113535706A (en) * 2021-08-03 2021-10-22 重庆赛渝深科技有限公司 Two-stage cuckoo filter and repeated data deleting method based on two-stage cuckoo filter
CN113641681A (en) * 2021-10-13 2021-11-12 南京大数据集团有限公司 Space self-adaptive mass data query method
CN116467307A (en) * 2023-03-29 2023-07-21 济南大学 Design method and system for cuckoo filter for reducing false positive rate

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101655861A (en) * 2009-09-08 2010-02-24 中国科学院计算技术研究所 Hashing method based on double-counting bloom filter and hashing device
CN105630955A (en) * 2015-12-24 2016-06-01 华中科技大学 Method for efficiently managing members of dynamic data set
CN107515901A (en) * 2017-07-24 2017-12-26 中国科学院信息工程研究所 A kind of chain type daily record storage organization and its Hash Index Structure, data manipulation method and server, medium
CN109815234A (en) * 2018-12-29 2019-05-28 杭州中科先进技术研究院有限公司 A kind of multiple cuckoo filter under streaming computing model
CN110046164A (en) * 2019-04-16 2019-07-23 中国人民解放军国防科技大学 Index independent grain distribution filter, consistency grain distribution filter and operation method
CN110222088A (en) * 2019-05-20 2019-09-10 华中科技大学 Data approximation set representation method and system based on insertion position selection

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112148928A (en) * 2020-09-18 2020-12-29 鹏城实验室 Cuckoo filter based on fingerprint family
CN112148928B (en) * 2020-09-18 2024-02-20 鹏城实验室 Cuckoo filter based on fingerprint family
CN112632337A (en) * 2020-12-28 2021-04-09 南方科技大学 Element management method applied to firework filter and firework filter
CN112632337B (en) * 2020-12-28 2023-12-22 南方科技大学 Element management method applied to firework filter and firework filter
CN113535706A (en) * 2021-08-03 2021-10-22 重庆赛渝深科技有限公司 Two-stage cuckoo filter and repeated data deleting method based on two-stage cuckoo filter
CN113535706B (en) * 2021-08-03 2023-05-23 佛山赛思禅科技有限公司 Two-stage cuckoo filter and repeated data deleting method based on two-stage cuckoo filter
CN113360516A (en) * 2021-08-11 2021-09-07 成都信息工程大学 Set member management method based on first-in first-out and minimum active number strategy
CN113360516B (en) * 2021-08-11 2021-11-26 成都信息工程大学 Collection member management method
CN113641681A (en) * 2021-10-13 2021-11-12 南京大数据集团有限公司 Space self-adaptive mass data query method
CN116467307A (en) * 2023-03-29 2023-07-21 济南大学 Design method and system for cuckoo filter for reducing false positive rate
CN116467307B (en) * 2023-03-29 2024-02-23 济南大学 Design method and system for cuckoo filter for reducing false positive rate

Also Published As

Publication number Publication date
CN111552693B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
CN111552693B (en) Tag cuckoo filter
CN111552692B (en) Plus-minus cuckoo filter
CN112148928B (en) Cuckoo filter based on fingerprint family
CN108446407B (en) Database auditing method and device based on block chain
US10013312B2 (en) Method and system for a safe archiving of data
WO2010135082A1 (en) Localized weak bit assignment
US8190591B2 (en) Bit string searching apparatus, searching method, and program
US20130151562A1 (en) Method of calculating feature-amount of digital sequence, and apparatus for calculating feature-amount of digital sequence
CN111247518A (en) Database sharding
US11777983B2 (en) Systems and methods for rapidly generating security ratings
CN110245028B (en) Message storage method, device, computer equipment and storage medium of IoT-MQ
CN109189759B (en) Data reading method, data query method, device and equipment in KV storage system
CN113867627B (en) Storage system performance optimization method and system
EP3522040A1 (en) Method and device for file storage
CN111291002A (en) File account checking method and device, computer equipment and storage medium
CN113392089B (en) Database index optimization method and readable storage medium
CN105183383A (en) Recombination method for irrelevant mirror images of file system
CN109344163B (en) Data verification method and device and computer readable medium
WO2011137684A1 (en) Search method and device based on information records of embedded system
WO2011073680A1 (en) Improvements relating to hash tables
CN111858606A (en) Data processing method and device and electronic equipment
CN115391355A (en) Data processing method, device, equipment and storage medium
CN112632337B (en) Element management method applied to firework filter and firework filter
CN102591941B (en) Analysis method and analysis device for SQLite idle struct nodes
CN110413617B (en) Method for dynamically adjusting hash table group according to size of data volume

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant