CN111552693B - Tag cuckoo filter - Google Patents


Info

Publication number
CN111552693B
CN111552693B CN202010360757.7A CN202010360757A CN111552693B CN 111552693 B CN111552693 B CN 111552693B CN 202010360757 A CN202010360757 A CN 202010360757A CN 111552693 B CN111552693 B CN 111552693B
Authority
CN
China
Prior art keywords
candidate
fingerprint
tag
bucket
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010360757.7A
Other languages
Chinese (zh)
Other versions
CN111552693A (en
Inventor
黄昆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest University of Science and Technology
Peng Cheng Laboratory
Original Assignee
Southwest University of Science and Technology
Peng Cheng Laboratory
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest University of Science and Technology and Peng Cheng Laboratory
Priority to CN202010360757.7A
Publication of CN111552693A
Application granted
Publication of CN111552693B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2255Hash tables
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • G06F16/2433Query languages
    • G06F16/2443Stored procedures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/004Artificial life, i.e. computing arrangements simulating life
    • G06N3/006Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Collating Specific Patterns (AREA)

Abstract

The invention discloses a tag cuckoo filter comprising a cuckoo hash table. The cuckoo hash table comprises a plurality of buckets; each data member corresponds to two tag fingerprints, which are stored in two buckets respectively. When the tag cuckoo filter receives a data member management operation, it determines the two candidate buckets and the two tag fingerprints corresponding to a preset data member based on an exclusive-or operation, and executes the data member management operation based on the two candidate buckets and the two tag fingerprints so determined. The invention assigns two tag fingerprints and two buckets to each data member and determines the candidate buckets of a data member by an XOR operation based on the tag fingerprints, without requiring the number of buckets to be a power of 2, thereby reducing the storage space overhead of each data member.

Description

Tag cuckoo filter
Technical Field
The invention relates to the technical field of computer information representation and information retrieval, in particular to a tag cuckoo filter.
Background
Membership query is one of the key methods for many network applications and distributed systems (such as cooperative caching, packet processing, key-value storage, and data deduplication), and must meet three key requirements: low storage space overhead, fast query, and incremental update. Currently, membership query generally uses Bloom filters, standard Bloom filters, counting Bloom filters, cuckoo filters, and the like, but Bloom filters and their variants cannot meet the three key requirements simultaneously. For example, a standard Bloom filter supports element insertion and query operations, but does not support element deletion. A counting Bloom filter is a Bloom filter that supports deletion, but its storage space overhead is high. The cuckoo filter is a space-efficient filter that supports deletion; its storage space overhead is significantly lower than that of a counting Bloom filter and even lower than that of a standard Bloom filter. However, the existing cuckoo filter has the problem that the storage space overhead per data member varies with the number of elements: the XOR operation of the cuckoo filter requires the number of buckets to be a power of 2 (i.e., $2^b$, where $b$ is the exponent), which in the worst case doubles the storage space overhead per data member.
Thus, the prior art has yet to be improved and enhanced.
Disclosure of Invention
The invention aims to solve the technical problem of providing a tag cuckoo filter aiming at the defects of the prior art.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
A tag cuckoo filter comprises a cuckoo hash table, wherein the cuckoo hash table comprises a plurality of buckets, each data member corresponds to two tag fingerprints, and the two tag fingerprints are stored in two buckets respectively; when the tag cuckoo filter receives a data member management operation, two candidate buckets and two tag fingerprints corresponding to a preset data member are determined based on an exclusive-or operation, and the data member management operation is executed based on the two candidate buckets and the two tag fingerprints, wherein the preset data member is the data member targeted by the data member management operation.
In the tag cuckoo filter, each tag fingerprint comprises a tag and a tag fingerprint remainder, and the two tag fingerprints corresponding to the preset data member have the same tag fingerprint remainder.
In the tag cuckoo filter, determining the two candidate buckets and the two tag fingerprints corresponding to the preset data member based on an exclusive-or operation specifically includes:
determining the tag fingerprint remainder and the first candidate bucket corresponding to the preset data member;
determining the second candidate bucket corresponding to the preset data member by an exclusive-or operation based on the tag fingerprint remainder and the first candidate bucket;
determining the first tag corresponding to the preset data member according to the first candidate bucket, and determining the second tag corresponding to the preset data member according to the second candidate bucket;
determining the first tag fingerprint corresponding to the preset data member according to the first tag and the tag fingerprint remainder, and determining the second tag fingerprint corresponding to the preset data member according to the second tag and the tag fingerprint remainder;
and correcting the first candidate bucket and the second candidate bucket respectively according to the number of buckets, and taking the corrected first candidate bucket and the corrected second candidate bucket as the first candidate bucket and the second candidate bucket corresponding to the preset data member.
In the tag cuckoo filter, when the data member management operation is an insert operation, executing the data member management operation based on the two candidate buckets and the two tag fingerprints specifically includes:
detecting whether a free storage location exists in the first candidate bucket or the second candidate bucket;
and if a free storage location exists, storing the candidate tag fingerprint of the preset data member in the free storage location, wherein the candidate tag fingerprint is the tag fingerprint corresponding to the candidate bucket to which the free storage location belongs.
In the tag cuckoo filter, when the data member management operation is an insert operation, executing the data member management operation based on the two candidate buckets and the two tag fingerprints includes:
if no free storage location exists, selecting a target candidate bucket from the first candidate bucket and the second candidate bucket, and taking the tag fingerprint corresponding to the target candidate bucket as the tag fingerprint of the preset data member;
selecting a target tag fingerprint in the target candidate bucket, and storing the tag fingerprint of the preset data member in the storage location occupied by the target tag fingerprint;
determining, by an exclusive-or operation based on the target candidate bucket and the target tag fingerprint, a reference bucket and a reference tag fingerprint corresponding to the target tag fingerprint;
and if the reference bucket has a free storage location, storing the reference tag fingerprint in the free storage location.
In the tag cuckoo filter, when the data member management operation is an insert operation, executing the data member management operation based on the two candidate buckets and the two tag fingerprints includes:
if the reference bucket has no free storage location, selecting a target storage location in the reference bucket, and storing the reference tag fingerprint in the target storage location;
taking the tag fingerprint displaced from the target storage location as the target tag fingerprint, and taking the reference bucket as the target candidate bucket;
and continuing to execute the step of determining, by an exclusive-or operation based on the target candidate bucket and the target tag fingerprint, the reference bucket and the reference tag fingerprint corresponding to the target tag fingerprint, until the reference bucket has a free storage location or the number of executions reaches a preset threshold.
In the tag cuckoo filter, determining, by an exclusive-or operation based on the target candidate bucket and the target tag fingerprint, the reference bucket and the reference tag fingerprint corresponding to the target tag fingerprint specifically includes:
obtaining the target tag of the target tag fingerprint and updating the target candidate bucket based on the target tag;
determining a comparison bucket corresponding to the target tag fingerprint by an exclusive-or operation based on the updated target candidate bucket and the tag fingerprint remainder of the target tag fingerprint;
and determining the reference tag fingerprint corresponding to the target tag fingerprint based on the comparison bucket and the tag fingerprint remainder of the target tag fingerprint, and correcting the comparison bucket based on the number of buckets to obtain the reference bucket corresponding to the target tag fingerprint.
In the tag cuckoo filter, when the data member management operation is a query operation or a delete operation, executing the data member management operation based on the two candidate buckets and the two tag fingerprints specifically includes:
searching the first candidate bucket for the first tag fingerprint and the second candidate bucket for the second tag fingerprint corresponding to the preset data member;
and if the first tag fingerprint or the second tag fingerprint corresponding to the preset data member is found, executing the data member management operation on that fingerprint.
In the tag cuckoo filter, executing the data member management operation on the fingerprint specifically includes:
when the data member management operation is a query operation, indicating that the query for the data member succeeded;
and when the data member management operation is a delete operation, deleting the tag fingerprint found for the preset data member, wherein the tag fingerprint is the first tag fingerprint or the second tag fingerprint.
In the tag cuckoo filter, when the data member management operation is a query operation or a delete operation, executing the data member management operation based on the two candidate buckets and the two tag fingerprints so determined specifically includes:
and if neither the first tag fingerprint nor the second tag fingerprint corresponding to the preset data member is found, indicating that the data member management operation failed.
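The query and delete behaviour described above can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the table is assumed to be a list of buckets (each a Python list of integer tag fingerprints), and the two (bucket index, tag fingerprint) candidate pairs are assumed to have been computed already. The names `find_slot`, `query`, and `delete` are hypothetical.

```python
def find_slot(table, candidates):
    """Return (bucket, slot) of the first matching tag fingerprint, or None.

    `candidates` is [(bucket_index, tag_fingerprint), ...] for one data member:
    the first tag fingerprint is looked up only in the first candidate bucket,
    the second only in the second candidate bucket.
    """
    for bucket, tag_fp in candidates:
        if tag_fp in table[bucket]:
            return bucket, table[bucket].index(tag_fp)
    return None


def query(table, candidates) -> bool:
    # The query succeeds if either tag fingerprint is found.
    return find_slot(table, candidates) is not None


def delete(table, candidates) -> bool:
    # Remove the single copy that was found; report failure if neither is present.
    hit = find_slot(table, candidates)
    if hit is None:
        return False
    bucket, slot = hit
    del table[bucket][slot]
    return True
```

As with any fingerprint-based filter, deleting a member that was never inserted can remove another member's matching fingerprint, so deletion is only safe for members known to be present.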
Beneficial effects: compared with the prior art, the invention provides a tag cuckoo filter comprising a cuckoo hash table, wherein the cuckoo hash table comprises a plurality of buckets, each data member corresponds to two tag fingerprints, and the two tag fingerprints are stored in two buckets respectively; when the tag cuckoo filter receives a data member management operation, two candidate buckets and two tag fingerprints corresponding to a preset data member are determined based on an exclusive-or operation, and the data member management operation is executed based on the two candidate buckets and the two tag fingerprints so determined. The invention assigns two tag fingerprints and two buckets to each data member and determines the candidate buckets of a data member by an XOR operation based on the tag fingerprints, without requiring the number of buckets to be a power of 2, thereby reducing the storage space overhead of each data member.
Drawings
Fig. 1 is an example of a tag cuckoo filter provided by the present invention.
Fig. 2 is a format diagram of the tag fingerprint provided by the present invention.
FIG. 3 is a flow diagram of inserting data members in a tag cuckoo filter provided by the present invention when the number of storage buckets is not a power of 2.
FIG. 4 is an example of inserting data members in a tag cuckoo filter provided by the present invention.
FIG. 5 is another example of inserting data members in a tag cuckoo filter provided by the present invention.
Fig. 6 is a flow chart of inserting data members in the tag cuckoo filter provided by the present invention when the number of storage buckets is a power of 2.
Fig. 7 is a flowchart of searching for data members in the tag cuckoo filter provided by the present invention.
Fig. 8 is an example of searching for data members in a tag cuckoo filter provided by the present invention.
Fig. 9 is a flow chart of deleting data members in the tag cuckoo filter provided by the present invention.
Fig. 10 is an example of deleting data members in a tag cuckoo filter provided by the present invention.
Detailed Description
The invention provides a tag cuckoo filter. To make the purpose, technical solution, and effects of the invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit it.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The inventor has found that membership query is one of the key methods for many network applications and distributed systems (such as cooperative caching, packet processing, key-value storage, and data deduplication), and must meet three key requirements: low storage space overhead, fast query, and incremental update. Currently, Bloom filters, standard Bloom filters, counting Bloom filters, cuckoo filters, and the like are commonly used for membership query.
A standard Bloom filter represents a set of m elements (items) using n bits (i.e., a bitmap): each inserted element is mapped to k bits of the bitmap using k hash functions, and those k bits are set to 1. To query an element, it is mapped to k bits of the bitmap using the same k hash functions, and the k bits are checked: if all are 1, the element is reported to be in the set; otherwise, the element is not in the set. The standard Bloom filter is a space-efficient randomized data structure whose queries have a low false positive rate (i.e., the query result indicates that an element is in the set although it is not actually in the set) but never produce false negatives (i.e., if the query result indicates that an element is not in the set, the element is certainly not in the set). The standard Bloom filter supports element insertion and query operations, but does not support element deletion.
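A minimal sketch of the standard Bloom filter just described. The class name, the use of salted SHA-256 to obtain k hash functions, and the one-byte-per-bit storage are illustrative assumptions, chosen for readability rather than space efficiency.

```python
import hashlib


class BloomFilter:
    """Standard Bloom filter: n bits, k hash functions, no deletion."""

    def __init__(self, n_bits: int, k: int):
        self.n = n_bits
        self.k = k
        self.bits = bytearray(n_bits)  # one byte per bit, for clarity only

    def _positions(self, item: str):
        # Derive k bit positions by salting SHA-256 with the hash index
        # (an illustrative choice; any k independent hash functions would do).
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n

    def insert(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p] = 1

    def query(self, item: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.bits[p] for p in self._positions(item))
```

Because a bit may be shared by several elements, clearing bits on deletion would risk false negatives, which is why this structure offers no delete method.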
A counting Bloom filter is a Bloom filter that supports deletion: n counters are used to represent the m elements of a set. When an element is inserted, it is mapped to k counters using k hash functions, and each of the k counters is incremented by 1; when an element is deleted, each of the k counters is decremented by 1. When an element is queried, the same k hash functions map the element to k counters, and it is checked whether the k counter values are all greater than 0; if so, the element is reported to be in the set; otherwise, the element is not in the set. In practical applications, the counter size is set to 4 bits, which is sufficient to avoid the counter overflow problem. Counting Bloom filters therefore support fast incremental updates, but their storage space overhead is high, 4 times that of a standard Bloom filter.
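The counting variant can be sketched in the same way. The class name, the salted SHA-256 hashing, and the saturating behaviour at the 4-bit maximum are assumptions made for illustration; the query treats a counter as set when it is nonzero, which is the usual convention.

```python
import hashlib


class CountingBloomFilter:
    """Counting Bloom filter: n small counters instead of n bits."""

    def __init__(self, n_counters: int, k: int, counter_bits: int = 4):
        self.n = n_counters
        self.k = k
        self.max_count = (1 << counter_bits) - 1  # 15 for 4-bit counters
        self.counters = [0] * n_counters

    def _positions(self, item: str):
        # Illustrative hashing: SHA-256 salted with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n

    def insert(self, item: str) -> None:
        for p in self._positions(item):
            if self.counters[p] < self.max_count:  # saturate instead of overflowing
                self.counters[p] += 1

    def delete(self, item: str) -> None:
        # Only meaningful for items that were actually inserted.
        for p in self._positions(item):
            if self.counters[p] > 0:
                self.counters[p] -= 1

    def query(self, item: str) -> bool:
        return all(self.counters[p] > 0 for p in self._positions(item))
```

With 4 bits per counter where the standard filter spends 1 bit, the fourfold storage overhead mentioned in the text follows directly.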
The cuckoo filter is a space-efficient filter that supports deletion; its storage space overhead is significantly lower than that of a counting Bloom filter and even lower than that of a standard Bloom filter. The cuckoo filter uses a cuckoo hash table and computes candidate bucket indexes based on exclusive-or (XOR); that is, a fingerprint of each element, rather than the element itself, is inserted into, deleted from, or queried in the two candidate buckets to which the element is hash-mapped. However, the cuckoo filter has the problem that the storage space overhead per data member varies with the number of elements: the XOR operation of the cuckoo filter requires the number of buckets to be a power of 2 (i.e., $2^b$, where $b$ is the exponent), which in the worst case doubles the storage space overhead per data member.
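The power-of-2 requirement can be seen in a small experiment. In a conventional cuckoo filter the alternate bucket is obtained by XOR-ing the current bucket index with a hash of the fingerprint; relocation only works if applying this step twice returns the original bucket. The sketch below, with hypothetical function names and the fingerprint hash passed in as a plain integer, checks that property for a given bucket count.

```python
def alternate_bucket(i: int, fp_hash: int, num_buckets: int) -> int:
    # XOR the bucket index with the fingerprint hash, then reduce to the table size.
    return (i ^ fp_hash) % num_buckets


def is_involution(num_buckets: int, fp_hash: int) -> bool:
    # True if hopping twice from every bucket returns to the starting bucket.
    return all(
        alternate_bucket(alternate_bucket(i, fp_hash, num_buckets), fp_hash, num_buckets) == i
        for i in range(num_buckets)
    )


print(is_involution(8, 3))  # power of 2: the XOR step is its own inverse
print(is_involution(6, 3))  # not a power of 2: bucket 4 -> 1 -> 2, never back to 4
```

With 6 buckets and fingerprint hash 3, bucket 4 maps to (4 XOR 3) mod 6 = 1, and bucket 1 maps to (1 XOR 3) mod 6 = 2, so a fingerprint relocated out of bucket 4 could no longer be traced back to it.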
From the above, the filters currently used for membership query cannot satisfy its requirements. Therefore, in the embodiment of the invention, the tag cuckoo filter comprises a cuckoo hash table, wherein the cuckoo hash table comprises a plurality of buckets, each data member corresponds to two tag fingerprints, and the two tag fingerprints are stored in two buckets respectively; when the tag cuckoo filter receives a data member management operation, two candidate buckets and two tag fingerprints corresponding to a preset data member are determined based on an exclusive-or operation, and the data member management operation is executed based on the two candidate buckets and the two tag fingerprints so determined. The invention assigns two tag fingerprints and two buckets to each data member and determines the candidate buckets of a data member by an XOR operation based on the tag fingerprints, without requiring the number of buckets to be a power of 2, thereby reducing the storage space overhead of each data member.
The invention will be further explained by the description of the embodiments with reference to the drawings.
This embodiment provides a tag cuckoo filter. As shown in Fig. 1, the tag cuckoo filter includes a cuckoo hash table, which is a compact cuckoo hash table. The tag cuckoo filter assigns two tag fingerprints to a preset data member and maps the two tag fingerprints to two candidate buckets respectively; once the index of one of the two candidate buckets is determined, the index of the other can be determined based on an exclusive-or operation. That is, when the preset data member is obtained, the index of candidate bucket A corresponding to the preset data member can be calculated from the preset data member, and the index of candidate bucket B can then be determined using the XOR operation, which yields the two candidate buckets corresponding to the preset data member. In addition, the number of buckets of the cuckoo hash table is not a power of 2, i.e., the number of buckets is not $2^b$, where the exponent $b$ may be a positive integer. For example, the number of buckets is not 4, 8, 16, etc. It is worth noting that when the number m of buckets is a power of 2, the tag cuckoo filter of this embodiment degenerates to an existing cuckoo filter: the tag fingerprint does not include a tag, and there is only one fingerprint per preset data member.
Further, each bucket in the cuckoo hash table includes a specified number of storage locations, wherein a storage location is used to store a tag fingerprint of a data member, and each storage location stores a tag fingerprint of a data member. In addition, the specified number can be determined according to actual needs. For example, the specified number is 4, etc. When the specified number is 4, it means that each bucket in the cuckoo hash table contains four storage locations, that is, each bucket can store tag fingerprints of four data members.
Further, the Tag fingerprint includes a Tag (Tag) used to represent a relation between a candidate bucket index corresponding to the Tag fingerprint and a bucket number (i.e., the number of buckets), and a Tag fingerprint Remainder (Remainder) used to represent a fingerprint of the data member. In one specific implementation, as shown in fig. 2, the tag contains 1 bit, which indicates whether the candidate bucket index i storing the fingerprint of the tag is smaller than the total bucket number m, if i < m, the tag value of the fingerprint is set to 0, otherwise, the tag value of the fingerprint is set to 1. The tag fingerprint remainder comprises the remaining r bits of the tag fingerprint, the tag fingerprint remainders of the two tag fingerprints corresponding to each data member are the same, and the tags corresponding to the two tag fingerprints can be the same or different.
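The tag rule and the tag fingerprint format can be illustrated with a short sketch. The source states that the tag is 1 bit and the remainder is r bits, but not where the tag bit sits; placing it above the remainder is an assumption, and the function names are hypothetical.

```python
def make_tag(bucket_index: int, m: int) -> int:
    # Tag is 0 when the candidate bucket index is below the bucket count m, else 1.
    return 0 if bucket_index < m else 1


def pack(tag: int, remainder: int, r: int) -> int:
    # ASSUMED layout: the 1-bit tag is stored above the r-bit remainder.
    return (tag << r) | (remainder & ((1 << r) - 1))


def unpack(tag_fingerprint: int, r: int):
    # Split a stored tag fingerprint back into (tag, remainder).
    return tag_fingerprint >> r, tag_fingerprint & ((1 << r) - 1)
```

For example, with m = 6 and r = 8, an index of 7 yields tag 1, and `pack(1, 0xAB, 8)` gives `0x1AB`; the two tag fingerprints of one data member differ at most in that top bit.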
The data member management operation comprises one or more of an insert operation, a query operation and a delete operation. In this embodiment, the data member management operation may be an insert operation, a query operation, or a delete operation. It will be appreciated that the tag cuckoo filter supports insert operations, query operations, and delete operations. The following describes the insertion operation, the inquiry operation, and the deletion operation in a specific manner.
In one implementation of this embodiment, the data member management operation is an insert operation. When the number of buckets of the tag cuckoo filter is not a power of 2, as shown in Fig. 3, determining the two candidate buckets and the two tag fingerprints corresponding to the preset data member based on an exclusive-or operation when the tag cuckoo filter receives the data member management operation, and executing the data member management operation based on the two candidate buckets and the two tag fingerprints so determined, specifically includes:
A10, determining the tag fingerprint remainder and the first candidate bucket corresponding to the preset data member;
A20, determining the second candidate bucket corresponding to the preset data member by an exclusive-or operation based on the tag fingerprint remainder and the first candidate bucket;
A30, determining the first tag corresponding to the preset data member according to the first candidate bucket, and determining the second tag corresponding to the preset data member according to the second candidate bucket;
A40, determining the first tag fingerprint corresponding to the preset data member according to the first tag and the tag fingerprint remainder, and determining the second tag fingerprint corresponding to the preset data member according to the second tag and the tag fingerprint remainder;
A50, correcting the first candidate bucket and the second candidate bucket respectively according to the number of buckets, and taking the corrected first candidate bucket and the corrected second candidate bucket as the first candidate bucket and the second candidate bucket corresponding to the preset data member;
A60, detecting whether a free storage location exists in the first candidate bucket or the second candidate bucket; if a free storage location exists, executing step A70; if no free storage location exists, executing step A80;
A70, storing the candidate tag fingerprint of the preset data member in the free storage location, wherein the candidate tag fingerprint is the tag fingerprint corresponding to the candidate bucket to which the free storage location belongs;
A80, selecting a target candidate bucket from the first candidate bucket and the second candidate bucket, and taking the tag fingerprint corresponding to the target candidate bucket as the tag fingerprint of the preset data member;
A90, selecting a target tag fingerprint in the target candidate bucket, and storing the tag fingerprint of the preset data member in the storage location occupied by the target tag fingerprint;
A100, determining, by an exclusive-or operation based on the target candidate bucket and the target tag fingerprint, a reference bucket and a reference tag fingerprint corresponding to the target tag fingerprint; if the reference bucket has a free storage location, executing step A110; if the reference bucket has no free storage location, executing step A120;
A110, storing the reference tag fingerprint in the free storage location;
A120, selecting a target storage location in the reference bucket, and storing the reference tag fingerprint in the target storage location;
A130, taking the tag fingerprint displaced from the target storage location as the target tag fingerprint, and taking the reference bucket as the target candidate bucket; continuing to execute step A100 until the reference bucket has a free storage location or the number of executions reaches a preset threshold.
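Steps A10 to A130 can be put together in a hedged sketch. Several details are assumptions rather than statements of the source: the hash function (truncated SHA-256), the tag bit stored above the remainder, random victim selection, and in particular the "correction" of A50, which is taken here to mean subtracting m from any index that is m or larger, so that a stored tag of 1 lets the uncorrected index be recovered as bucket + m. All class and method names are hypothetical.

```python
import hashlib
import random


def _hash(data: bytes) -> int:
    # Illustrative 64-bit hash; the source does not name a hash function.
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


class TagCuckooInsertSketch:
    def __init__(self, m: int, slots: int = 4, r: int = 8, max_kicks: int = 500, seed: int = 0):
        self.m, self.slots, self.r, self.max_kicks = m, slots, r, max_kicks
        self.M = 1 << m.bit_length()          # smallest power of 2 greater than m
        self.table = [[] for _ in range(m)]   # m buckets of tag fingerprints
        self.rng = random.Random(seed)

    def alt(self, raw: int, rem: int) -> int:
        # XOR step on uncorrected indexes in [0, M-1].
        return raw ^ (_hash(rem.to_bytes(8, "big")) % self.M)

    def place(self, raw: int, rem: int):
        tag = 0 if raw < self.m else 1                 # A30: tag from index vs. m
        bucket = raw - self.m if tag else raw          # A50: ASSUMED correction
        return bucket, (tag << self.r) | rem           # A40: ASSUMED layout

    def candidates(self, item: bytes):
        g = _hash(item)
        rem = g & ((1 << self.r) - 1)                  # A10: low bits -> remainder
        raw0 = (g >> self.r) % self.M                  # A10: higher bits -> first index
        raw1 = self.alt(raw0, rem)                     # A20: second index via XOR
        return [self.place(raw0, rem), self.place(raw1, rem)]

    def insert(self, item: bytes) -> bool:
        cands = self.candidates(item)
        for bucket, tf in cands:                       # A60 / A70: use a free slot
            if len(self.table[bucket]) < self.slots:
                self.table[bucket].append(tf)
                return True
        bucket, tf = self.rng.choice(cands)            # A80: pick a target bucket
        for _ in range(self.max_kicks):
            slot = self.rng.randrange(self.slots)      # A90 / A120: pick a victim
            victim, self.table[bucket][slot] = self.table[bucket][slot], tf
            v_tag, v_rem = victim >> self.r, victim & ((1 << self.r) - 1)
            raw = bucket + self.m if v_tag else bucket # undo the assumed correction
            bucket, tf = self.place(self.alt(raw, v_rem), v_rem)  # A100
            if len(self.table[bucket]) < self.slots:   # A110: free slot found
                self.table[bucket].append(tf)
                return True
        return False  # threshold reached; the last displaced fingerprint is dropped
```

Under these assumptions the XOR is always applied in the power-of-2 index space [0, M-1], while storage uses only m buckets; the 1-bit tag is what allows a displaced fingerprint to find its other bucket without access to the original data member.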
Specifically, in step A10, the preset data member is the data member targeted by the insert operation, and it is acquired when the insert operation is received. After the preset data member is obtained, the tag fingerprint remainder corresponding to the preset data member and the index $h_0(x)$ of the first candidate bucket are calculated based on a hash function $G(x)$. The tag fingerprint remainder and the index of the first candidate bucket may be calculated as:
$$h_0(x) : r_x = G(x) \qquad (1)$$
where $r_x$ is the tag fingerprint remainder corresponding to the preset data member, $h_0(x)$ is the index of the first candidate bucket, $x$ is the preset data member, and $G(x)$ is the hash value of the preset data member $x$, whose low-order bits form $r_x$ and whose high-order bits form $h_0(x)$; ":" is the concatenation operator. $h_0(x)$ ranges over $[0, \ldots, M-1]$, where $M$ is the smallest power of 2 greater than $m$, and $m$ is the number of buckets of the cuckoo hash table. For example, if $m = 6$ then $M = 8$, and if $m = 14$ then $M = 16$.
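A minimal sketch of step A10, assuming G(x) is a SHA-256 digest truncated to b + r bits, where M = 2^b. The function names and the choice of hash are illustrative only.

```python
import hashlib


def smallest_power_of_two_above(m: int) -> int:
    # M is the smallest power of 2 strictly greater than m (m = 6 -> 8, m = 14 -> 16).
    M = 1
    while M <= m:
        M <<= 1
    return M


def split_hash(x: bytes, m: int, r: int):
    """Return (h0, r_x): first candidate index and r-bit tag fingerprint remainder."""
    M = smallest_power_of_two_above(m)
    b = M.bit_length() - 1                      # M == 2**b
    digest = int.from_bytes(hashlib.sha256(x).digest(), "big")
    g = digest & ((1 << (b + r)) - 1)           # G(x), truncated to b + r bits
    h0 = g >> r                                 # high-order b bits, in [0, M-1]
    r_x = g & ((1 << r) - 1)                    # low-order r bits
    return h0, r_x
```

Note that h0 may be m or larger at this stage; that case is what the tag and the later correction step handle.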
Further, in step A20, after the tag fingerprint remainder of the preset data member and the index of the first candidate bucket have been determined, an exclusive-OR operation may be used to determine the second candidate bucket of the preset data member. The index $h_1(x)$ of the second candidate bucket may be calculated as:
$h_1(x) = (h_0(x) \oplus H(r_x)) \bmod M$ (2)
where $r_x$ is the tag fingerprint remainder of the preset data member, $h_0(x)$ is the index of the first candidate bucket, $x$ is the preset data member, $\oplus$ is the XOR operator, "mod" is the modulo operator, and $H(r_x)$ is the hash value of the tag fingerprint remainder, with values in $[0, \ldots, M-1]$. $h_1(x)$ is the index of the second candidate bucket and also takes values in $[0, \ldots, M-1]$, where $M$ is the smallest power of 2 greater than $m$, and $m$ is the number of storage buckets in the cuckoo hash table.
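One property worth checking in formula (2) is that applying the same XOR step to $h_1(x)$ leads back to $h_0(x)$, which is what later allows a displaced fingerprint to locate its other bucket. The sketch below tests this for $M = 8$ and, as a contrast, shows that reducing modulo a non-power-of-2 such as 5 breaks the round trip. The hash `H` is a hypothetical stand-in, not the patent's function.

```python
import hashlib

def H(r: int, M: int) -> int:
    """Hypothetical hash of the remainder, reduced to [0, M-1]."""
    digest = hashlib.sha256(r.to_bytes(4, "big")).digest()
    return int.from_bytes(digest[:4], "big") % M

def other_index(h: int, r: int, M: int) -> int:
    """Formula (2): XOR the index with H(r), then reduce modulo M."""
    return (h ^ H(r, M)) % M

M = 8
for h0 in range(M):
    for r in range(256):
        h1 = other_index(h0, r, M)
        assert 0 <= h1 < M
        assert other_index(h1, r, M) == h0    # XOR undoes itself

# Contrast: with modulus 5 and a hash value of 3, bucket 4 maps to 2,
# but bucket 2 maps to 1, so the pair is not symmetric.
assert (4 ^ 3) % 5 == 2
assert (2 ^ 3) % 5 == 1
```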
Further, in step A30, after the indexes of the first and second candidate buckets are obtained, a first tag and a second tag of the preset data member may be determined: the first tag from the first candidate bucket, and the second tag from the second candidate bucket. Because a tag records the relationship between the index of the candidate bucket associated with a tag fingerprint and the number of storage buckets, the first tag may be determined by comparing the index of the first candidate bucket with the number of storage buckets $m$, and the second tag by comparing the index of the second candidate bucket with $m$.
In a specific implementation of this embodiment, the index $h_0(x)$ of the first candidate bucket and the index $h_1(x)$ of the second candidate bucket may each be compared with the number of storage buckets $m$. If $h_0(x)$ is less than $m$, the first tag is 0; otherwise $h_0(x)$ is greater than or equal to $m$ and the first tag is 1. Similarly, if $h_1(x)$ is less than $m$, the second tag is 0; otherwise the second tag is 1. A tag value of 1 thus records that $m$ is removed when the index is reduced to $[0, \ldots, m-1]$, which step A100 undoes by adding $m$ back. The first tag and the second tag may be calculated by formulas (3) and (4):
[Formulas (3) and (4): images in the original publication, not reproduced in the extracted text]
where $h_0(x)$ is the index of the first candidate bucket, $h_1(x)$ is the index of the second candidate bucket, and $r_x$ is the tag fingerprint remainder.
Further, in step A50, after the first tag and the second tag are determined, the index of the first candidate bucket and the index of the second candidate bucket may be corrected based on the number of storage buckets, and the corrected indexes are used as the indexes of the first and second candidate buckets of the preset data member. Both corrected indexes lie in $[0, \ldots, m-1]$. In an implementation of this embodiment, the correction formulas for the first and second candidate buckets are, respectively:
$h_0(x) = h_0(x) \bmod m, \quad h_1(x) = h_1(x) \bmod m$ (5)
where $h_0(x)$ is the index of the first candidate bucket, $h_1(x)$ is the index of the second candidate bucket, $m$ is the number of storage buckets, and "mod" is the modulo operator.
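The tag and the index correction can be sketched together. The example below follows the convention used for formula (7) and in step A100 (tag 0 when the raw index is below $m$, tag 1 otherwise) and checks that the raw index can always be recovered from the corrected index and the tag. The function names are illustrative only.

```python
def tag_and_correct(h: int, m: int) -> tuple[int, int]:
    """Return (tag, corrected index) for a raw index h in [0, M-1]."""
    tag = 0 if h < m else 1      # tag 1 marks a raw index of m or more
    return tag, h % m            # formula (5)

def restore(index: int, tag: int, m: int) -> int:
    """Undo the correction: add m back when the tag is 1 (step A100)."""
    return index + m * tag

m, M = 5, 8
print(tag_and_correct(6, m))  # (1, 1)
print(tag_and_correct(4, m))  # (0, 4)

# Because M is less than 2m, one tag bit is enough to recover any raw index.
for h in range(M):
    tag, idx = tag_and_correct(h, m)
    assert restore(idx, tag, m) == h
```

The two printed pairs correspond to the raw indexes 6 and 4 used in Example 1, which are corrected to buckets 1 and 4.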
Further, in step A60, after the index of the first candidate bucket and the index of the second candidate bucket are obtained, the first candidate bucket and the second candidate bucket may be located from their indexes. Once they are located, it may be checked whether the first candidate bucket $h_0(x)$ and the second candidate bucket $h_1(x)$ have free storage locations. The check may proceed as follows: the storage locations of the first candidate bucket $h_0(x)$ and of the second candidate bucket $h_1(x)$ are searched in turn, and any storage location whose stored value equals a preset value is judged to be free. The preset value, for example 0, is fixed in advance and indicates that the storage location is free and holds no element fingerprint.
Further, in step A70, when free storage locations exist, all of them are obtained, one is selected, the candidate bucket containing it is determined, and the tag fingerprint corresponding to that candidate bucket is stored in the selected location. The free storage location may belong to the first candidate bucket $h_0(x)$ or to the second candidate bucket $h_1(x)$: when it belongs to the first candidate bucket, the first tag fingerprint is stored in it; when it belongs to the second candidate bucket, the second tag fingerprint is stored in it. Once the tag fingerprint of the preset data member has been stored in a free storage location, the insert operation for the preset data member is complete.
Further, in step A80, when there is no free storage location, a target candidate bucket (i.e., the current bucket) is selected from the first and second candidate buckets, and the tag fingerprint corresponding to the target candidate bucket is taken as the selected tag fingerprint. For example, when the first candidate bucket is the target candidate bucket, the first tag fingerprint is the selected tag fingerprint (i.e., the new tag fingerprint f); when the second candidate bucket is the target candidate bucket, the second tag fingerprint is the selected tag fingerprint. In addition, before the target candidate bucket is selected, the number of data member moves is set to 0.
Further, in step A90, a storage location is randomly selected from the target candidate bucket, the tag fingerprint stored there is taken as the target tag fingerprint (i.e., the old tag fingerprint), and the selected tag fingerprint of the preset data member is stored in its place. Before the storage location is selected, the number of moves is incremented by 1 and compared with a preset count threshold: if the number of moves exceeds the threshold, the insertion of the preset data member is judged to have failed; otherwise the procedure continues and step A100 is executed. The preset count threshold may be chosen according to the actual situation, for example 500.
Further, in step A100, after the target tag fingerprint has been taken from the target candidate bucket, an exclusive-OR operation is used to determine the reference bucket (i.e., the reference bucket becomes the current bucket h) and the reference tag fingerprint (i.e., the reference tag fingerprint becomes the old fingerprint g') corresponding to the target tag fingerprint. Before the reference bucket is determined, the index of the target candidate bucket must be adjusted according to the target tag carried by the target tag fingerprint: if the tag value is 0, the index is left unchanged; if the tag value is 1, $m$ is added to the index. For example, if the tag of the target tag fingerprint g is 0 and the index of the target candidate bucket is i, the adjusted index is still i; if the tag is 1 and the index is i, the adjusted index is i + m. After the adjusted index is determined, the index of the reference bucket is calculated from the remainder of the target tag fingerprint and the adjusted index; the reference bucket is the other candidate bucket of the data member to which the target tag fingerprint belongs. The index j of the reference bucket may be calculated as:
$j = (i \oplus H(r)) \bmod M$ (6)
where $i$ is the adjusted index of the target candidate bucket, $r$ is the remainder of the target tag fingerprint, $H(r)$ is the hash value of that remainder with values in $[0, \ldots, M-1]$, $\oplus$ is the XOR operator, and "mod" is the modulo operator.
Further, after the reference bucket is determined, the reference tag fingerprint corresponding to the reference bucket is calculated from the remainder of the target tag fingerprint and the index of the reference bucket. The calculation may be: check whether the index $j$ of the reference bucket is less than $m$; if $j < m$, set the tag value of the reference tag fingerprint $g'$ to 0; otherwise set it to 1. The calculation formula is:
[Formula (7): image in the original publication, not reproduced in the extracted text]
where $g'$ is the reference tag fingerprint, $r$ is the remainder of the target tag fingerprint, $j$ is the index of the reference bucket, and $m$ is the number of storage buckets.
After the reference tag of the reference tag fingerprint is obtained, the index $j$ of the reference bucket is adjusted so that it lies in $[0, \ldots, m-1]$. The adjustment formula is:
$j = j \bmod m$ (8)
where $j$ is the index of the reference bucket, $m$ is the number of storage buckets, and "mod" is the modulo operator.
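Formulas (6) to (8) together describe a single relocation step. The following sketch is one possible reading of it; the bit layout (a tag bit stored directly above an 8-bit remainder) and the hash `H` are assumptions made for this example and are not specified in the text.

```python
import hashlib

R_BITS = 8  # assumed remainder width; the tag is stored as the bit above it

def H(r: int, M: int) -> int:
    """Hypothetical hash of the remainder, reduced to [0, M-1]."""
    digest = hashlib.sha256(r.to_bytes(4, "big")).digest()
    return int.from_bytes(digest[:4], "big") % M

def relocate(g: int, i: int, m: int, M: int) -> tuple[int, int]:
    """Map a stored tag fingerprint g in bucket i to (g', j)."""
    tag, r = g >> R_BITS, g & ((1 << R_BITS) - 1)
    raw_i = i + m * tag                    # step A100: add m back if tag is 1
    raw_j = (raw_i ^ H(r, M)) % M          # formula (6)
    new_tag = 0 if raw_j < m else 1        # rule stated for formula (7)
    g_prime = (new_tag << R_BITS) | r
    return g_prime, raw_j % m              # formula (8)

m, M = 5, 8
g = (1 << R_BITS) | 0x2A                   # hypothetical fingerprint, tag 1
g_prime, j = relocate(g, 1, m, M)
assert relocate(g_prime, j, m, M) == (g, 1)   # the move is reversible
```

The final assertion reflects the purpose of the tag: the fingerprint alone, together with its current bucket, is enough to find the other candidate bucket and to come back again.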
Further, after the reference bucket is obtained, it is checked whether the reference bucket has a free storage location. If it has, step A110 is executed; if it has not, step A120 is executed.
Further, in step A110, the reference tag fingerprint is stored in the free storage location, and the insert operation for the preset data member is complete.
Further, in step A120, a target storage location is selected in the reference bucket and the reference tag fingerprint is stored there; the tag fingerprint displaced from the target storage location is taken as the target tag fingerprint, the reference bucket is taken as the target candidate bucket, and step A100 is executed again, until the reference bucket has a free storage location or the number of moves exceeds the preset count threshold. In addition, before the target storage location is selected and the reference tag fingerprint is stored in it, the number of moves is incremented by 1. This counts the moves made on behalf of the insert operation for the preset data member, so that it can be decided when the insert operation must stop and the operation cannot enter an infinite loop.
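Steps A60 to A130 can be put together in a short sketch. It is an illustration under stated assumptions rather than the patented implementation: the class name, SHA-256 as the hash, the 8-bit remainder with the tag bit above it, four slots per bucket, and `None` as the free-slot marker are all choices made for this example.

```python
import hashlib
import random

R_BITS = 8        # assumed remainder width; the tag bit is stored above it
MAX_MOVES = 500   # example threshold given in the text
EMPTY = None      # free-slot marker (the text suggests a preset value such as 0)

class TagCuckooSketch:
    def __init__(self, m: int, slots: int = 4, seed: int = 0):
        self.m = m
        self.M = 1
        while self.M <= m:                 # smallest power of 2 above m
            self.M *= 2
        self.buckets = [[EMPTY] * slots for _ in range(m)]
        self.rng = random.Random(seed)

    @staticmethod
    def _hash(data: bytes) -> int:
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _H(self, r: int) -> int:
        return self._hash(r.to_bytes(2, "big")) % self.M

    def _pack(self, raw: int, r: int) -> tuple[int, int]:
        """Raw index in [0, M-1] -> (corrected index, tag fingerprint)."""
        tag = 0 if raw < self.m else 1
        return raw % self.m, (tag << R_BITS) | r

    def candidates(self, x: bytes):
        g = self._hash(x)
        r = g & ((1 << R_BITS) - 1)                    # low-order part
        h0 = (g >> R_BITS) % self.M                    # bits above the remainder
        h1 = (h0 ^ self._H(r)) % self.M
        return [self._pack(h0, r), self._pack(h1, r)]

    def _relocate(self, i: int, fp: int):
        tag, r = fp >> R_BITS, fp & ((1 << R_BITS) - 1)
        raw_j = ((i + self.m * tag) ^ self._H(r)) % self.M
        return self._pack(raw_j, r)

    def _try_store(self, i: int, fp: int) -> bool:
        if EMPTY in self.buckets[i]:
            self.buckets[i][self.buckets[i].index(EMPTY)] = fp
            return True
        return False

    def insert(self, x: bytes) -> bool:
        cands = self.candidates(x)
        for i, fp in cands:                            # A60-A70
            if self._try_store(i, fp):
                return True
        i, fp = self.rng.choice(cands)                 # A80
        for _ in range(MAX_MOVES):
            slot = self.rng.randrange(len(self.buckets[i]))   # A90
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i, fp = self._relocate(i, fp)              # A100
            if self._try_store(i, fp):                 # A110
                return True
        return False   # in this sketch the last displaced fingerprint is dropped

f = TagCuckooSketch(m=5)
print(f.insert(b"x"))  # True: the table is empty, so a free slot exists
```

Dropping the last displaced fingerprint on failure is a simplification of this sketch; the text only states that the insertion is judged to have failed.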
The element insertion method of the tag cuckoo filter is illustrated as follows:
Example 1: As shown in FIG. 4, when an element x is inserted, first, the tag fingerprint remainder $r_x$ of element x and the two candidate bucket indexes $h_0(x) = 6$ and $h_1(x) = 4$ are calculated using formulas (1) and (2). Next, the two tag fingerprints of element x are calculated using formulas (3), (4) and (5), and the two candidate bucket indexes are adjusted into the range $[0, \ldots, 4]$, giving $h_0(x) = 1$ and $h_1(x) = 4$. Next, the two candidate buckets $h_0(x) = 1$ and $h_1(x) = 4$ are searched, and both are found to contain free storage locations. Finally, one of the free candidate buckets, $h_0(x) = 1$, is selected at random and the corresponding tag fingerprint is stored in it.
Example 2: As shown in FIG. 5, when an element y is inserted, first, the tag fingerprint remainder $r_y$ of element y and the two candidate bucket indexes $h_0(y) = 3$ and $h_1(y) = 6$ are calculated using formulas (1) and (2). Next, the two tag fingerprints of element y are calculated using formulas (3), (4) and (5), and the two candidate bucket indexes are adjusted into the range $[0, \ldots, 4]$, giving $h_0(y) = 3$ and $h_1(y) = 1$. Next, the two candidate buckets $h_0(y) = 3$ and $h_1(y) = 1$ are searched, and neither is found to contain a free storage location; candidate bucket $h_1(y) = 1$ is therefore selected at random, an old tag fingerprint is removed from it, and the new tag fingerprint of element y is stored in its place. Then, starting from the old tag fingerprint and the current bucket index $h_1(a) = 1$, and assuming that the old tag fingerprint has tag value 1 and remainder $r_a$, the other candidate bucket index $h_0(a) = 4$ and its corresponding tag fingerprint are calculated using formulas (6), (7) and (8); the tag value of this fingerprint is set to 0 because $h_0(a) = 4 < m = 5$. Finally, candidate bucket $h_0(a) = 4$ is searched, found to contain a free storage location, and the corresponding tag fingerprint is stored in it.
Further, in one implementation of this embodiment, when an insert operation is received, it may first be determined whether the number of storage buckets m is a power of 2. If m is a power of 2, the tag cuckoo filter degenerates to the existing cuckoo filter; if m is not a power of 2, the insert operation is performed according to the tag cuckoo filter procedure. When the tag cuckoo filter degenerates to the existing cuckoo filter (i.e., when the number of storage buckets is a power of 2), as shown in FIG. 6, the insert operation may be implemented as follows:
B10, calculating the fingerprint $f_x$ of element x and the first candidate bucket index $h_0(x)$, using a hash function $G(x)$:
$h_0(x) : f_x = G(x)$ (9)
where the low-order part of the hash value $G(x)$ is $f_x$, the high-order part is $h_0(x)$, and ":" is the numerical connector. The first candidate bucket index $h_0(x)$ lies in $[0, \ldots, M-1]$, where $M$ is a power of 2 and $M$ equals $m$.
B20, calculating the second candidate bucket index $h_1(x)$ of element x. Based on the fingerprint $f_x$ of element x and the first candidate bucket index $h_0(x)$, the second candidate bucket index $h_1(x)$ is computed with an XOR operation:
$h_1(x) = (h_0(x) \oplus H(f_x)) \bmod M$ (10)
where the hash value $H(f_x)$ lies in $[0, \ldots, M-1]$, $\oplus$ is the XOR operator, "mod" is the modulo operator, and the second candidate bucket index $h_1(x)$ lies in $[0, \ldots, M-1]$, where $M$ is a power of 2 and $M$ equals $m$.
B30, determining whether at least one of the two candidate buckets $h_0(x)$ and $h_1(x)$ contains a free storage location: each candidate bucket is searched for a free storage location; if at least one candidate bucket contains one, go to step B40; otherwise go to step B50.
B40, storing the fingerprint $f_x$ of element x in one of the candidate buckets $h_0(x)$ or $h_1(x)$ that has a free storage location; the insertion of element x ends.
B50, randomly selecting one candidate bucket, $h_0(x)$ or $h_1(x)$; setting the new fingerprint f to the fingerprint $f_x$ (i.e., $f = f_x$), setting the target candidate bucket h to the selected bucket, and setting the number of element moves to 0.
B60, incrementing the number of element moves by 1 and determining whether it exceeds 500; if it exceeds 500, go to step B70; otherwise go to step B80.
B70, the insertion of element x fails, and element insertion ends.
B80, randomly selecting a storage location of the target candidate bucket h, moving the target fingerprint g out of that location, and storing the new fingerprint f there.
B90, calculating the other bucket index of the target fingerprint g, namely the reference bucket index, and making it the target candidate bucket h. Based on the target fingerprint g and the target candidate bucket index i (i.e., i = h), the reference bucket index j is computed with an XOR operation:
$j = (i \oplus H(g)) \bmod M$ (11)
where the hash value $H(g)$ lies in $[0, \ldots, M-1]$, $\oplus$ is the XOR operator, and "mod" is the modulo operator; the reference bucket index j is then assigned to the target candidate bucket index h (i.e., h = j).
B100, judging whether the target candidate bucket h comprises at least one free storage position. If at least one free storage position is included, entering step B110; otherwise, step B120 is entered.
B110, storing the target fingerprint g in the target candidate bucket h, and ending element insertion.
B120, jumping to step B60, and repeatedly moving out old fingerprints and inserting new ones until a free storage location is found or the number of moves exceeds 500.
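The choice between the two insertion procedures depends only on whether m is a power of 2. A minimal sketch of that test, together with formula (11), is given below; the function names and the returned labels are illustrative and do not come from the text.

```python
def is_power_of_two(m: int) -> bool:
    """True when m has exactly one bit set."""
    return m > 0 and (m & (m - 1)) == 0

def insertion_procedure(m: int) -> str:
    """Pick the procedure named in the text for a table of m buckets."""
    if is_power_of_two(m):
        return "standard cuckoo filter (steps B10-B120)"
    return "tag cuckoo filter"

def reference_index(i: int, hash_of_g: int, M: int) -> int:
    """Formula (11); hash_of_g stands for H(g), already in [0, M-1]."""
    return (i ^ hash_of_g) % M

print(insertion_procedure(8))   # standard cuckoo filter (steps B10-B120)
print(insertion_procedure(5))   # tag cuckoo filter

# With M a power of 2, no tag is needed: the XOR step is its own inverse.
assert reference_index(reference_index(3, 6, 8), 6, 8) == 3
```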
In one implementation of this embodiment, the data member management operation is a query operation. As shown in FIG. 7, when the tag cuckoo filter receives a data member management operation, determining two candidate buckets and two tag fingerprints of the preset data member based on an exclusive-OR operation, and executing the data member management operation based on the two candidate buckets and the two tag fingerprints, specifically includes:
C10, determining the tag fingerprint remainder and the first candidate bucket of the preset data member;
C20, determining the second candidate bucket of the preset data member by an exclusive-OR operation based on the tag fingerprint remainder and the first candidate bucket;
C30, determining the first tag of the preset data member according to the first candidate bucket, and the second tag of the preset data member according to the second candidate bucket;
C40, determining the first tag fingerprint of the preset data member according to the first tag and the tag fingerprint remainder, and the second tag fingerprint of the preset data member according to the second tag and the tag fingerprint remainder;
C50, correcting the first candidate bucket and the second candidate bucket according to the number of storage buckets, and taking the corrected buckets as the first candidate bucket and the second candidate bucket of the preset data member;
C60, searching the first candidate bucket for the first tag fingerprint of the preset data member and the second candidate bucket for the second tag fingerprint of the preset data member;
C70, if the first tag fingerprint or the second tag fingerprint of the preset data member is found, reporting that the query for the data member succeeded;
C80, if neither the first tag fingerprint nor the second tag fingerprint of the preset data member is found, reporting that the query for the data member failed.
Specifically, steps C10 to C50 are executed in the same way as steps A10 to A50; see the description of those steps, which is not repeated here.
Further, after the first candidate bucket $h_0(x)$ and the second candidate bucket $h_1(x)$ are obtained, the first candidate bucket $h_0(x)$ is searched for the first tag fingerprint of the preset data member x, and the second candidate bucket $h_1(x)$ is searched for the second tag fingerprint of the preset data member x. If the first tag fingerprint or the second tag fingerprint is matched, the preset data member x is in the set, the query result True is returned, and the query for the preset data member ends. If neither a matching first tag fingerprint nor a matching second tag fingerprint is stored, the preset data member x is not in the set, the query result False is returned, and the element query ends.
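The matching in steps C60 to C80 reduces to two membership tests. The sketch below assumes that steps C10 to C50 have already produced the two (bucket index, tag fingerprint) pairs; the 8-bit remainder with the tag bit above it and the remainder value 0x2A are hypothetical, while the indexes follow the values quoted for FIG. 8 (raw indexes 6 and 4 with m = 5).

```python
R_BITS = 8  # assumed remainder width; the tag is stored as the bit above it

def query(buckets, candidates) -> bool:
    """Steps C60-C80: True if either candidate bucket holds its tag fingerprint."""
    return any(fp in buckets[i] for i, fp in candidates)

m = 5
r_x = 0x2A                                   # hypothetical remainder
cands = [(6 % m, (1 << R_BITS) | r_x),       # raw index 6 -> bucket 1, tag 1
         (4 % m, (0 << R_BITS) | r_x)]       # raw index 4 -> bucket 4, tag 0
buckets = [[] for _ in range(m)]
buckets[1].append((1 << R_BITS) | r_x)
print(query(buckets, cands))                 # True

# The same remainder with the other tag value does not match in bucket 1.
buckets[1] = [(0 << R_BITS) | r_x]
print(query(buckets, cands))                 # False
```

Because each bucket is compared against its own tag fingerprint, a remainder stored with a different tag value is not treated as a match.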
The element query method of the tag cuckoo filter is illustrated as follows:
As shown in FIG. 8, when element x is queried, first, the two candidate bucket indexes $h_0(x) = 6$ and $h_1(x) = 4$ of element x are calculated using formulas (1) and (2). Next, the two tag fingerprints corresponding to the candidate buckets $h_0(x)$ and $h_1(x)$ are calculated using formulas (3), (4) and (5), and the two candidate bucket indexes are adjusted to $h_0(x) = 1$ and $h_1(x) = 4$. Next, the two candidate buckets $h_0(x) = 1$ and $h_1(x) = 4$ are searched for their corresponding tag fingerprints, and a matching tag fingerprint is found in candidate bucket $h_0(x) = 1$. Finally, the query reports that element x is in the set and returns true.
In one implementation of this embodiment, when a query operation is received, it may first be determined whether the number of storage buckets m is a power of 2. If m is a power of 2, the tag cuckoo filter degenerates to the existing cuckoo filter; if m is not a power of 2, the query operation is performed according to the tag cuckoo filter procedure. When the tag cuckoo filter degenerates to the existing cuckoo filter, as shown in FIG. 7, the query operation may be implemented as follows:
D10, calculating the fingerprint $f_x$ of element x and the first candidate bucket index $h_0(x)$ using formula (9), where $h_0(x)$ lies in $[0, \ldots, M-1]$, $M$ is a power of 2, and $M$ equals $m$.
D20, calculating the second candidate bucket index $h_1(x)$ of element x using formula (10), where $h_1(x)$ lies in $[0, \ldots, M-1]$, $M$ is a power of 2, and $M$ equals $m$.
D30, searching the two candidate buckets $h_0(x)$ and $h_1(x)$ for the fingerprint $f_x$; if the fingerprint $f_x$ is matched, go to step D40; otherwise go to step D50.
D40, element x is in the set; the query result true is returned, and the element query ends.
D50, element x is not in the set; the query result false is returned, and the element query ends.
In one implementation of this embodiment, the data member management operation is a delete operation. As shown in FIG. 9, when the tag cuckoo filter receives a data member management operation, determining two candidate buckets and two tag fingerprints of the preset data member based on an exclusive-OR operation, and executing the data member management operation based on the two candidate buckets and the two tag fingerprints, specifically includes:
E10, determining the tag fingerprint remainder and the first candidate bucket of the preset data member;
E20, determining the second candidate bucket of the preset data member by an exclusive-OR operation based on the tag fingerprint remainder and the first candidate bucket;
E30, determining the first tag of the preset data member according to the first candidate bucket, and the second tag of the preset data member according to the second candidate bucket;
E40, determining the first tag fingerprint of the preset data member according to the first tag and the tag fingerprint remainder, and the second tag fingerprint of the preset data member according to the second tag and the tag fingerprint remainder;
E50, correcting the first candidate bucket and the second candidate bucket according to the number of storage buckets, and taking the corrected buckets as the first candidate bucket and the second candidate bucket of the preset data member;
E60, searching the first candidate bucket for the first tag fingerprint of the preset data member and the second candidate bucket for the second tag fingerprint of the preset data member;
E70, if the first tag fingerprint or the second tag fingerprint of the preset data member is found, deleting the tag fingerprint that was found, which is either the first tag fingerprint or the second tag fingerprint;
E80, if neither the first tag fingerprint nor the second tag fingerprint of the preset data member is found, reporting that the deletion of the data member failed.
Specifically, steps E10 to E50 are executed in the same way as steps A10 to A50; see the description of those steps, which is not repeated here.
Further, after the first candidate bucket $h_0(x)$ and the second candidate bucket $h_1(x)$ are obtained, the first candidate bucket $h_0(x)$ is searched for the first tag fingerprint of the preset data member x, and the second candidate bucket $h_1(x)$ is searched for the second tag fingerprint of the preset data member x. If the first tag fingerprint or the second tag fingerprint is matched, the matching tag fingerprint of the preset data member is deleted, and the deletion of the data member ends. If neither a matching first tag fingerprint nor a matching second tag fingerprint is stored, the deletion of the data member fails, and the element deletion ends.
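Deletion differs from a query only in removing one matching copy. The sketch below again takes the two (bucket index, tag fingerprint) pairs as given; the fingerprint values are hypothetical. It checks the first candidate bucket before the second, whereas the example in the text picks a matching bucket at random.

```python
def delete(buckets, candidates) -> bool:
    """Steps E60-E80: remove one matching tag fingerprint, if any."""
    for i, fp in candidates:
        if fp in buckets[i]:
            buckets[i].remove(fp)     # removes a single occurrence only
            return True
    return False

m = 5
cands = [(3, 0x03C), (1, 0x13C)]      # hypothetical tag fingerprints
buckets = [[] for _ in range(m)]
buckets[3] = [0x03C, 0x03C]           # two elements sharing one tag fingerprint

print(delete(buckets, cands))         # True
print(buckets[3])                     # [60]: the second copy is still stored
print(delete(buckets, cands))         # True
print(delete(buckets, cands))         # False: nothing left to delete
```

Removing a single occurrence is what keeps the other element's copy in place when two elements share the same tag fingerprint.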
The element deletion method of the tag cuckoo filter is illustrated as follows:
As shown in FIG. 10, when an element y is deleted, first, the two candidate bucket indexes $h_0(y) = 3$ and $h_1(y) = 6$ of element y are calculated using formulas (1) and (2). Next, the two tag fingerprints corresponding to the candidate buckets $h_0(y)$ and $h_1(y)$ are calculated using formulas (3), (4) and (5), and the two candidate bucket indexes are adjusted to $h_0(y) = 3$ and $h_1(y) = 1$. Then, the two candidate buckets $h_0(y) = 3$ and $h_1(y) = 1$ are searched for their corresponding tag fingerprints. Assuming that the two tag fingerprints concerned are equal, matching tag fingerprints are found in both candidate buckets $h_0(y) = 3$ and $h_1(y) = 1$. Finally, candidate bucket $h_1(y) = 1$ is selected at random, the matching tag fingerprint is deleted from it, and the deletion of element y succeeds.
Furthermore, the element deletion method of the tag cuckoo filter ensures that elements are deleted correctly without producing false negative errors. If two inserted elements have the same fingerprints, the tag cuckoo filter stores the fingerprints of both elements in the filter. If one of the two elements is then deleted, the fingerprint of the other remains in the filter, so no false negative error occurs, although a false positive error is possible. For example, in FIG. 10, suppose element g is queried after element y has been deleted. Because the tag fingerprints of element g are equal to the tag fingerprints of element y, a matching tag fingerprint is found in candidate bucket $h_0(g) = 3$, and the query reports that element g is in the set and returns true. When element y is queried, the matching tag fingerprint is still in candidate bucket $h_0(y) = 3$, so the query reports that element y is in the set and returns true; since element y was successfully deleted, this result is a false positive error. Nevertheless, the element deletion method of the tag cuckoo filter does not increase the false positive error rate, which remains low and the same as that of the standard Bloom filter, the counting Bloom filter, and the like.
In one implementation of the present embodiment, upon receiving an insert operation, it may be determined whether the number of buckets, m, is a power of 2. If m is a power of 2, the tag cuckoo filter degenerates to an existing cuckoo filter, and if m is not a power of 2, the process of performing the delete operation according to the tag bird filter is performed. When the tag cuckoo filter is degraded into the existing cuckoo filter, as shown in fig. 9, the deleting operation may be implemented by:
f10, calculating the fingerprint F of the element x x And a first candidate bucket index h 0 (x) In that respect Calculating the fingerprint f of the element x using equation (9) x And candidate bucket index h 0 (x) Wherein h is 0 (x) In the range of h 1 (x) Is in the range of [ 0., M-1]]Where M is a power of 2 and M is equal to M
F20, calculating a second candidate bucket index h of the element x 1 (x) In that respect Calculating a second candidate bucket index h using equation (10) 1 (x) Wherein h is 1 (x) In the range of h 1 (x) Is in the range of [ 0., M-1]]M is a power of 2 and M is equal to M.
F30, searching two candidate buckets h 0 (x) And h 1 (x) Whether or not to match a fingerprint f x . If matching the fingerprint f x Entering a step DF0; otherwise, step F50 is entered.
F40, if the element x is in the set, deleting the label fingerprint F corresponding to the searched preset data member x And the element deletion ends.
F50, indicating that the element x is not in the set, returning a deletion result as false, and ending element deletion.
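Steps F10 to F50 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: equations (9) and (10) are not reproduced here, so the sketch substitutes the usual partial-key cuckoo hashing (second index = first index XOR hash of the fingerprint). The hash function, the values of M and FP_BITS, and all function names are hypothetical.

```python
import hashlib

M = 8         # number of buckets, a power of 2 (assumed value)
FP_BITS = 8   # fingerprint width in bits (assumed value)

def is_power_of_two(m: int) -> bool:
    return m > 0 and (m & (m - 1)) == 0

def _hash(data: bytes) -> int:
    # Stand-in hash; the patent's hash functions are not specified here.
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

def fingerprint(x: str) -> int:
    # Non-zero fingerprint of FP_BITS bits.
    return _hash(b"fp:" + x.encode()) % ((1 << FP_BITS) - 1) + 1

def h0(x: str) -> int:
    return _hash(x.encode()) % M

def alt_bucket(i: int, fp: int) -> int:
    # XOR with the fingerprint hash; self-inverse because M is a power of 2.
    return (i ^ _hash(fp.to_bytes(2, "big"))) % M

def delete(buckets, x: str) -> bool:
    fp = fingerprint(x)            # F10: fingerprint and first index
    i0 = h0(x)
    i1 = alt_bucket(i0, fp)        # F20: second index
    for i in (i0, i1):             # F30: search both candidate buckets
        if fp in buckets[i]:
            buckets[i].remove(fp)  # F40: delete the matching fingerprint
            return True            # assumed return value on success
    return False                   # F50: element not in the set

buckets = [[] for _ in range(M)]
buckets[h0("x")].append(fingerprint("x"))
first = delete(buckets, "x")   # fingerprint present, so it is removed
second = delete(buckets, "x")  # nothing left to delete
```

The power-of-2 condition matters because it makes the XOR mapping an involution: applying alt_bucket twice with the same fingerprint returns the original bucket index, so either candidate bucket can be derived from the other.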
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (9)

1. A tag cuckoo filter, characterized by comprising a cuckoo hash table, wherein the cuckoo hash table comprises a plurality of storage buckets, each data member corresponds to two tag fingerprints, and the two tag fingerprints are stored in two storage buckets respectively; when the tag cuckoo filter receives a data member management operation, two candidate buckets and two tag fingerprints corresponding to a preset data member are determined based on an exclusive-or operation, and the data member management operation is executed based on the determined two candidate buckets and two tag fingerprints, wherein the preset data member is the data member corresponding to the data member management operation, and determining the two candidate buckets and the two tag fingerprints corresponding to the preset data member based on the exclusive-or operation specifically comprises:
determining a tag fingerprint remainder and a first candidate bucket corresponding to the preset data member;
determining a second candidate bucket corresponding to the preset data member by an exclusive-or operation based on the tag fingerprint remainder and the first candidate bucket;
determining a first tag corresponding to the preset data member according to the first candidate bucket, and determining a second tag corresponding to the preset data member according to the second candidate bucket;
determining a first tag fingerprint corresponding to the preset data member according to the first tag and the tag fingerprint remainder, and determining a second tag fingerprint corresponding to the preset data member according to the second tag and the tag fingerprint remainder;
and correcting the first candidate bucket and the second candidate bucket respectively according to the number of storage buckets, and taking the corrected first candidate bucket and the corrected second candidate bucket as the first candidate bucket and the second candidate bucket corresponding to the preset data member.
2. The tag cuckoo filter of claim 1, wherein each tag fingerprint comprises a tag and a tag fingerprint remainder, and the two tag fingerprints corresponding to the preset data member have the same tag fingerprint remainder.
3. The tag cuckoo filter of claim 1, wherein when the data member management operation is an insert operation, performing the data member management operation based on the determined two candidate buckets and two tag fingerprints comprises:
detecting whether there is a free storage location in the first candidate bucket or the second candidate bucket;
and if a free storage location exists, storing a candidate tag fingerprint of the preset data member in the free storage location, wherein the candidate tag fingerprint is the tag fingerprint corresponding to the candidate bucket to which the free storage location belongs.
4. The tag cuckoo filter of claim 3, wherein when the data member management operation is an insert operation, performing the data member management operation based on the determined two candidate buckets and two tag fingerprints comprises:
if no free storage location exists, selecting a target candidate bucket from the first candidate bucket and the second candidate bucket, and taking the tag fingerprint corresponding to the target candidate bucket as the tag fingerprint corresponding to the preset data member;
selecting a target tag fingerprint in the target candidate bucket, and storing the tag fingerprint corresponding to the preset data member in the storage location of the target tag fingerprint;
determining a reference bucket and a reference tag fingerprint corresponding to the target tag fingerprint by an exclusive-or operation according to the target candidate bucket and the target tag fingerprint;
and if the reference bucket has a free storage location, storing the reference tag fingerprint in the free storage location.
5. The tag cuckoo filter of claim 4, wherein when the data member management operation is an insert operation, performing the data member management operation based on the determined two candidate buckets and two tag fingerprints comprises:
if the reference bucket has no free storage location, selecting a target storage location in the reference bucket, and storing the reference tag fingerprint in the target storage location;
taking the tag fingerprint previously held in the target storage location as the target tag fingerprint, and taking the reference bucket as the target candidate bucket;
and continuing to execute the step of determining the reference bucket and the reference tag fingerprint corresponding to the target tag fingerprint by an exclusive-or operation according to the target candidate bucket and the target tag fingerprint, until the reference bucket has a free storage location or the number of executions reaches a preset threshold.
6. The tag cuckoo filter of claim 4 or 5, wherein determining the reference bucket and the reference tag fingerprint corresponding to the target tag fingerprint by an exclusive-or operation according to the target candidate bucket and the target tag fingerprint specifically comprises:
obtaining a target tag of the target tag fingerprint, and updating the target candidate bucket based on the target tag;
determining a comparison bucket corresponding to the target tag fingerprint by an exclusive-or operation based on the updated target candidate bucket and the tag fingerprint remainder of the target tag fingerprint;
and determining the reference tag fingerprint corresponding to the target tag fingerprint based on the comparison bucket and the tag fingerprint remainder of the target tag fingerprint, and correcting the comparison bucket based on the number of storage buckets to obtain the reference bucket corresponding to the target tag fingerprint.
7. The tag cuckoo filter of claim 1, wherein when the data member management operation is a query operation or a delete operation, performing the data member management operation based on the determined two candidate buckets and two tag fingerprints specifically comprises:
searching for the first tag fingerprint and the second tag fingerprint corresponding to the preset data member in the first candidate bucket and the second candidate bucket respectively;
and if the first tag fingerprint or the second tag fingerprint corresponding to the preset data member is found, executing the data member management operation on the found tag fingerprint.
8. The tag cuckoo filter of claim 7, wherein executing the data member management operation on the found tag fingerprint specifically comprises:
when the data member management operation is a query operation, indicating that the query for the data member succeeded;
and when the data member management operation is a delete operation, deleting the found tag fingerprint corresponding to the preset data member, wherein the tag fingerprint is the first tag fingerprint or the second tag fingerprint.
9. The tag cuckoo filter of claim 7, wherein when the data member management operation is a query operation or a delete operation, performing the data member management operation based on the determined two candidate buckets and two tag fingerprints specifically comprises:
if neither the first tag fingerprint nor the second tag fingerprint corresponding to the preset data member is found, indicating that the data member management operation failed.
CN202010360757.7A 2020-04-30 2020-04-30 Tag cuckoo filter Active CN111552693B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010360757.7A CN111552693B (en) 2020-04-30 2020-04-30 Tag cuckoo filter

Publications (2)

Publication Number Publication Date
CN111552693A CN111552693A (en) 2020-08-18
CN111552693B true CN111552693B (en) 2023-04-07

Family

ID=72003379

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010360757.7A Active CN111552693B (en) 2020-04-30 2020-04-30 Tag cuckoo filter

Country Status (1)

Country Link
CN (1) CN111552693B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112148928B (en) * 2020-09-18 2024-02-20 鹏城实验室 Cuckoo filter based on fingerprint family
CN112632337B (en) * 2020-12-28 2023-12-22 南方科技大学 Element management method applied to firework filter and firework filter
CN113535706B (en) * 2021-08-03 2023-05-23 佛山赛思禅科技有限公司 Two-stage cuckoo filter and repeated data deleting method based on two-stage cuckoo filter
CN113360516B (en) * 2021-08-11 2021-11-26 成都信息工程大学 Collection member management method
CN113641681B (en) * 2021-10-13 2022-02-22 南京大数据集团有限公司 Space self-adaptive mass data query method
CN116467307B (en) * 2023-03-29 2024-02-23 济南大学 Design method and system for cuckoo filter for reducing false positive rate

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101655861A (en) * 2009-09-08 2010-02-24 中国科学院计算技术研究所 Hashing method based on double-counting bloom filter and hashing device
CN105630955A (en) * 2015-12-24 2016-06-01 华中科技大学 Method for efficiently managing members of dynamic data set
CN107515901A (en) * 2017-07-24 2017-12-26 中国科学院信息工程研究所 A kind of chain type daily record storage organization and its Hash Index Structure, data manipulation method and server, medium
CN109815234A (en) * 2018-12-29 2019-05-28 杭州中科先进技术研究院有限公司 A kind of multiple cuckoo filter under streaming computing model
CN110046164A (en) * 2019-04-16 2019-07-23 中国人民解放军国防科技大学 Index independent grain distribution filter, consistency grain distribution filter and operation method
CN110222088A (en) * 2019-05-20 2019-09-10 华中科技大学 Data approximation set representation method and system based on insertion position selection

Also Published As

Publication number Publication date
CN111552693A (en) 2020-08-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant