CN116467307A - Design method and system for cuckoo filter for reducing false positive rate

Info

Publication number
CN116467307A
Authority
CN
China
Prior art keywords
filter
candidate
fingerprint
cuckoo
false positive
Prior art date
Legal status
Granted
Application number
CN202310344013.XA
Other languages
Chinese (zh)
Other versions
CN116467307B (en)
Inventor
赵川
王谦
赵圣楠
魏宇楠
荆山
陈贞翔
Current Assignee
University of Jinan
Original Assignee
University of Jinan
Application filed by University of Jinan
Priority to CN202310344013.XA
Publication of CN116467307A
Application granted
Publication of CN116467307B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2255 Hash tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
  • Collating Specific Patterns (AREA)

Abstract

The invention discloses a cuckoo filter design method and system for reducing the false positive rate, and relates to the technical field of cuckoo filters. The design method comprises the following steps: constructing a cuckoo filter consisting of a plurality of buckets, wherein each bucket consists of a plurality of slots and fingerprints of elements are stored in the slots; evenly dividing the cuckoo filter into an upper half and a lower half, the upper half being called $F_1$ and the lower half $F_2$; setting two candidate buckets $p_1$ and $p_2$, computed by taking the modulus with respect to the number of buckets in each half and applying an exclusive-or, so that the candidate bucket $p_1$ of any element always falls in $F_1$ and the corresponding candidate bucket $p_2$ always falls in $F_2$; and looking up an element in the cuckoo filter, the lookup succeeding if a fingerprint in candidate bucket $p_1$ or $p_2$ equals the fingerprint of the element. The invention compares candidate buckets without adding metadata or increasing the fingerprint length, is highly compatible, and greatly reduces the false positive rate.

Description

Design method and system for cuckoo filter for reducing false positive rate
Technical Field
The invention relates to the technical field of cuckoo filters, in particular to a cuckoo filter design method and system for reducing false positive rate.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
The cuckoo filter (CF) is an approximate set representation structure derived from cuckoo hashing that supports insertion, lookup, and deletion of elements. A cuckoo filter is essentially an array of buckets, each bucket consisting of a plurality of slots; the slot is the basic storage unit of the cuckoo filter and stores the fingerprint of an element. Because the cuckoo filter can quickly determine whether an element is in a set, it is often used in applications such as deep packet inspection, searchable encryption, and IP address blacklist checking. Each element corresponds to two candidate buckets $p_1$ and $p_2$, and its fingerprint is stored in bucket $p_1$ or $p_2$. To look up an element, the filter first computes the two candidate buckets and the fingerprint, then traverses all fingerprints in the two candidate buckets and checks whether any of them equals the fingerprint being looked up. Since fingerprints are computed by a hash function, different elements may have the same fingerprint, so false positives can occur during lookup. Increasing the fingerprint length reduces the false positive rate but increases the storage overhead. The perfect cuckoo filter (Perfect Cuckoo Filter, PCF) reduces the false positive rate without increasing the fingerprint length. Its implementation is shown in FIG. 1: if a fingerprint is stored in a slot of candidate bucket $p_1$, the metadata value of that slot is set to 0; if it is stored in a slot of candidate bucket $p_2$, the metadata value of that slot is set to 1.
During lookup, if the fingerprint in a slot equals the fingerprint of the element being looked up, the PCF further checks whether the metadata in the slot is consistent with the candidate bucket currently being searched for that element; if it is, the lookup succeeds, otherwise it fails. Although adding metadata allows candidate buckets to be compared, it introduces additional space overhead and makes the filter structure less compatible, since it cannot be applied directly to dynamic filters. In addition, the metadata values must also be compared during lookup, which adds time overhead.
In practical applications the size of the data set changes dynamically, whereas the size of a cuckoo filter is fixed and cannot meet the storage requirements of a dynamic data set; dynamic cuckoo filters were developed for this reason. A dynamic cuckoo filter changes its storage capacity elastically by adjusting the number of filters, and there are two main ways of doing so. The first adjusts the number of filters using a linked list or binary tree structure and treats each cuckoo filter as a node of the list or tree, as in the dynamic cuckoo filter (Dynamic Cuckoo Filter, DCF) and the compacted logarithmic dynamic cuckoo filter (Compacted Logarithmic Dynamic Cuckoo Filter, CLDCF). The second, inspired by distributed systems, adjusts the number of filters using the jump consistent hash algorithm and treats each cuckoo filter as an element of that algorithm, as in the Jump Filter (JF). Although these dynamic cuckoo filter schemes cleverly achieve an elastic number of filters, they do not optimize the false positive rate of the cuckoo filter.
Current optimizations of cuckoo filters focus mainly on improving the throughput of insertion and lookup operations, and existing schemes for reducing the false positive rate not only cost additional storage space but also achieve only a small reduction. Therefore, how to greatly reduce the false positive rate of cuckoo filters and dynamic cuckoo filters without increasing storage space is a problem to be solved.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention aims to provide a cuckoo filter design method and system for reducing the false positive rate, which compare candidate buckets without adding metadata or increasing the fingerprint length and thereby greatly reduce the false positive rate. The design method is compatible with various dynamic cuckoo filters and can be applied to them, so that their false positive rate is greatly reduced as well.
To achieve the above object, the invention adopts the following technical scheme:
The first aspect of the invention provides a cuckoo filter design method for reducing the false positive rate, comprising the following steps:
constructing a cuckoo filter consisting of a plurality of buckets, wherein each bucket consists of a plurality of slots and fingerprints of elements are stored in the slots;
evenly dividing the cuckoo filter into an upper half and a lower half, the upper half being called $F_1$ and the lower half $F_2$;
setting two candidate buckets $p_1$ and $p_2$, computed by taking the modulus with respect to the number of buckets in each half and applying an exclusive-or, so that the candidate bucket $p_1$ of any element always falls in $F_1$ and the corresponding candidate bucket $p_2$ always falls in $F_2$;
looking up an element in the cuckoo filter, the lookup succeeding if a fingerprint in candidate bucket $p_1$ or $p_2$ equals the fingerprint of the element.
Further, in the formulas by which candidate bucket $p_1$ is made to fall in $F_1$ and the corresponding candidate bucket $p_2$ in $F_2$ through the modulus and exclusive-or operations, f denotes the fingerprint, with f = fingerprint(x) and fingerprint(·) the fingerprint calculation function; $p_1$ and $p_2$ denote the two candidate buckets; x denotes the element; M denotes the number of buckets; and hash denotes the hash function.
Further, the maximum false positive rate in the lookup process is:
$$\varepsilon_{\max} = \frac{2b}{2^{f+\log_2 M-1}}$$
where $\varepsilon_{\max}$ denotes the maximum false positive rate of the cuckoo filter, M the number of buckets, f the fingerprint length in bits, and b the bucket size.
Further, the specific steps for inserting an element into the cuckoo filter are as follows: the fingerprint $f_x$ of element x is first stored in the element's candidate bucket in filter $F_1$; if that candidate bucket is full, the fingerprint is stored in the element's candidate bucket in $F_2$; if that candidate bucket is also full, a fingerprint $f_z$ in it is selected at random and evicted, and the evicted fingerprint is stored in the candidate bucket corresponding to the evicted fingerprint; if that bucket is still full, a fingerprint in it is selected at random and evicted to filter $F_2$.
Further, the cuckoo filter is applied directly to a dynamic cuckoo filter by replacing the filters it is built from.
In a second aspect, the invention provides a cuckoo filter design system for reducing the false positive rate, comprising:
a filter construction module configured to construct a cuckoo filter consisting of a plurality of buckets, each bucket consisting of a plurality of slots in which fingerprints of elements are stored;
a filter splitting module configured to evenly divide the cuckoo filter into an upper half and a lower half, the upper half being called $F_1$ and the lower half $F_2$;
a candidate bucket setting module configured to set two candidate buckets $p_1$ and $p_2$, computed by taking the modulus with respect to the number of buckets in each half and applying an exclusive-or, so that the candidate bucket $p_1$ of any element always falls in $F_1$ and the corresponding candidate bucket $p_2$ always falls in $F_2$;
an element lookup module configured to look up an element in the cuckoo filter, the lookup succeeding if a fingerprint in candidate bucket $p_1$ or $p_2$ equals the fingerprint of the element.
Further, because an exclusive-or operation is used in computing candidate bucket $p_2$, the number of buckets in each half must satisfy $M' = 2^{i-1}$ ($i \ge 2$), where M' denotes the number of buckets in each half and i is an integer of 2 or more.
Further, the maximum false positive rate in the lookup process is:
$$\varepsilon_{\max} = \frac{2b}{2^{f+\log_2 M-1}}$$
where $\varepsilon_{\max}$ denotes the maximum false positive rate of the cuckoo filter, M the number of buckets, f the fingerprint length in bits, and b the bucket size.
Further, the specific steps for inserting an element into the cuckoo filter are as follows: the fingerprint $f_x$ of element x is first stored in the element's candidate bucket in filter $F_1$; if that candidate bucket is full, the fingerprint is stored in the element's candidate bucket in $F_2$; if that candidate bucket is also full, a fingerprint $f_z$ in it is selected at random and evicted, and the evicted fingerprint is stored in the candidate bucket corresponding to the evicted fingerprint; if that bucket is still full, a fingerprint in it is selected at random and evicted to filter $F_2$.
Further, the cuckoo filter is applied directly to a dynamic cuckoo filter by replacing the filters it is built from.
One or more of the above technical schemes have the following beneficial effects:
The invention discloses a cuckoo filter design method and system for reducing the false positive rate. The filter is divided into two parts, $F_1$ and $F_2$, and the calculation of candidate buckets $p_1$ and $p_2$ is designed so that $p_1$ always falls in $F_1$ and $p_2$ always falls in $F_2$. The false positive rate of the filter can therefore be greatly reduced without adding metadata or increasing the fingerprint length: the false positive rate of the cuckoo filter obtained by this design method is $2/M$ of the original. In addition, the cuckoo filter obtained by this design method does not need to compare additional metadata, so its time overhead is smaller. The invention does not reduce the load of the filter, which can still reach the same load as the original cuckoo filter.
The design method of the invention is highly compatible and can be applied directly to dynamic cuckoo filters such as DCF, CLDCF and JF. The resulting new dynamic filter has a lower false positive rate, $2/M$ of the original, and at the same false positive rate as the original dynamic filter it occupies less storage space.
Additional aspects of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
FIG. 1 is a schematic diagram of a prior art perfect cuckoo filter;
FIG. 2 is a schematic view of a modified cuckoo filter according to an embodiment of the invention;
FIG. 3 is a schematic diagram of a conventional cuckoo filter search process;
FIG. 4 is a schematic diagram of a modified cuckoo filter search process according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a prior art DCF;
FIG. 6 is a schematic diagram of an improved DCF according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a prior art CLDCF;
FIG. 8 is a schematic diagram of an improved CLDCF according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of a prior art JF;
FIG. 10 is a schematic diagram of an improved JF according to an embodiment of the present invention.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It should be noted that, when embodiments of the present invention are applied to specific products or technologies, the data related to the cuckoo filter must be obtained with the permission or consent of the user, and the collection, use and processing of such data must comply with the relevant laws, regulations and standards of the relevant countries and regions.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the exemplary embodiments according to the present invention. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise; furthermore, it is to be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
Embodiment one:
The embodiment of the invention provides a cuckoo filter design method for reducing the false positive rate, which comprises the following steps:
A cuckoo filter consisting of a plurality of buckets is constructed, wherein each bucket consists of a plurality of slots and fingerprints f of elements are stored in the slots. Let the number of buckets in the filter be M and the bucket size be b; FIG. 2 shows the example M=8, b=4. Because an exclusive-or operation is used in computing candidate bucket $p_2$, the number of buckets must satisfy $M = 2^i$ ($i \ge 0$); otherwise the computed candidate bucket may fall outside the range of the filter. Typically M > 4. When $i \ge 2$, the cuckoo filter can be evenly divided into an upper half and a lower half, and the number of buckets in each half still satisfies $M' = 2^{i-1}$ ($i \ge 2$), where M' denotes the number of buckets in each half and i is an integer of 2 or more. The upper half is called $F_1$ and the lower half is called $F_2$. Two candidate buckets $p_1$ and $p_2$ are set; the candidate bucket $p_1$ of any element always falls in $F_1$, and the corresponding candidate bucket $p_2$ always falls in $F_2$.
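The power-of-two requirement can be checked by brute force. The sketch below is only an illustration (the function names are hypothetical and not part of the patent): it tests whether the exclusive-or of any two valid bucket indices is itself a valid index.

```python
def xor_stays_in_range(num_buckets: int) -> bool:
    """True if i ^ j is a valid bucket index for every pair of valid indices."""
    return all((i ^ j) < num_buckets
               for i in range(num_buckets)
               for j in range(num_buckets))


def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ..."""
    return n > 0 and (n & (n - 1)) == 0


if __name__ == "__main__":
    # With 6 buckets, 5 ^ 2 == 7 is out of range; with 8 buckets nothing is.
    print(xor_stays_in_range(6), xor_stays_in_range(8))
```

For every size from 1 to 64 the XOR result stays in range exactly when the size is a power of two, which is why both M and the half size M' are constrained this way.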
In this embodiment, the specific steps for inserting an element into the cuckoo filter are as follows: the fingerprint $f_x$ of element x is first stored in the element's candidate bucket in filter $F_1$; if that candidate bucket is full, the fingerprint is stored in the element's candidate bucket in $F_2$; if that candidate bucket is also full, a fingerprint $f_z$ in it is selected at random and evicted, and the evicted fingerprint is stored in the candidate bucket corresponding to the evicted fingerprint; if that bucket is still full, a fingerprint in it is selected at random and evicted to filter $F_2$. Here $f_z$ and the bucket it is moved to are the fingerprint and candidate bucket of element z, and the two buckets tried first are the candidate buckets of element x.
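The insertion steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the bucket formulas (`p1 = hash(x) mod M/2`, `p2 = (p1 XOR hash(f) mod M/2) + M/2`), the SHA-256 based hash, the class name `SplitCuckooFilter` and the `max_kicks` limit are all assumptions made for the example.

```python
import hashlib
import random


def _h(data: bytes) -> int:
    """64-bit hash from SHA-256 (stand-in for the unspecified hash function)."""
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


class SplitCuckooFilter:
    """Buckets 0..M/2-1 play the role of F1, buckets M/2..M-1 the role of F2."""

    def __init__(self, M=8, b=4, f_bits=8, max_kicks=500, seed=0):
        if M < 4 or M & (M - 1):
            raise ValueError("M must be a power of two and at least 4")
        self.half, self.b, self.f_bits = M // 2, b, f_bits
        self.max_kicks = max_kicks          # assumed bound on evictions
        self.buckets = [[] for _ in range(M)]
        self.rng = random.Random(seed)

    def _fingerprint(self, x: bytes) -> int:
        return _h(b"fp:" + x) % (1 << self.f_bits)

    def _alt(self, p: int, f: int) -> int:
        """Map a bucket in F1 to its partner in F2 and vice versa (assumed formula)."""
        mask = _h(f.to_bytes(8, "big")) % self.half
        if p < self.half:
            return (p ^ mask) + self.half
        return (p - self.half) ^ mask

    def insert(self, x: bytes) -> bool:
        f = self._fingerprint(x)
        p1 = _h(x) % self.half
        p2 = self._alt(p1, f)
        for p in (p1, p2):                  # try F1 first, then F2
            if len(self.buckets[p]) < self.b:
                self.buckets[p].append(f)
                return True
        p = p2                              # both full: start evicting from p2
        for _ in range(self.max_kicks):
            i = self.rng.randrange(self.b)
            f, self.buckets[p][i] = self.buckets[p][i], f  # swap in, evict old
            p = self._alt(p, f)             # the victim's other candidate bucket
            if len(self.buckets[p]) < self.b:
                self.buckets[p].append(f)
                return True
        return False                        # gave up; the last victim is dropped

    def contains(self, x: bytes) -> bool:
        f = self._fingerprint(x)
        p1 = _h(x) % self.half
        return f in self.buckets[p1] or f in self.buckets[self._alt(p1, f)]
```

Because every relocation goes through `_alt`, a fingerprint evicted from an $F_2$ bucket always lands in $F_1$ and the next victim always goes back to $F_2$, matching the alternation described in the text.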
This embodiment gives the calculation of candidate buckets $p_1$ and $p_2$: by taking the modulus with respect to the number of buckets in each half and applying an exclusive-or, candidate bucket $p_1$ always falls in $F_1$ and the corresponding candidate bucket $p_2$ always falls in $F_2$, as given by formulas (1) and (2),
where f denotes the fingerprint, with f = fingerprint(x) and fingerprint(·) the fingerprint calculation function; $p_1$ and $p_2$ denote the two candidate buckets; x denotes the element; M denotes the number of buckets; and hash denotes the hash function. Candidate buckets $p_1$ and $p_2$ can be converted into each other by the exclusive-or operation. The method of this embodiment therefore retains the cuckoo filter property that a fingerprint can be evicted between its two candidate buckets.
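One way to realize such a pair of formulas is sketched below. The exact expressions are an assumption, since the formula images are not available in this text: here `p1 = hash(x) mod M/2` and `p2 = (p1 XOR (hash(f) mod M/2)) + M/2`, with SHA-256 standing in for the unspecified hash and fingerprint functions.

```python
import hashlib


def _h(data: bytes) -> int:
    """64-bit hash from SHA-256 (stand-in for hash(.))."""
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def fingerprint(x: bytes, f_bits: int = 8) -> int:
    """Stand-in for fingerprint(.): an f_bits-bit value derived from x."""
    return _h(b"fp:" + x) % (1 << f_bits)


def candidate_buckets(x: bytes, M: int, f_bits: int = 8):
    """Return (f, p1, p2) with p1 in the upper half F1 and p2 in the lower half F2."""
    half = M // 2                      # buckets per half, must be a power of two
    f = fingerprint(x, f_bits)
    p1 = _h(x) % half                  # index inside F1: 0 .. half-1
    p2 = (p1 ^ (_h(f.to_bytes(8, "big")) % half)) + half  # half .. M-1
    return f, p1, p2


def alternate(p: int, f: int, M: int) -> int:
    """Convert p1 to p2 or p2 to p1 using only the bucket index and fingerprint."""
    half = M // 2
    mask = _h(f.to_bytes(8, "big")) % half
    return (p ^ mask) + half if p < half else (p - half) ^ mask


if __name__ == "__main__":
    f, p1, p2 = candidate_buckets(b"example", M=8)
    print(f, p1, p2, alternate(p1, f, 8) == p2, alternate(p2, f, 8) == p1)
```

The `alternate` helper needs only the stored fingerprint and the current bucket, which is what allows eviction without knowing the original element.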
In a traditional cuckoo filter, a lookup of element x succeeds in either of two cases: the first or the second candidate bucket of x contains a fingerprint equal to $f_x$. The lookup process is shown in FIG. 3, taking M=8, b=4 as an example. In the first case, the first candidate bucket of x contains a fingerprint equal to $f_x$; that is, the fingerprint $f_z$ stored in the second slot of bucket 3 in the figure equals $f_x$. Bucket 3 may be the first candidate bucket of element z, or it may be the second candidate bucket of element z, so it cannot be concluded that x and z have the same first candidate bucket. Similarly, in the second case it cannot be concluded that x and z have the same second candidate bucket. During lookup, the traditional cuckoo filter only compares whether fingerprints are identical. The fingerprints of different elements may be identical, with probability $1/2^f$ for a fingerprint of f bits, and a total of 2b fingerprints must be compared, so the maximum false positive rate is $2b/2^f$.
In this embodiment, an element is looked up in the cuckoo filter, and the lookup succeeds if a fingerprint in candidate bucket $p_1$ or $p_2$ equals the fingerprint of the element.
As shown in FIG. 4, candidate bucket $p_1$ falls in $F_1$ and $p_2$ falls in $F_2$, so bucket 3 must be the first candidate bucket of element z, and it follows that x and z have the same candidate bucket $p_1$. Similarly, in the second case x and z have the same candidate bucket $p_2$; since $p_2$ is derived from $p_1$ and the fingerprint by exclusive-or and $f_x = f_z$, x and z again have the same $p_1$. Therefore, during lookup the scheme of this embodiment checks not only whether the fingerprints of the two elements are the same but also whether their candidate buckets $p_1$ are the same. Candidate bucket $p_1$ is a bit string of length $\log_2 M - 1$, so the probability that two elements have both the same candidate bucket $p_1$ and the same fingerprint is $1/2^{f+\log_2 M-1}$, and the maximum false positive rate is
$$\varepsilon_{\max} = \frac{2b}{2^{f+\log_2 M-1}}$$
where $\varepsilon_{\max}$ denotes the maximum false positive rate of the cuckoo filter, M the number of buckets, f the fingerprint length in bits, and b the bucket size.
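As a worked example of the two bounds, the sketch below evaluates them for the M=8, b=4 configuration used in the figures. It assumes the traditional bound is 2b fingerprint comparisons at probability $1/2^f$ each, and that the split design adds the $\log_2 M - 1$ bits of $p_1$ to the comparison, as the text describes; the function names are hypothetical.

```python
def cf_max_fpr(b: int, f_bits: int) -> float:
    """Upper bound for a traditional cuckoo filter: 2b comparisons, 1/2^f each."""
    return 2 * b / 2 ** f_bits


def newcf_max_fpr(M: int, b: int, f_bits: int) -> float:
    """Upper bound when the (log2 M - 1)-bit index p1 must match as well."""
    p1_bits = M.bit_length() - 2        # log2(M) - 1 for a power-of-two M
    return 2 * b / 2 ** (f_bits + p1_bits)


if __name__ == "__main__":
    M, b, f_bits = 8, 4, 8
    print(cf_max_fpr(b, f_bits))        # 0.03125
    print(newcf_max_fpr(M, b, f_bits))  # 0.0078125
```

Under these assumptions the ratio between the two bounds is M/2, which is 4 for M=8 and grows with the size of the filter.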
This embodiment does not change the overall structure of the cuckoo filter, so the design method is highly compatible and can be applied directly to dynamic cuckoo filters such as DCF, CLDCF and JF. The resulting new dynamic filter has a lower false positive rate, $2/M$ of the original, and at the same false positive rate as the original dynamic filter it occupies less storage space. The cuckoo filter obtained by the design method of this embodiment is denoted NewCF.
The specific applications are as follows:
(1) NewCF applied to DCF.
As shown in FIG. 5, taking M=8 and b=4 as an example, in the DCF structure the cuckoo filter reaches its maximum load as the number of elements in the dynamic set grows and can no longer store elements, so DCF links a new empty cuckoo filter after the original filter.
The maximum false positive rate of DCF depends on s, the number of CFs in the DCF. As shown in FIG. 6, taking M=8 and b=4 as an example, NewDCF is obtained simply by replacing each CF with NewCF, and its maximum false positive rate follows in the same way. Since the maximum false positive rate of CF is $M/2$ times that of NewCF, the false positive rate of DCF is likewise $M/2$ times that of NewDCF. At the same false positive rate, NewDCF requires less storage space than DCF.
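The "replace CF with NewCF" idea can be sketched as a chain that is generic over the filter it links. This is a simplified illustration rather than the DCF algorithm itself: `DynamicChain` and `BoundedSetFilter` are hypothetical names, and the stand-in filter is an exact bounded set so that the example stays short and deterministic.

```python
class BoundedSetFilter:
    """Stand-in for CF or NewCF: an exact set with a fixed capacity."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = set()

    def insert(self, x) -> bool:
        if len(self.items) >= self.capacity:
            return False                      # filter is full
        self.items.add(x)
        return True

    def contains(self, x) -> bool:
        return x in self.items


class DynamicChain:
    """Links a new empty filter whenever the newest one cannot take an element."""

    def __init__(self, make_filter):
        self.make_filter = make_filter        # swap this factory to swap CF for NewCF
        self.filters = [make_filter()]

    def insert(self, x) -> bool:
        if self.filters[-1].insert(x):
            return True
        self.filters.append(self.make_filter())
        return self.filters[-1].insert(x)

    def contains(self, x) -> bool:
        # A lookup has to consult every linked filter.
        return any(f.contains(x) for f in self.filters)


if __name__ == "__main__":
    chain = DynamicChain(lambda: BoundedSetFilter(capacity=4))
    for i in range(10):
        chain.insert(i)
    print(len(chain.filters))                 # 3
```

Because the chain only calls `insert` and `contains`, any filter exposing those two methods can be substituted through the factory. A real cuckoo filter may displace a fingerprint during a failed insertion, which this stand-in does not model.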
(2) NewCF applied to CLDCF.
CLDCF manages its cuckoo filters in the form of a binary tree, thereby allowing the filter capacity to change dynamically. As shown in FIG. 7, taking M=8 and b=4 as an example, when $CF_1$ is full it expands into two empty filters $CF_2$ and $CF_3$. Each fingerprint in $CF_1$ is moved to $CF_2$ or $CF_3$ according to the value of its first bit: if the bit is 0 the fingerprint is moved to the same position in $CF_2$, and if it is 1 the fingerprint is moved to the same position in $CF_3$. $CF_1$ is deleted once all of its fingerprints have been moved. When $CF_2$ or $CF_3$ in turn becomes full, it is expanded in the same way. The maximum false positive rate of CLDCF depends on $l_i$, the number of layers.
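The expansion step can be sketched as a split of each bucket by the leading fingerprint bit. This is a simplified illustration with a hypothetical function name: "same position" is taken to mean the same bucket index, and fingerprints are kept at full length even though a compacted scheme might shorten them.

```python
def split_by_first_bit(buckets, f_bits):
    """Split a full filter into two children by the leading bit of each fingerprint.

    Returns (left, right): fingerprints whose first bit is 0 go to the same
    bucket index in `left`, those whose first bit is 1 to the same index in `right`.
    """
    top = 1 << (f_bits - 1)               # mask selecting the first (most significant) bit
    left = [[fp for fp in bucket if not fp & top] for bucket in buckets]
    right = [[fp for fp in bucket if fp & top] for bucket in buckets]
    return left, right


if __name__ == "__main__":
    cf1 = [[0b10000001, 0b00000010], [], [0b11111111]]
    cf2, cf3 = split_by_first_bit(cf1, f_bits=8)
    print(cf2)   # [[2], [], []]
    print(cf3)   # [[129], [], [255]]
```

Since bucket indices are preserved, a lookup can compute its candidate buckets once and then follow the fingerprint's bits down the tree.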
As shown in FIG. 8, taking M=8 and b=4 as an example, NewCLDCF is obtained by replacing CF with NewCF, which lowers the false positive rate relative to CLDCF. At the same false positive rate, NewCLDCF requires less storage space than CLDCF; the saving depends on s, the number of CFs in the CLDCF.
(3) NewCF applied to JF.
JF implements a dynamic cuckoo filter based on the jump consistent hash function. As shown in FIG. 9, taking M=8, b=4 as an example, when cuckoo filter $CF_1$ reaches its maximum load and can no longer store elements, JF creates an empty cuckoo filter $CF_2$ and traverses the fingerprints in $CF_1$, deciding whether to move each one to $CF_2$ according to whether j(f, 2) equals 1, where j(x, y) is the jump consistent hash function, f is a fingerprint, and "2" is the current number of cuckoo filters plus 1. After the move is complete, a new element x is inserted, and the value of j(f, 2) determines whether it is placed in $CF_1$ or $CF_2$. When $CF_1$ and $CF_2$ are full, JF creates a new filter $CF_3$ and traverses the fingerprints in $CF_1$ and $CF_2$, deciding whether to move each one to $CF_3$ according to whether j(f, 3) equals 2. After the move is complete, a new element y is inserted, and the value of j(f, 3) determines whether it is placed in $CF_1$, $CF_2$ or $CF_3$.
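The behaviour of j(x, y) can be illustrated with the published jump consistent hash algorithm of Lamping and Veach. Whether JF uses exactly this variant is an assumption; the helper `fingerprints_to_move` is a hypothetical name added for the example. Results are zero-based, so a value of 1 selects $CF_2$ and a value of 2 selects $CF_3$.

```python
def jump_hash(key: int, num_buckets: int) -> int:
    """Jump consistent hash: map a 64-bit key to a bucket in [0, num_buckets)."""
    b, j = -1, 0
    while j < num_buckets:
        b = j
        # 64-bit linear congruential step, as in the published algorithm
        key = (key * 2862933555777941757 + 1) & 0xFFFFFFFFFFFFFFFF
        j = int((b + 1) * ((1 << 31) / ((key >> 33) + 1)))
    return b


def fingerprints_to_move(fingerprints, new_count: int):
    """Fingerprints that must move to the newly added filter (index new_count - 1)."""
    return [f for f in fingerprints if jump_hash(f, new_count) == new_count - 1]


if __name__ == "__main__":
    stored = list(range(1, 17))
    moved = fingerprints_to_move(stored, 2)   # the ones that go to CF_2
    print(len(moved), "of", len(stored), "fingerprints move")
```

The useful property is that growing from n to n+1 filters only ever moves a fingerprint into the new filter, never between existing ones, so each expansion needs a single pass over the stored fingerprints.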
As shown in FIG. 10, taking M=8 and b=4 as an example, NewJF is obtained by replacing CF with NewCF, and the maximum false positive rate of NewJF is correspondingly lower than that of JF. At the same false positive rate, NewJF requires less storage space than JF; the saving depends on s, the number of CFs in the JF.
It can therefore be seen that the design method of this embodiment can be applied directly to dynamic cuckoo filters and is highly compatible.
To confirm the advantages of the cuckoo filter design method of this embodiment, the following experimental verification was performed:
(1) Experimental verification of NewCF false positive rate and storage space
For distinction, the conventional cuckoo filter is denoted CF and the cuckoo filter obtained by the design method of this embodiment is denoted NewCF. The maximum false positive rate of CF is $M/2$ times that of NewCF. Furthermore, at the same false positive rate, CF requires more storage space than NewCF. The maximum false positive rate is a theoretical value obtained by mathematical analysis, and it gives an upper limit for the false positive rate observed in the experiments. Table 1 shows the false positive rates of CF and NewCF during lookup, averaged over five experiments, for different fingerprint lengths and filter sizes.
Table 1: Average false positive rate over 5 experiments
As can be seen from Table 1, for the same fingerprint length the false positive rate of NewCF is lower. The false positive rate of NewCF with 8-bit fingerprints is lower than that of CF with 16-bit fingerprints; that is, NewCF achieves a lower false positive rate even with a shorter fingerprint, from which it can be inferred that NewCF occupies less storage space at the same false positive rate. Notably, NewCF had a false positive rate of 0 whenever the fingerprint length was at least 8 bits.
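The storage inference can be made concrete with a rough calculation. The sketch below is an illustration under stated assumptions, not a result from the patent: it uses the bound 2b/2^f for CF, assumes NewCF gains log2(M) - 1 effective bits from the bucket index, and counts storage as M·b slots times the fingerprint length. The function names and the target rate are hypothetical.

```python
import math


def cf_fingerprint_bits(b: int, target_fpr: float) -> int:
    """Smallest f with 2b / 2^f <= target_fpr (traditional bound)."""
    return math.ceil(math.log2(2 * b / target_fpr))


def newcf_fingerprint_bits(M: int, b: int, target_fpr: float) -> int:
    """Same target, but log2(M) - 1 bits are supplied by the bucket index p1."""
    p1_bits = M.bit_length() - 2            # log2(M) - 1 for a power-of-two M
    return max(1, cf_fingerprint_bits(b, target_fpr) - p1_bits)


def storage_bits(M: int, b: int, f_bits: int) -> int:
    """Total fingerprint storage: M buckets of b slots, f_bits each."""
    return M * b * f_bits


if __name__ == "__main__":
    M, b, eps = 1024, 4, 0.001
    f_cf = cf_fingerprint_bits(b, eps)
    f_new = newcf_fingerprint_bits(M, b, eps)
    print(f_cf, storage_bits(M, b, f_cf))    # 13 53248
    print(f_new, storage_bits(M, b, f_new))  # 4 16384
```

With these assumptions the saving per slot is log2(M) - 1 bits, so it grows with the number of buckets.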
Since hash collisions are probabilistic, five experiments are insufficient to show that the false positive rate is 0 when the fingerprint length is at least 8 bits. This embodiment therefore ran 1000 independent experiments and counted how many of them had a false positive rate of 0. As shown in Table 2, across the different filter sizes, in 1000 independent repeated experiments the false positive rate was 0 in at least 657 experiments with 8-bit fingerprints, in at least 973 experiments with 12-bit fingerprints, and in all 1000 experiments with 16-bit fingerprints, so it can be considered that no false positives occur when the fingerprint length is 16 bits. In addition, the number of experiments in which CF had a false positive rate of 0 with 16-bit fingerprints is far smaller than the number in which NewCF had a false positive rate of 0 with 8-bit fingerprints.
Table 2: Number of experiments with a false positive rate of 0 in 1000 experiments
(2) Experimental verification of NewCF load
When inserting element x, NewCF first stores fingerprint $f_x$ in the element's candidate bucket in $F_1$; if that candidate bucket is full, the fingerprint is stored in the element's candidate bucket in $F_2$. If that candidate bucket is also full, a fingerprint $f_z$ in it is selected at random and evicted, and the evicted fingerprint is stored in the candidate bucket corresponding to the evicted fingerprint; if that bucket is still full, a fingerprint in it is selected at random and evicted to $F_2$. Table 3 shows the load factors of $F_1$ and $F_2$, averaged over ten experiments, for different filter sizes and fingerprint lengths. The experiments show that the load of $F_1$ is greater than that of $F_2$, and that the load of $F_1$ can reach 1.
Table 3: Load factors of $F_1$ and $F_2$
Table 4 gives the load factors of CF and NewCF averaged over ten experiments. The experiments show that the load of NewCF is about the same as that of CF, so the design does not reduce the load of the cuckoo filter.
Table 4: Comparison of the load factors of the cuckoo filter and our design
Embodiment two:
The second embodiment of the invention provides a design system for reducing the false positive rate of a cuckoo filter, comprising:
a filter construction module configured to construct a cuckoo filter composed of a plurality of buckets, each bucket being composed of a plurality of slots in which fingerprints of elements are stored;
a filter splitting module configured to divide the cuckoo filter evenly into an upper half and a lower half, the upper half filter being referred to as F_1 and the lower half filter as F_2;
a candidate bucket setting module configured to set two candidate buckets p_1 and p_2, wherein, by taking the modulus with respect to the number of buckets in each part of the filter and applying exclusive-or, the candidate bucket p_1 of every element falls in F_1 and the corresponding candidate bucket p_2 falls in F_2;
an element searching module configured to search for an element using the cuckoo filter, wherein the search succeeds if a fingerprint in candidate bucket p_1 or p_2 is the same as the fingerprint of the element being searched for.
In the filter construction module, an element is inserted into the buckets of the cuckoo filter as follows: the fingerprint f_x of element x is first stored in its candidate bucket in filter F_1; only when that candidate bucket is full is the fingerprint stored in its candidate bucket in F_2. If that candidate bucket is also full, a resident fingerprint f_z is selected at random and evicted, and the evicted fingerprint is stored in its own corresponding candidate bucket; if that bucket is still full, a fingerprint is selected at random from it and evicted to filter F_2.
In the candidate bucket setting module, the exclusive-or operation is used when computing candidate bucket p_2, so the number of buckets in each part of the filter must satisfy M' = 2^{i-1} (i ≥ 2), where M' denotes the number of buckets in each part of the filter and i is an integer not less than 2.
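The power-of-two requirement can be checked with a small sketch. It assumes that the alternate bucket index inside one part of the filter is obtained as (p XOR h) mod M', where h stands for a hash of the fingerprint; this form is an assumption consistent with the modulo and exclusive-or operations described above rather than the exact formula of the invention, and the names alt_index and is_involution are hypothetical.

```python
def alt_index(p: int, h: int, m_half: int) -> int:
    """Alternate local bucket index under the assumed XOR rule, reduced modulo M'."""
    return (p ^ h) % m_half


def is_involution(m_half: int) -> bool:
    """True if applying alt_index twice always returns to the starting bucket."""
    return all(
        alt_index(alt_index(p, h, m_half), h, m_half) == p
        for p in range(m_half)
        for h in range(4 * m_half)   # include hash values larger than M'
    )


for m_half in (2, 4, 6, 8, 16):
    print(m_half, is_involution(m_half))
```

When M' is a power of two, the modulo simply keeps the low bits, so the XOR can be undone and an evicted fingerprint can always find its way back to its other candidate bucket. With M' = 6 this fails: starting from p = 5 with h = 3, the first step gives 0 and the second gives 3 instead of 5.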
In the element searching module, the maximum false positive rate in the searching process is as follows:
where the left-hand side denotes the maximum false positive rate of the cuckoo filter, M denotes the number of buckets, f denotes the fingerprint, and b denotes the bucket size.
This embodiment does not change the overall structure of the cuckoo filter, so the design method is highly compatible and can be applied directly to dynamic cuckoo filters such as DCF, CLDCF and JF. The resulting new dynamic filter has a lower false positive rate than the original dynamic filter; at the same false positive rate as the original dynamic filter, the new dynamic filter occupies less storage space.
The steps involved in the second embodiment correspond to those of the method of the first embodiment, and details can be found in the relevant description of the first embodiment. Those skilled in the art will appreciate that the modules or steps of the invention described above may be implemented by a general-purpose computing device. Alternatively, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; or they may each be made into an individual integrated circuit module; or several of the modules or steps may be made into a single integrated circuit module. The present invention is not limited to any specific combination of hardware and software.
While the foregoing description of the embodiments of the present invention has been presented in conjunction with the drawings, it should be understood that it is not intended to limit the scope of the invention, but rather, it is intended to cover all modifications or variations within the scope of the invention as defined by the claims of the present invention.

Claims (10)

1. A cuckoo filter design method for reducing false positive rate, comprising the steps of:
constructing a cuckoo filter consisting of a plurality of buckets, wherein each bucket consists of a plurality of slots, and fingerprints of elements are stored in the slots;
dividing the cuckoo filter evenly into an upper half and a lower half, wherein the upper half filter is referred to as F_1 and the lower half filter as F_2;
setting two candidate buckets p_1 and p_2, wherein, by taking the modulus with respect to the number of buckets in each part of the filter and applying exclusive-or, the candidate bucket p_1 of every element falls in F_1 and the corresponding candidate bucket p_2 falls in F_2;
searching for an element using the cuckoo filter, wherein the search succeeds if a fingerprint in candidate bucket p_1 or p_2 is the same as the fingerprint of the element being searched for.
2. The cuckoo filter design method for reducing false positive rate according to claim 1, wherein, by taking the modulus with respect to the number of buckets in each part of the filter and applying exclusive-or, the candidate bucket p_1 of every element falls in F_1 and the corresponding candidate bucket p_2 falls in F_2, the specific formula being as follows:
where f denotes the fingerprint, f = fingerprint(x), fingerprint(·) is the fingerprint calculation function, p_1 and p_2 denote the two candidate buckets, x denotes an element, M denotes the number of buckets, and hash denotes a hash function.
3. The cuckoo filter design method for reducing false positive rate according to claim 1, wherein the maximum false positive rate in the search process is:
where the left-hand side denotes the maximum false positive rate of the cuckoo filter, M denotes the number of buckets, f denotes the fingerprint, and b denotes the bucket size.
4. The cuckoo filter design method for reducing false positive rate according to claim 1, wherein an element is inserted into the buckets of the cuckoo filter as follows: the fingerprint f_x of element x is first stored in its candidate bucket in filter F_1; only when that candidate bucket is full is the fingerprint stored in its candidate bucket in F_2; if that candidate bucket is also full, a resident fingerprint f_z is selected at random and evicted, and the evicted fingerprint is stored in its own corresponding candidate bucket; if that bucket is still full, a fingerprint is selected at random from it and evicted to filter F_2.
5. The cuckoo filter design method for reducing false positive rate according to claim 1, wherein the cuckoo filter is directly applied to a dynamic cuckoo filter by a filter replacement method.
6. A design system for reducing false positive rate of a cuckoo filter, comprising:
a filter construction module configured to construct a cuckoo filter composed of a plurality of buckets, each bucket being composed of a plurality of slots in which fingerprints of elements are stored;
a filter splitting module configured to divide the cuckoo filter evenly into an upper half and a lower half, the upper half filter being referred to as F_1 and the lower half filter as F_2;
a candidate bucket setting module configured to set two candidate buckets p_1 and p_2, wherein the candidate bucket p_1 of every element falls in F_1 and the corresponding candidate bucket p_2 falls in F_2;
an element searching module configured to search for an element using the cuckoo filter, wherein the search succeeds if a fingerprint in candidate bucket p_1 or p_2 is the same as the fingerprint of the element being searched for.
7. The design system for reducing the false positive rate of a cuckoo filter according to claim 6, wherein the exclusive-or operation is used when computing candidate bucket p_2, so the number of buckets in each part of the filter must satisfy M' = 2^{i-1} (i ≥ 2), where M' denotes the number of buckets in each part of the filter and i is an integer not less than 2.
8. The design system for reducing the false positive rate of a cuckoo filter according to claim 6, wherein the maximum false positive rate in the search process is:
where the left-hand side denotes the maximum false positive rate of the cuckoo filter, M denotes the number of buckets, f denotes the fingerprint, and b denotes the bucket size.
9. The design system for reducing the false positive rate of a cuckoo filter according to claim 6, wherein an element is inserted into the buckets of the cuckoo filter as follows: the fingerprint f_x of element x is first stored in its candidate bucket in filter F_1; only when that candidate bucket is full is the fingerprint stored in its candidate bucket in F_2; if that candidate bucket is also full, a resident fingerprint f_z is selected at random and evicted, and the evicted fingerprint is stored in its own corresponding candidate bucket; if that bucket is still full, a fingerprint is selected at random from it and evicted to filter F_2.
10. The design system for reducing the false positive rate of a cuckoo filter of claim 6, wherein said cuckoo filter is directly applied to a dynamic cuckoo filter by filter replacement.
CN202310344013.XA 2023-03-29 2023-03-29 Design method and system for cuckoo filter for reducing false positive rate Active CN116467307B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310344013.XA CN116467307B (en) 2023-03-29 2023-03-29 Design method and system for cuckoo filter for reducing false positive rate

Publications (2)

Publication Number Publication Date
CN116467307A true CN116467307A (en) 2023-07-21
CN116467307B CN116467307B (en) 2024-02-23

Family

ID=87179919

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant