CN113886391A - Data processing method of double-fingerprint storage cuckoo filter based on discrete type - Google Patents


Publication number
CN113886391A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111181649.4A
Other languages
Chinese (zh)
Other versions
CN113886391B (en)
Inventor
邓显辉
李斌勇
赵兰
蒋娜
张小辉
宋学江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Tianma Technology Co ltd
Chengdu University of Information Technology
Original Assignee
Chengdu Tianma Technology Co ltd
Chengdu University of Information Technology
Application filed by Chengdu Tianma Technology Co ltd and Chengdu University of Information Technology
Priority to CN202111181649.4A
Publication of CN113886391A
Application granted
Publication of CN113886391B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2272 Management thereof
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Fuzzy Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data processing method for a discrete double-fingerprint storage cuckoo filter. The method comprises: establishing and initializing the main index table of the discrete double-fingerprint storage cuckoo filter; inserting, querying, and/or deleting data according to the current instruction type; and judging whether to continue acquiring instruction types, continuing to process data if so and otherwise ending the data processing of the discrete double-fingerprint storage cuckoo filter. By coupling dynamic changes in storage space with the dynamic growth and shrinkage of the stored data, the method lets the storage space of the cuckoo filter scale dynamically and speeds up construction of the data structure; it saves storage space, improves the accuracy of membership queries, and reduces the probability of false deletion of members; and it effectively avoids cyclic filling of data during relocation operations, improving the efficiency of the cuckoo filter.

Description

Data processing method of double-fingerprint storage cuckoo filter based on discrete type
Technical Field
The invention relates to the field of computer information representation and information retrieval, and in particular to a data processing method for a discrete double-fingerprint storage cuckoo filter.
Background
Inserting, querying, and deleting set members are three common data processing problems, and handling all three while meeting key requirements such as low storage overhead and fast queries is a major challenge. Researchers currently address these problems with Bloom filters, cuckoo filters, and their variants, but Bloom filters and their variants cannot fully solve them. For example, a standard Bloom filter does not support deleting set members, and a counting Bloom filter supports member deletion only at the cost of a sharp increase in space overhead. A cuckoo filter is a filter based on the cuckoo hashing algorithm; in essence it is a cuckoo hash table that stores hash values of the stored items. It overcomes the Bloom filter's lack of support for member deletion while significantly reducing the high storage overhead of Bloom filter variants. However, the standard cuckoo filter cannot adapt to data that changes dynamically at high speed, and it is prone to filling loops and to false deletions caused by hash collisions when data is deleted. Existing cuckoo filter variants likewise cannot adapt to dynamic, high-speed data changes, and their storage space cannot be changed once it has been allocated. Moreover, when members are deleted, the false deletion rate keeps rising as the number of members grows. Existing schemes also rarely offer a good solution to the filling loop problem.
Disclosure of Invention
To address the above deficiencies in the prior art, the invention provides a data processing method for a discrete double-fingerprint storage cuckoo filter, which solves the problems that the prior art cannot simultaneously adapt to dynamic high-speed data changes, insert data reliably, and keep the probability of cyclic filling low.
To achieve this purpose, the invention adopts the following technical solution:
A data processing method for a discrete double-fingerprint storage cuckoo filter is provided, comprising the following steps:
S1, establishing the main index table of the discrete double-fingerprint storage cuckoo filter and initializing it;
S2, acquiring the current instruction type; if the instruction type is data insertion, proceeding to step S3; if it is data query, performing the data query; if it is data deletion, performing the data deletion;
S3, computing the fingerprint information of the data member to be inserted, taking the first k bits of the fingerprint information as the front fingerprint and the last k-1 bits as the rear fingerprint;
S4, judging whether the fingerprint information of the member to be inserted is already in the double-fingerprint storage cuckoo filter; if so, ending this data insertion and proceeding to step S13; otherwise, proceeding to step S5;
S5, computing the positions of the two candidate buckets of the data member to be inserted and selecting one of them as the candidate bucket for insertion;
S6, judging whether the address stored at the main index entry pointed to by the main index pointer of the candidate bucket for insertion is empty; if so, proceeding to step S7; otherwise, proceeding to step S8;
S7, generating a main index storage unit, setting its relocation identifier value to 0, filling the front fingerprint and rear fingerprint selected in step S3 into the unit, pointing the main index pointer of the candidate bucket to the unit, ending this data insertion, and proceeding to step S13;
S8, judging whether the address stored at the main index entry of the candidate bucket for insertion refers to a main index storage unit; if so, proceeding to step S9; otherwise, proceeding to step S10;
S9, retaining the main index storage unit, generating a primary index table, setting its address position 0 to NULL, and pointing the main index pointer of the candidate bucket to the primary index table; generating two secondary index storage units for the primary index table, setting the relocation identifier value of each to 0, and pointing the two corresponding secondary index storage unit address positions in the primary index table to them; inserting the front fingerprint of the data member to be inserted and the front fingerprint held in the retained main index storage unit, in that order, into front fingerprint positions of the primary index table; inserting the rear fingerprint of the data member to be inserted and the rear fingerprint held in the retained main index storage unit, in that order, into the two generated secondary index storage units; after insertion, releasing the retained main index storage unit and decreasing the remaining address count of the primary index table by 2; ending this data insertion and proceeding to step S13;
S10, judging whether the remaining address count of the current index table is 0; if so, proceeding to step S12; otherwise, proceeding to step S11;
S11, generating a secondary index storage unit for the current index table, setting its relocation identifier value to 0, and pointing the corresponding secondary index storage unit address position in the current index table to it; inserting the front fingerprint of the data member to be inserted into a front fingerprint position of the current index table and its rear fingerprint into the generated secondary index storage unit; decreasing the remaining address count of the index table by 1; ending this data insertion and proceeding to step S13;
S12, judging whether a next-level index table exists; if so, taking it as the current index table and returning to step S10; otherwise, performing the relocation operation and proceeding to step S13;
S13, judging whether to continue acquiring instruction types; if so, returning to step S2; otherwise, ending the data processing of the cuckoo filter.
Further: after the initialization in step S1, the whole cuckoo filter system contains only one index table, the main index table, whose address positions are all empty. The smallest storage unit pointed to by an address in the main index table is the main index storage unit, which consists of a front fingerprint storage position, a rear fingerprint storage position, and a relocation identifier value. The two fingerprint storage positions occupy the same amount of space; the relocation identifier value occupies one position and takes the states 0, 1, or n, where 0 denotes a newly inserted fingerprint, 1 denotes that one relocation operation has been performed on the storage unit, and n denotes that n relocation operations have been performed on it.
Further: the primary (first-level) index table consists of a remaining-address-count storage position, address position 0, and secondary index table storage positions. The remaining-address-count storage position stores the number of unused secondary index table storage positions in the index table; it is initialized to n and does not count address position 0. Each secondary index table storage position comprises a front fingerprint position and a secondary index storage unit address position, the latter pointing to the secondary index storage unit generated for that entry. Storage positions are filled in reverse order, starting from position n; once position 1 has been used, the remaining address count is 0 and the index table is full. When an index table is full, a next-level index table is added, with the same structure as the primary index table. Address position 0 is initialized to NULL; when a next-level index table exists, this position stores its address.
Further: k in step S3 is less than or equal to 1/2 of the total number of bits of the fingerprint information computed by the hash function for the data member $\xi_X$ to be inserted.
Further: the two candidate bucket positions $\chi$ and $\gamma$ of the data member $\xi_X$ to be inserted in step S4 are $\chi = h_1(\xi_X)$ and $\gamma = \chi \oplus h_1(f_X)$, where $\oplus$ is the exclusive-or (XOR) operation; $f_X$ is the fingerprint information of the data member to be inserted; and $h_1(\cdot)$ is a hash function. A data member has two candidate buckets, and its data information is stored in only one of them.
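The fingerprint split of step S3 and the two candidate bucket positions can be illustrated with a minimal Python sketch. It assumes a 16-bit fingerprint, k = 8, 1024 buckets, SHA-256 standing in for the hash function h1, and the usual cuckoo-filter form in which the second bucket is the first bucket XOR h1 of the fingerprint. None of these concrete values or function names come from the patent; they are illustrative assumptions only.

```python
import hashlib

FP_BITS = 16            # assumed total fingerprint length in bits
K = 8                   # assumed front-fingerprint length, K <= FP_BITS // 2
NUM_BUCKETS = 1 << 10   # assumed power of two, so XOR results stay in range


def h1(data: bytes) -> int:
    """Stand-in for the hash function h1: maps bytes to a bucket index."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def fingerprint(member: bytes) -> int:
    """Assumed 16-bit fingerprint of a data member."""
    digest = hashlib.sha256(b"fp:" + member).digest()
    return int.from_bytes(digest[:2], "big")


def split_fingerprint(fp: int, k: int = K):
    """Return (front, rear): the first k bits and the last k-1 bits of fp."""
    front = fp >> (FP_BITS - k)
    rear = fp & ((1 << (k - 1)) - 1)
    return front, rear


def alternate_bucket(bucket: int, fp: int) -> int:
    """Other candidate bucket, computed from one bucket and the fingerprint."""
    return bucket ^ h1(fp.to_bytes(2, "big"))


def candidate_buckets(member: bytes):
    """Return the two candidate bucket positions (chi, gamma) of a member."""
    chi = h1(member)
    gamma = alternate_bucket(chi, fingerprint(member))
    return chi, gamma
```

Because XOR is its own inverse, applying `alternate_bucket` to either candidate bucket yields the other one, so a stored entry can be moved to its alternate bucket using only its bucket index and fingerprint.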
Further, the relocation operation in step S12 comprises the following sub-steps:
S12-1, generating an intermediate variable temp in the form of a main index storage unit; randomly selecting an existing data member in the original primary index table and filling all of its fingerprint information into temp; after filling, clearing that member's secondary index table storage position and releasing its original secondary index storage unit;
S12-2, generating a new secondary index storage unit for the primary index table, setting its relocation identifier value to 0, and pointing the emptied secondary index storage unit address position in the primary index table to it; inserting the front fingerprint of the data member $\xi_X$ to be inserted into the emptied front fingerprint position of the primary index table, and inserting its rear fingerprint into the new secondary index storage unit;
S12-3, judging whether the relocation identifier value of temp is 0; if so, proceeding to step S12-4; otherwise, proceeding to step S12-11;
S12-4, computing the position of the other candidate bucket of temp and judging whether the address stored at the main index entry pointed to by that bucket's main index pointer is empty; if so, proceeding to step S12-5; otherwise, proceeding to step S12-6;
S12-5, setting the relocation identifier value of temp to 1, pointing the main index pointer of that candidate bucket to temp, ending the relocation operation, and proceeding to step S14;
S12-6, judging whether the main index pointer of the other candidate bucket of temp points to a main index storage unit; if so, proceeding to step S12-7; otherwise, proceeding to step S12-9;
S12-7, retaining that main index storage unit, generating a primary index table, setting its address position 0 to NULL, and pointing the main index pointer of the candidate bucket to the primary index table;
S12-8, generating two secondary index storage units for the primary index table, one with relocation identifier value 0 and the other with relocation identifier value 1; inserting the front fingerprint of temp and the front fingerprint of the retained main index storage unit, in that order, into front fingerprint positions of the primary index table; pointing the secondary index storage unit address position paired with the front fingerprint of temp to the unit whose relocation identifier value is 1, and the address position paired with the front fingerprint of the retained main index storage unit to the unit whose relocation identifier value is 0; filling the rear fingerprint of temp into the unit whose relocation identifier value is 1 and the rear fingerprint of the retained main index storage unit into the unit whose relocation identifier value is 0; after insertion, releasing temp and the retained main index storage unit, decreasing the remaining address count of the primary index table by 2, ending the relocation operation, and proceeding to step S14;
S12-9, judging whether the remaining address count of the index table currently pointed to by the main index pointer of the other candidate bucket of temp is 0; if so, returning to step S12-1; otherwise, proceeding to step S12-10;
S12-10, generating a secondary index storage unit for the current index table and setting its relocation identifier value to 1; inserting the front fingerprint of temp into a front fingerprint position of the current index table and pointing the paired secondary index storage unit address position to the generated unit; inserting the rear fingerprint of temp into the generated unit; decreasing the remaining address count of the current index table by 1, releasing temp, ending the relocation operation, and proceeding to step S14;
S12-11, locating the current last-level index table of the current candidate bucket of temp and judging whether its remaining address count is 0; if so, proceeding to step S12-12; otherwise, proceeding to step S12-14;
S12-12, generating a next-level index table, setting its address position 0 to NULL, and pointing address position 0 of the current last-level index table to the next-level index table;
S12-13, generating a secondary index storage unit for the next-level index table and filling it with the rear fingerprint of temp and the relocation identifier value of temp incremented by 1; inserting the front fingerprint of temp into a front fingerprint position of the next-level index table and pointing the paired secondary index storage unit address position to the generated unit; after insertion, releasing temp, decreasing the remaining address count of the next-level index table by 1, ending the relocation operation, and proceeding to step S14;
S12-14, generating a secondary index storage unit for the current last-level index table and filling it with the rear fingerprint of temp and the relocation identifier value of temp incremented by 1; inserting the front fingerprint of temp into a front fingerprint position of the current last-level index table and pointing the paired secondary index storage unit address position to the generated unit; after insertion, releasing temp, decreasing the remaining address count of that index table by 1, ending the relocation operation, and proceeding to step S14.
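Sub-steps S12-11 to S12-14 cover the case where the evicted entry has already been relocated at least once: rather than being moved again, it is appended to the last-level index table of its current bucket, and a next-level table is created when that table is full. The Python sketch below models only this branch, under simplifying assumptions that are not from the patent: a bucket is a list of tables (hypothetical name `chain`), each table is a plain list of at most n entries, and an entry is a `(front, rear, relocations)` tuple.

```python
from typing import List, Tuple

Entry = Tuple[int, int, int]   # (front fingerprint, rear fingerprint, relocation identifier)
Chain = List[List[Entry]]      # chain[0] = primary index table, chain[-1] = last-level table


def place_relocated(chain: Chain, temp: Entry, n: int) -> None:
    """Append an already-relocated entry to the end of its current bucket's chain."""
    front, rear, relocations = temp
    if len(chain[-1]) == n:
        # S12-11 / S12-12: the last-level table is full, so create a next-level table.
        chain.append([])
    # S12-13 / S12-14: store the entry with its relocation identifier incremented by 1.
    chain[-1].append((front, rear, relocations + 1))
```

In this simplified model the storage grows by one table instead of triggering another eviction, which is how the relocation identifier value keeps an entry from being moved back and forth indefinitely.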
Further, the data query in step S2 comprises the following steps:
S2-1-1, computing the fingerprint information $f_Y$ of the data member $\xi_Y$ to be queried and obtaining the member's front fingerprint and rear fingerprint;
S2-1-2, computing the two candidate bucket positions $\chi'$ and $\gamma'$ of the data member $\xi_Y$ to be queried;
S2-1-3, judging whether the front fingerprint is present in candidate bucket $\chi'$ or $\gamma'$; if so, proceeding to step S2-1-4; otherwise, ending this data query and proceeding to step S14;
S2-1-4, judging whether the rear fingerprint stored with that front fingerprint in the candidate bucket is the same as the rear fingerprint of the member to be queried; if so, determining that the query succeeds, ending this data query, and proceeding to step S14; otherwise, determining that the query fails, ending this data query, and proceeding to step S14.
Further: the two candidate bucket positions $\chi'$ and $\gamma'$ of the data member $\xi_Y$ to be queried in step S2-1-1 are $\chi' = h_1(\xi_Y)$ and $\gamma' = \chi' \oplus h_1(f_Y)$, where $\oplus$ is the exclusive-or (XOR) operation; $f_Y$ is the fingerprint information of the data member to be queried; and $h_1(\cdot)$ is a hash function.
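The two-stage check of steps S2-1-3 and S2-1-4 can be sketched in Python as follows. The sketch assumes a flattened model in which each bucket is simply a list of (front fingerprint, rear fingerprint) pairs; the index-table chain of the patent is deliberately left out, and the names `query` and `buckets` are hypothetical.

```python
from typing import Dict, List, Tuple

Bucket = List[Tuple[int, int]]   # (front fingerprint, rear fingerprint) pairs


def query(buckets: Dict[int, Bucket], chi: int, gamma: int,
          front: int, rear: int) -> bool:
    """Return True only if some entry in a candidate bucket matches both fingerprints."""
    for idx in (chi, gamma):
        for stored_front, stored_rear in buckets.get(idx, []):
            # S2-1-3: the front fingerprint must be present in the bucket.
            if stored_front != front:
                continue
            # S2-1-4: the rear fingerprint stored with it must also match.
            if stored_rear == rear:
                return True
    return False
```

A member whose front fingerprint collides with a stored entry is still rejected when the rear fingerprints differ, which is the mechanism the text credits for the improved query accuracy.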
Further, the data deletion in step S2 comprises the following steps:
S2-2-1, computing the fingerprint information of the data member $\xi_{del}$ to be deleted and obtaining the member's front fingerprint and rear fingerprint;
S2-2-2, computing the two candidate bucket positions $\chi_{del}$ and $\gamma_{del}$ of the data member $\xi_{del}$ to be deleted;
S2-2-3, judging whether the data member $\xi_{del}$ to be deleted is present in candidate bucket $\chi_{del}$ or $\gamma_{del}$; if so, proceeding to step S2-2-4; otherwise, ending this data deletion and proceeding to step S14;
S2-2-4, judging whether the storage unit holding the data member $\xi_{del}$ is a main index storage unit; if so, proceeding to step S2-2-5; otherwise, proceeding to step S2-2-6;
S2-2-5, releasing the main index storage unit and setting the address position of the candidate bucket in the main index table to NULL; ending this data deletion and proceeding to step S14;
S2-2-6, releasing the secondary index storage unit and setting the values of the corresponding secondary index table storage position in the current index table to NULL;
S2-2-7, judging whether the remaining address count of the current index table is n-1, i.e., the table is now empty; if so, proceeding to step S2-2-8; otherwise, proceeding to step S2-2-9;
S2-2-8, setting address position 0 of the index table one level above the current table to NULL and releasing the current index table; ending this data deletion and proceeding to step S14;
S2-2-9, locating the last-level index table, migrating fingerprint data from the last-level index table into the current index table, and increasing the remaining address count of the last-level index table by 1;
S2-2-10, taking the last-level index table as the current index table and judging whether its remaining address count is n, i.e., the table is empty; if so, proceeding to step S2-2-8; otherwise, ending this data deletion and proceeding to step S14.
Further: the two candidate bucket positions $\chi_{del}$ and $\gamma_{del}$ of the data member $\xi_{del}$ to be deleted in step S2-2-1 are $\chi_{del} = h_1(\xi_{del})$ and $\gamma_{del} = \chi_{del} \oplus h_1(f_{del})$, where $\oplus$ is the exclusive-or (XOR) operation; $f_{del}$ is the fingerprint information of the data member to be deleted; and $h_1(\cdot)$ is a hash function.
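The compaction behaviour of steps S2-2-6 to S2-2-10 (free the entry, refill the gap from the last-level index table, and release a table once it is empty) can be approximated by the short Python sketch below. It assumes a simplified model that is not taken from the patent: a bucket is a list of tables (hypothetical name `chain`), each table is a plain list of (front, rear) pairs, and an empty chain stands for a bucket whose main index pointer is NULL.

```python
from typing import List, Tuple

Entry = Tuple[int, int]       # (front fingerprint, rear fingerprint)
Chain = List[List[Entry]]     # chain[0] = primary index table, chain[-1] = last-level table


def delete(chain: Chain, front: int, rear: int) -> bool:
    """Delete one entry matching both fingerprints and keep the chain compact."""
    for table in chain:
        if (front, rear) not in table:
            continue
        table.remove((front, rear))            # S2-2-6: free the matching entry
        last = chain[-1]
        if table is not last and last:
            table.append(last.pop())           # S2-2-9: migrate one entry into the gap
        if not chain[-1]:
            chain.pop()                        # S2-2-8: release the empty last-level table
        return True
    return False                               # S2-2-3: member not present, nothing deleted
```

Requiring both fingerprints to match before removing anything mirrors the double-fingerprint check used for queries, and migrating from the last table keeps every table except the last one full.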
The beneficial effects of the invention are as follows. By coupling dynamic changes in storage space with the dynamic growth and shrinkage of the stored data, and by exploiting the properties of index tables and pointers, the storage space of the cuckoo filter can expand and contract dynamically and adapt to dynamically changing data streams, which further speeds up construction of the data structure. The member insertion, query, and deletion methods of this scheme effectively avoid the large waste of storage space found in existing schemes; double-fingerprint storage improves the accuracy of membership queries and effectively reduces the probability of false deletion of members. Alongside high-precision membership queries and reliable member deletion, data compaction keeps storage utilization high and avoids wasted space. Adding a relocation identifier value to each storage unit effectively avoids cyclic filling of data during relocation operations and improves the efficiency of the cuckoo filter.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic diagram of the index tables of the present invention;
FIG. 3 is a diagram of minimum storage units of the main index table according to the present invention;
FIG. 4 is a diagram of minimum storage units of the secondary index table according to the present invention.
Detailed Description
The following describes specific embodiments of the invention so that those skilled in the art can understand it. It should be understood, however, that the invention is not limited to the scope of these embodiments. To those of ordinary skill in the art, various changes are apparent as long as they fall within the spirit and scope of the invention as defined by the appended claims, and all inventions and creations that make use of the inventive concept are protected.
As shown in fig. 1, the data processing method based on the discrete double-fingerprint storage cuckoo filter includes the following steps:
s1, establishing a discrete double-fingerprint-based storage cuckoo filter main index table and initializing the table;
s2, acquiring the current instruction type, and if the instruction type is data insertion, entering the step S3; if the instruction type is data query, performing data query; if the instruction type is data deletion, data deletion is carried out;
s3, calculating the fingerprint information of the data member to be inserted, selecting the front k position of the fingerprint information as a front fingerprint, and selecting the rear k-1 position of the fingerprint information as a rear fingerprint;
s4, judging whether the fingerprint information of the member to be inserted is in the double-fingerprint storage cuckoo filter, if so, ending the data insertion and entering the step S13; otherwise, go to step S5;
s5, calculating the positions of two candidate buckets of the data members to be inserted, and selecting one of the candidate buckets as a candidate bucket to be inserted;
s6, judging whether the address stored in the main index pointed by the main index pointer of the candidate bucket to be inserted corresponding to the data member to be inserted is empty, if yes, entering the step S7; otherwise, go to step S8;
s7, generating a main index storage unit, setting the relocation identifier value to 0, filling the pre-fingerprints and the post-fingerprints selected in the step S3 into the main index storage unit, pointing the main index pointer of the corresponding candidate bucket to the main index storage unit, ending the data insertion, and entering the step S13;
s8, judging whether the address stored by the main index of the candidate bucket to be inserted corresponding to the data member to be inserted is a main index storage unit, if so, entering the step S9, otherwise, entering the step S10;
S9, retaining the main index storage unit, generating a first-level index table, setting its No. 0 address bit to NULL, and pointing the main index pointer of the corresponding candidate bucket to the first-level index table; generating two secondary index storage units for the first-level index table, pointing the two secondary-index-storage-unit address storage positions that are to receive data in the first-level index table to the two generated secondary index storage units respectively, and setting the relocation identification values of both secondary index storage units to 0; inserting, in order, the pre-fingerprint of the data member to be inserted and the pre-fingerprint held in the retained main index storage unit into the pre-fingerprint positions of the first-level index table, and inserting, in order, the post-fingerprint of the data member to be inserted and the post-fingerprint held in the retained main index storage unit into the two generated secondary index storage units; after the insertion is finished, releasing the retained main index storage unit and decreasing the remaining address count of the first-level index table by 2; ending this data insertion and proceeding to step S13;
S10, judging whether the remaining address count of the current index table is 0; if yes, proceeding to step S12; otherwise, proceeding to step S11;
S11, generating a secondary index storage unit for the current index table and setting its relocation identification value to 0; pointing the secondary-index-storage-unit address storage position that is to receive data in the current index table to the generated secondary index storage unit; inserting the pre-fingerprint of the data member to be inserted into the corresponding pre-fingerprint position of the current index table, and inserting its post-fingerprint into the generated secondary index storage unit; decreasing the remaining address count of the index table by 1; ending this data insertion and proceeding to step S13;
S12, judging whether a next-level index table exists; if yes, returning to step S10; otherwise, performing the relocation operation and proceeding to step S13;
S13, judging whether to continue acquiring the current instruction type; if yes, returning to step S2; otherwise, ending the fingerprint filtering of the cuckoo filter.
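The branching in steps S6 to S12 can be sketched for a single candidate bucket as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `Unit`, `Table`, and `insert_into_bucket` are hypothetical, pointers are modeled as Python references, each index table is modeled as a plain list of entries, and the relocation operation itself is not modeled (the function only reports that it would be needed).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Unit:
    """Hypothetical model of a main index storage unit."""
    pre_fp: int
    post_fp: int
    relocations: int = 0


@dataclass
class Table:
    """Hypothetical model of one index table with n usable slots."""
    n: int
    entries: list                      # (pre_fp, post_fp, relocation value)
    next: Optional["Table"] = None     # stands in for address bit No. 0

    @property
    def remaining(self) -> int:
        return self.n - len(self.entries)


def insert_into_bucket(slot, pre_fp, post_fp, n=4):
    """Return (new_slot, inserted); slot is None, a Unit, or a Table."""
    if slot is None:                                   # S6 -> S7: empty bucket
        return Unit(pre_fp, post_fp), True
    if isinstance(slot, Unit):                         # S8 -> S9: promote to a table
        table = Table(n, [(pre_fp, post_fp, 0),
                          (slot.pre_fp, slot.post_fp, 0)])
        return table, True
    table = slot
    while True:                                        # S10 / S12 loop over levels
        if table.remaining > 0:                        # S11: room in this table
            table.entries.append((pre_fp, post_fp, 0))
            return slot, True
        if table.next is None:                         # S12: relocation would start here
            return slot, False
        table = table.next
```

With `n=2`, a first insertion yields a `Unit`, a second yields a `Table` whose remaining count is 0, and a third returns `False` to signal that relocation would be required.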
At initialization in step S1, the entire cuckoo filter system contains only one index table, whose address bits are all empty, namely the main index table. The smallest storage unit pointed to by an address of the main index table is the main index storage unit, which consists of a pre-fingerprint storage position, a post-fingerprint storage position, and a relocation identification value. The two fingerprint storage positions occupy the same amount of space; the relocation identification value occupies 1 bit and has three states, 0, 1, and n, where 0 indicates a newly inserted fingerprint, 1 indicates that one relocation operation has been performed on the storage unit, and n indicates that n relocation operations have been performed on it.
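As a rough illustration of this layout, the main index storage unit can be modeled as a small record. The class name, field names, and table size below are hypothetical, and the relocation identification value is held as an ordinary integer rather than a packed bit field.

```python
from dataclasses import dataclass


@dataclass
class MainIndexStorageUnit:
    """Hypothetical record: pre-fingerprint, post-fingerprint, relocation value."""
    pre_fp: int            # pre-fingerprint
    post_fp: int           # post-fingerprint
    relocations: int = 0   # 0 = newly inserted, 1 = relocated once, n = relocated n times

    def relocate(self) -> None:
        # Record one more relocation of this storage unit.
        self.relocations += 1


# At initialization the main index table has only empty address bits.
NUM_BUCKETS = 8  # hypothetical size, chosen for the example
main_index = [None] * NUM_BUCKETS

main_index[3] = MainIndexStorageUnit(pre_fp=0b1011, post_fp=0b0110)
main_index[3].relocate()
```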
The first-level index table consists of a remaining-address-count storage bit, a No. 0 address bit, and secondary index table storage bits. The remaining-address-count storage bit stores the number of unused secondary index table storage bits in the index table; it is initialized to n and does not count address bit No. 0. Each secondary index table storage bit comprises a pre-fingerprint position and the address storage position of a secondary index storage unit, and that address storage position points to the secondary index storage unit generated for it. Storage bits are filled in reverse order starting from position n; once position 1 has been used, the remaining address count of the index table is 0, i.e., the index table is fully loaded. When the index table is fully loaded, a next-level index table is inserted, whose structure is the same as that of the first-level index table. Address bit No. 0 is initialized to NULL; when a next-level index table exists, this bit stores the address of the next-level index table.
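The reverse-order filling and the role of address bit No. 0 can be sketched as below. The names `SecondaryUnit` and `IndexTable` are hypothetical, and the sketch assumes an append-only table (no holes left by deletions), so the remaining address count always equals the next position to fill.

```python
from dataclasses import dataclass


@dataclass
class SecondaryUnit:
    """Hypothetical secondary index storage unit: post-fingerprint plus relocation value."""
    post_fp: int
    relocations: int = 0


class IndexTable:
    """Hypothetical index table: slot 0 links to the next level, slots 1..n hold entries."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.remaining = n              # remaining address count, excludes slot 0
        self.slots = [None] * (n + 1)   # slot 0 starts as NULL (None)

    def is_full(self) -> bool:
        return self.remaining == 0

    def add(self, pre_fp: int, post_fp: int, relocations: int = 0) -> bool:
        if self.is_full():
            return False                # caller would move on to the next-level table
        # Reverse principle: position n is used first, position 1 last.
        self.slots[self.remaining] = (pre_fp, SecondaryUnit(post_fp, relocations))
        self.remaining -= 1
        return True

    def link_next(self, table: "IndexTable") -> None:
        # Address bit No. 0 stores the address of the next-level index table.
        self.slots[0] = table
```

For a table with `n = 3`, the first entry lands in position 3, the third in position 1, and a fourth `add` call returns `False` because the remaining address count has reached 0.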
The value k in step S3 is less than or equal to 1/2 of the total number of bits of the fingerprint information of the data member $\xi_X$ to be inserted, as calculated by the hash function.
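A minimal sketch of splitting a fingerprint under this constraint is given below. The function name and parameters are hypothetical. Because this passage constrains only k, the width of the post-fingerprint is left as a parameter; defaulting it to k is an assumption made for the example.

```python
def split_fingerprint(fp, total_bits, k, post_bits=None):
    """Split fp into (pre-fingerprint, post-fingerprint).

    pre  = the first (most significant) k bits of fp
    post = the last (least significant) post_bits bits of fp
    """
    if post_bits is None:
        post_bits = k                    # assumption: equal widths by default
    if not 0 <= fp < (1 << total_bits):
        raise ValueError("fingerprint does not fit in total_bits")
    if not 0 < k <= total_bits // 2:
        raise ValueError("k must be at most half the fingerprint length")
    if not 0 < post_bits <= total_bits - k:
        raise ValueError("post-fingerprint would overlap the pre-fingerprint")
    pre = fp >> (total_bits - k)
    post = fp & ((1 << post_bits) - 1)
    return pre, post


# 8-bit fingerprint 1011 0110 split with k = 4
pre, post = split_fingerprint(0b10110110, total_bits=8, k=4)
```

Keeping k at or below half the fingerprint length guarantees that the two parts never overlap.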
The two candidate bucket positions $\chi$ and $\gamma$ of the data member $\xi_X$ to be inserted in step S4 are $\chi = h_1(\xi_X)$ and $\gamma = \chi \oplus h_1(f_X)$, where $\oplus$ is the exclusive-or operation, $f_X$ is the fingerprint information of the data member to be inserted, and $h_1(\cdot)$ is a hash function. A data member has two candidate buckets, and its data information is stored in only one of them.
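The candidate bucket computation can be sketched as follows, assuming the second bucket takes the usual cuckoo-filter form $\gamma = \chi \oplus h_1(f_X)$. The choice of SHA-256 for $h_1$ and for the fingerprint, the 16-bit fingerprint width, and the power-of-two bucket count are all assumptions made for the example; the text does not specify them.

```python
import hashlib

NUM_BUCKETS = 1 << 10   # assumption: a power of two keeps the XOR result in range
FP_BITS = 16            # assumption: fingerprint width for this example


def h1(data: bytes) -> int:
    """Hypothetical hash function h1, mapped onto a bucket index."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def fingerprint(member: bytes) -> int:
    """Hypothetical fingerprint f_X of a data member."""
    digest = hashlib.sha256(b"fp:" + member).digest()
    return int.from_bytes(digest[:4], "big") & ((1 << FP_BITS) - 1)


def other_bucket(bucket: int, fp: int) -> int:
    """Given one candidate bucket and the fingerprint, return the other bucket."""
    return bucket ^ h1(fp.to_bytes(FP_BITS // 8, "big"))


def candidate_buckets(member: bytes):
    chi = h1(member)                               # chi = h1(xi_X)
    gamma = other_bucket(chi, fingerprint(member)) # gamma = chi XOR h1(f_X)
    return chi, gamma
```

Because exclusive-or is its own inverse, `other_bucket` recovers $\chi$ from $\gamma$ and vice versa using only the stored fingerprint, which is what allows a stored entry to be moved to its alternative bucket without access to the original data member.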
The specific method of the relocation operation in step S12 includes the following sub-steps:
S12-1, generating an intermediate variable temp in the form of a main index storage unit; randomly selecting an existing data member in the original first-level index table and filling all of its fingerprint information into temp; after filling, clearing that data member's secondary index table storage bit and releasing its original secondary index storage unit;
S12-2, generating a new secondary index storage unit in the first-level index table and setting its relocation identification value to 0; pointing the vacated secondary-index-storage-unit address storage position in the first-level index table to the new secondary index storage unit; inserting the pre-fingerprint of the data member $\xi_X$ to be inserted into the vacated pre-fingerprint position of the first-level index table, and inserting the post-fingerprint of $\xi_X$ into the new secondary index storage unit;
S12-3, judging whether the relocation identification value of temp is 0; if yes, proceeding to step S12-4; otherwise, proceeding to step S12-11;
S12-4, calculating the position of the other candidate bucket of temp and judging whether the address stored at the main index entry pointed to by that candidate bucket's main index pointer is empty; if yes, proceeding to step S12-5; otherwise, proceeding to step S12-6;
S12-5, setting the relocation identification value of temp to 1, pointing the main index pointer of that candidate bucket to temp, ending the relocation operation, and proceeding to step S14;
S12-6, judging whether the main index pointer of the other candidate bucket of temp points to a main index storage unit; if yes, proceeding to step S12-7; otherwise, proceeding to step S12-9;
S12-7, retaining that main index storage unit, generating a first-level index table, setting its No. 0 address bit to NULL, and pointing the main index pointer of the corresponding candidate bucket to the first-level index table;
S12-8, generating two secondary index storage units for the first-level index table, one with its relocation identification value set to 0 and the other with its relocation identification value set to 1; inserting, in order, the pre-fingerprint of temp and the pre-fingerprint of the retained main index storage unit into the pre-fingerprint positions of the secondary index table storage bits that are to receive data in the first-level index table; pointing the address storage position paired with the pre-fingerprint of temp to the secondary index storage unit whose relocation identification value is 1, and pointing the address storage position paired with the pre-fingerprint of the retained main index storage unit to the secondary index storage unit whose relocation identification value is 0; filling the post-fingerprint of temp into the secondary index storage unit whose relocation identification value is 1, and the post-fingerprint of the retained main index storage unit into the secondary index storage unit whose relocation identification value is 0; after the insertion is finished, releasing temp and the retained main index storage unit, decreasing the remaining address count of the first-level index table by 2, ending the relocation operation, and proceeding to step S14;
S12-9, judging whether the remaining address count of the index table currently pointed to by the main index pointer of the other candidate bucket of temp is 0; if yes, returning to step S12-1; otherwise, proceeding to step S12-10;
S12-10, generating a secondary index storage unit for the current index table and setting its relocation identification value to 1; inserting the pre-fingerprint of temp into the pre-fingerprint position of a secondary index table storage bit of the current index table, and pointing the address storage position of that storage bit to the generated secondary index storage unit; inserting the post-fingerprint of temp into the generated secondary index storage unit; releasing temp, decreasing the remaining address count of the current index table by 1, ending the relocation operation, and proceeding to step S14;
S12-11, finding the current last-level index table of the current candidate bucket of temp and judging whether its remaining address count is 0; if yes, proceeding to step S12-12; otherwise, proceeding to step S12-14;
S12-12, generating a next-level index table, setting its No. 0 address bit to NULL, and pointing the No. 0 address bit of the current last-level index table to the next-level index table;
S12-13, generating a secondary index storage unit for the next-level index table and filling it with the post-fingerprint information of temp and with temp's relocation identification value increased by 1; inserting the pre-fingerprint information of temp into the pre-fingerprint position of a secondary index table storage bit, and pointing the address storage position of that storage bit to the generated secondary index storage unit; after the insertion is finished, releasing temp, decreasing the remaining address count of the next-level index table by 1, ending the relocation operation, and proceeding to step S14;
S12-14, generating a secondary index storage unit for the current last-level index table and filling it with the post-fingerprint information of temp and with temp's relocation identification value increased by 1; inserting the pre-fingerprint information of temp into the pre-fingerprint position of a secondary index table storage bit, and pointing the address storage position of that storage bit to the generated secondary index storage unit; after the insertion is finished, releasing temp, decreasing the remaining address count of the next-level index table by 1, ending the relocation operation, and proceeding to step S14.
The specific method for querying data in step S2 includes the following steps:
S2-1-1, calculating the fingerprint information $f_Y$ of the data member $\xi_Y$ to be queried and obtaining the member's pre-fingerprint and post-fingerprint;
S2-1-2, calculating the two candidate bucket positions $\chi'$ and $\gamma'$ of the data member $\xi_Y$ to be queried;
S2-1-3, judging whether the pre-fingerprint is present in candidate bucket $\chi'$ or $\gamma'$; if yes, proceeding to step S2-1-4; otherwise, ending this data query and proceeding to step S14;
S2-1-4, judging whether the post-fingerprint stored in the candidate bucket for the matching pre-fingerprint is the same as the post-fingerprint of the member to be queried; if yes, determining that the query succeeds, ending this data query, and proceeding to step S14; otherwise, determining that the query fails, ending this data query, and proceeding to step S14.
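The two-stage comparison in steps S2-1-3 and S2-1-4 can be sketched on a flattened model in which each bucket is simply a list of (pre-fingerprint, post-fingerprint) pairs. The function name and the dictionary layout are hypothetical; the index tables and pointers are not modeled, and checking every entry that shares the pre-fingerprint is a choice made for this example.

```python
def query(buckets, chi, gamma, pre_fp, post_fp):
    """Return True if (pre_fp, post_fp) is stored in either candidate bucket.

    buckets: dict mapping a bucket index to a list of (pre_fp, post_fp) pairs.
    """
    for bucket in (chi, gamma):
        for stored_pre, stored_post in buckets.get(bucket, []):
            if stored_pre != pre_fp:        # S2-1-3: pre-fingerprint must match first
                continue
            if stored_post == post_fp:      # S2-1-4: then compare the post-fingerprint
                return True
    return False


buckets = {2: [(0b1011, 0b0110)], 5: [(0b0011, 0b1001)]}
found = query(buckets, 2, 5, 0b1011, 0b0110)
missing = query(buckets, 2, 5, 0b1011, 0b0111)
```

A member whose pre-fingerprint collides with a stored entry is still rejected when the post-fingerprints differ, which is how the second fingerprint lowers the false-positive rate.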
The two candidate bucket positions $\chi'$ and $\gamma'$ of the data member $\xi_Y$ to be queried in step S2-1-1 are $\chi' = h_1(\xi_Y)$ and $\gamma' = \chi' \oplus h_1(f_Y)$, where $\oplus$ is the exclusive-or operation and $h_1(\cdot)$ is a hash function.
The specific method for deleting data in step S2 includes the following steps:
S2-2-1, calculating the fingerprint information of the member $\xi_{del}$ to be deleted and obtaining the member's pre-fingerprint and post-fingerprint;
S2-2-2, calculating the two candidate bucket positions $\chi_{del}$ and $\gamma_{del}$ of the member $\xi_{del}$ to be deleted;
S2-2-3, judging whether the member $\xi_{del}$ to be deleted exists in candidate bucket $\chi_{del}$ or $\gamma_{del}$; if yes, proceeding to step S2-2-4; otherwise, ending this data deletion and proceeding to step S14;
S2-2-4, judging whether the storage unit holding the member $\xi_{del}$ to be deleted is a main index storage unit; if yes, proceeding to step S2-2-5; otherwise, proceeding to step S2-2-6;
S2-2-5, releasing the main index storage unit and setting the address bit of the candidate bucket in the main index table to NULL; ending this data deletion and proceeding to step S14;
S2-2-6, releasing the secondary index storage unit and setting the values of the corresponding secondary index table storage bit in the current index table to NULL;
S2-2-7, judging whether the remaining address count of the current index table is n-1, i.e., the table carries no load; if yes, proceeding to step S2-2-8; otherwise, proceeding to step S2-2-9;
S2-2-8, setting the No. 0 address bit of the upper-level index table of this table to NULL, releasing the current index table, ending this data deletion, and proceeding to step S14;
S2-2-9, finding the last-level index table, migrating fingerprint data from the last-level index table into the current index table, and increasing the remaining address count of the last-level index table by 1;
S2-2-10, taking the last-level index table as the current index table and judging whether its remaining address count is n, i.e., the table carries no load; if yes, proceeding to step S2-2-8; otherwise, ending this data deletion and proceeding to step S14.
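The compaction idea behind steps S2-2-6 to S2-2-10 can be sketched on a simplified model in which the index tables of one candidate bucket form a Python list, first-level table first, and each table is a list of (pre-fingerprint, post-fingerprint) pairs. The function name and this representation are hypothetical; remaining address counts, pointers, and the main index storage unit case (S2-2-5) are not modeled.

```python
def delete_from_chain(chain, pre_fp, post_fp):
    """Delete one entry from a bucket's chain of index tables; return True if found.

    chain: list of tables, first-level table first; each table is a list of
    (pre_fp, post_fp) pairs. The list is modified in place.
    """
    for table in chain:
        if (pre_fp, post_fp) not in table:
            continue
        table.remove((pre_fp, post_fp))          # S2-2-6: release the entry
        last = chain[-1]
        if table is not last and last:
            table.append(last.pop())             # S2-2-9: migrate from the last level
        if not chain[-1]:
            chain.pop()                          # S2-2-8: release the unloaded table
        return True
    return False


chain = [[(1, 1), (2, 2)], [(3, 3)]]
removed = delete_from_chain(chain, 1, 1)
```

After this call the hole left in the first table is filled by the entry migrated from the last-level table, and the now empty last-level table is released, leaving a single fully used table.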
The two candidate bucket positions $\chi_{del}$ and $\gamma_{del}$ of the data member $\xi_{del}$ to be deleted in step S2-2-1 are $\chi_{del} = h_1(\xi_{del})$ and $\gamma_{del} = \chi_{del} \oplus h_1(f_{del})$, where $\oplus$ is the exclusive-or operation, $f_{del}$ is the fingerprint information of $\xi_{del}$, and $h_1(\cdot)$ is a hash function.
As shown in FIG. 2, the discrete double-fingerprint storage cuckoo filter of the present invention comprises a main index table and S secondary index tables. The main index table points either to a first-level index table or to a main index storage unit; each secondary index table points to secondary index storage units and, when fully loaded, also points to the next-level index table. In the figure, $\xi_X$ is a data member, $h_1(x)$ and $h_2(x)$ are the two different candidate-bucket computations, and each data member $\xi_X$ is shown with its pre-fingerprint information and its post-fingerprint information; the subscript X is replaced by different lowercase letters to denote different members.
As shown in FIG. 3, the smallest storage unit of the main index table of the discrete double-fingerprint storage cuckoo filter of the present invention, i.e., the main index storage unit, includes a pre-fingerprint bit, a post-fingerprint bit, and a relocation identification value.
As shown in FIG. 4, the smallest storage unit of the secondary index table of the discrete double-fingerprint storage cuckoo filter of the present invention, i.e., the secondary index storage unit, includes a post-fingerprint bit and a relocation identification value.
The method combines dynamic conversion of the storage space with dynamic growth and shrinkage of the stored data. It uses the index tables and pointers to expand the storage space of the cuckoo filter dynamically, adapts to a variety of dynamically changing data streams, and thereby increases the construction speed of the data structure. Executing the member insertion, member query, and member deletion methods of this scheme effectively addresses the large waste of storage space in existing schemes; double-fingerprint storage improves the accuracy of member queries and effectively reduces the probability of deleting a member by mistake. While providing high-precision member query and reliable member deletion, the scheme uses a data compaction technique to achieve high utilization of the storage space and avoid wasted space. Adding a "relocation identification value" bit to the storage unit effectively avoids cyclic loading of data when relocation operations occur and improves the usage efficiency of the cuckoo filter.

Claims (10)

1. A data processing method based on a discrete double-fingerprint storage cuckoo filter is characterized by comprising the following steps:
S1, establishing the main index table of the discrete double-fingerprint storage cuckoo filter and initializing it;
S2, acquiring the current instruction type; if the instruction type is data insertion, proceeding to step S3; if the instruction type is data query, performing the data query; if the instruction type is data deletion, performing the data deletion;
S3, calculating the fingerprint information of the data member to be inserted, selecting the first k bits of the fingerprint information as the pre-fingerprint, and selecting the last k-1 bits of the fingerprint information as the post-fingerprint;
S4, judging whether the fingerprint information of the member to be inserted already exists in the double-fingerprint storage cuckoo filter; if yes, ending this data insertion and proceeding to step S13; otherwise, proceeding to step S5;
S5, calculating the positions of the two candidate buckets of the data member to be inserted and selecting one of them as the candidate bucket to be inserted into;
S6, judging whether the address stored at the main index entry pointed to by the main index pointer of the candidate bucket to be inserted into is empty; if yes, proceeding to step S7; otherwise, proceeding to step S8;
S7, generating a main index storage unit, setting its relocation identification value to 0, filling the pre-fingerprint and post-fingerprint selected in step S3 into the main index storage unit, pointing the main index pointer of the corresponding candidate bucket to the main index storage unit, ending this data insertion, and proceeding to step S13;
S8, judging whether the address stored at the main index entry of the candidate bucket to be inserted into refers to a main index storage unit; if yes, proceeding to step S9; otherwise, proceeding to step S10;
S9, retaining the main index storage unit, generating a first-level index table, setting its No. 0 address bit to NULL, and pointing the main index pointer of the corresponding candidate bucket to the first-level index table; generating two secondary index storage units for the first-level index table, pointing the two secondary-index-storage-unit address storage positions that are to receive data in the first-level index table to the two generated secondary index storage units respectively, and setting the relocation identification values of both secondary index storage units to 0; inserting, in order, the pre-fingerprint of the data member to be inserted and the pre-fingerprint held in the retained main index storage unit into the pre-fingerprint positions of the first-level index table, and inserting, in order, the post-fingerprint of the data member to be inserted and the post-fingerprint held in the retained main index storage unit into the two generated secondary index storage units; after the insertion is finished, releasing the retained main index storage unit and decreasing the remaining address count of the first-level index table by 2; ending this data insertion and proceeding to step S13;
S10, judging whether the remaining address count of the current index table is 0; if yes, proceeding to step S12; otherwise, proceeding to step S11;
S11, generating a secondary index storage unit for the current index table and setting its relocation identification value to 0; pointing the secondary-index-storage-unit address storage position that is to receive data in the current index table to the generated secondary index storage unit; inserting the pre-fingerprint of the data member to be inserted into the corresponding pre-fingerprint position of the current index table, and inserting its post-fingerprint into the generated secondary index storage unit; decreasing the remaining address count of the index table by 1; ending this data insertion and proceeding to step S13;
S12, judging whether a next-level index table exists; if yes, returning to step S10; otherwise, performing the relocation operation and proceeding to step S13;
S13, judging whether to continue acquiring the current instruction type; if yes, returning to step S2; otherwise, ending the fingerprint filtering of the cuckoo filter.
2. The data processing method based on the discrete double-fingerprint storage cuckoo filter as claimed in claim 1, wherein: at initialization in step S1, the entire cuckoo filter system contains only one index table, whose address bits are all empty, namely the main index table; the smallest storage unit pointed to by an address of the main index table is the main index storage unit, which consists of a pre-fingerprint storage position, a post-fingerprint storage position, and a relocation identification value; the two fingerprint storage positions occupy the same amount of space, and the relocation identification value occupies 1 bit and has three states, 0, 1, and n, where 0 indicates a newly inserted fingerprint, 1 indicates that one relocation operation has been performed on the storage unit, and n indicates that n relocation operations have been performed on it.
3. The data processing method based on the discrete double-fingerprint storage cuckoo filter as claimed in claim 1, wherein: the first-level index table consists of a remaining-address-count storage bit, a No. 0 address bit, and secondary index table storage bits; the remaining-address-count storage bit stores the number of unused secondary index table storage bits in the index table, is initialized to n, and does not count address bit No. 0; each secondary index table storage bit comprises a pre-fingerprint position and the address storage position of a secondary index storage unit, and that address storage position points to the secondary index storage unit generated for it; storage bits are filled in reverse order starting from position n, and once position 1 has been used the remaining address count of the index table is 0, i.e., the index table is fully loaded; when the index table is fully loaded, a next-level index table is inserted, whose structure is the same as that of the first-level index table; address bit No. 0 is initialized to NULL and, when a next-level index table exists, stores the address of the next-level index table.
4. The data processing method based on the discrete double-fingerprint storage cuckoo filter as claimed in claim 1, wherein: the value k in step S3 is less than or equal to 1/2 of the total number of bits of the fingerprint information of the data member $\xi_X$ to be inserted, as calculated by the hash function.
5. The data processing method based on the discrete double-fingerprint storage cuckoo filter as claimed in claim 1, wherein: the two candidate bucket positions $\chi$ and $\gamma$ of the data member $\xi_X$ to be inserted in step S4 are $\chi = h_1(\xi_X)$ and $\gamma = \chi \oplus h_1(f_X)$, where $\oplus$ is the exclusive-or operation, $f_X$ is the fingerprint information of the data member to be inserted, and $h_1(\cdot)$ is a hash function; a data member has two candidate buckets, and its data information is stored in only one of them.
6. The data processing method based on the discrete type double-fingerprint storage cuckoo filter as claimed in claim 1, wherein the specific method of the relocation operation in step S12 comprises the following sub-steps:
S12-1, generating an intermediate variable temp in the form of a main index storage unit; randomly selecting an existing data member in the original first-level index table and filling all of its fingerprint information into temp; after filling, clearing that data member's secondary index table storage bit and releasing its original secondary index storage unit;
S12-2, generating a new secondary index storage unit in the first-level index table and setting its relocation identification value to 0; pointing the vacated secondary-index-storage-unit address storage position in the first-level index table to the new secondary index storage unit; inserting the pre-fingerprint of the data member $\xi_X$ to be inserted into the vacated pre-fingerprint position of the first-level index table, and inserting the post-fingerprint of $\xi_X$ into the new secondary index storage unit;
S12-3, judging whether the relocation identification value of temp is 0; if yes, proceeding to step S12-4; otherwise, proceeding to step S12-11;
S12-4, calculating the position of the other candidate bucket of temp and judging whether the address stored at the main index entry pointed to by that candidate bucket's main index pointer is empty; if yes, proceeding to step S12-5; otherwise, proceeding to step S12-6;
S12-5, setting the relocation identification value of temp to 1, pointing the main index pointer of that candidate bucket to temp, ending the relocation operation, and proceeding to step S14;
S12-6, judging whether the main index pointer of the other candidate bucket of temp points to a main index storage unit; if yes, proceeding to step S12-7; otherwise, proceeding to step S12-9;
S12-7, retaining that main index storage unit, generating a first-level index table, setting its No. 0 address bit to NULL, and pointing the main index pointer of the corresponding candidate bucket to the first-level index table;
S12-8, generating two secondary index storage units for the first-level index table, one with its relocation identification value set to 0 and the other with its relocation identification value set to 1; inserting, in order, the pre-fingerprint of temp and the pre-fingerprint of the retained main index storage unit into the pre-fingerprint positions of the secondary index table storage bits that are to receive data in the first-level index table; pointing the address storage position paired with the pre-fingerprint of temp to the secondary index storage unit whose relocation identification value is 1, and pointing the address storage position paired with the pre-fingerprint of the retained main index storage unit to the secondary index storage unit whose relocation identification value is 0; filling the post-fingerprint of temp into the secondary index storage unit whose relocation identification value is 1, and the post-fingerprint of the retained main index storage unit into the secondary index storage unit whose relocation identification value is 0; after the insertion is finished, releasing temp and the retained main index storage unit, decreasing the remaining address count of the first-level index table by 2, ending the relocation operation, and proceeding to step S14;
S12-9, judging whether the remaining address count of the index table currently pointed to by the main index pointer of the other candidate bucket of temp is 0; if yes, returning to step S12-1; otherwise, proceeding to step S12-10;
S12-10, generating a secondary index storage unit for the current index table and setting its relocation identification value to 1; inserting the pre-fingerprint of temp into the pre-fingerprint position of a secondary index table storage bit of the current index table, and pointing the address storage position of that storage bit to the generated secondary index storage unit; inserting the post-fingerprint of temp into the generated secondary index storage unit; releasing temp, decreasing the remaining address count of the current index table by 1, ending the relocation operation, and proceeding to step S14;
S12-11, finding the current last-level index table of the current candidate bucket of temp and judging whether its remaining address count is 0; if yes, proceeding to step S12-12; otherwise, proceeding to step S12-14;
S12-12, generating a next-level index table, setting its No. 0 address bit to NULL, and pointing the No. 0 address bit of the current last-level index table to the next-level index table;
S12-13, generating a secondary index storage unit for the next-level index table and filling it with the post-fingerprint information of temp and with temp's relocation identification value increased by 1; inserting the pre-fingerprint information of temp into the pre-fingerprint position of a secondary index table storage bit, and pointing the address storage position of that storage bit to the generated secondary index storage unit; after the insertion is finished, releasing temp, decreasing the remaining address count of the next-level index table by 1, ending the relocation operation, and proceeding to step S14;
S12-14, generating a secondary index storage unit for the current last-level index table and filling it with the post-fingerprint information of temp and with temp's relocation identification value increased by 1; inserting the pre-fingerprint information of temp into the pre-fingerprint position of a secondary index table storage bit, and pointing the address storage position of that storage bit to the generated secondary index storage unit; after the insertion is finished, releasing temp, decreasing the remaining address count of the next-level index table by 1, ending the relocation operation, and proceeding to step S14.
7. The data processing method based on the discrete double-fingerprint storage cuckoo filter as claimed in claim 1, wherein the specific method for querying data in step S2 includes the following steps:
S2-1-1, calculating the fingerprint information $f_Y$ of the data member $\xi_Y$ to be queried and obtaining the member's pre-fingerprint and post-fingerprint;
S2-1-2, calculating the two candidate bucket positions $\chi'$ and $\gamma'$ of the data member $\xi_Y$ to be queried;
S2-1-3, judging whether the pre-fingerprint is present in candidate bucket $\chi'$ or $\gamma'$; if yes, proceeding to step S2-1-4; otherwise, ending this data query and proceeding to step S14;
S2-1-4, judging whether the post-fingerprint stored in the candidate bucket for the matching pre-fingerprint is the same as the post-fingerprint of the member to be queried; if yes, determining that the query succeeds, ending this data query, and proceeding to step S14; otherwise, determining that the query fails, ending this data query, and proceeding to step S14.
8. The data processing method based on the discrete double-fingerprint storage cuckoo filter as claimed in claim 7, wherein: the two candidate bucket positions $\chi'$ and $\gamma'$ of the data member $\xi_Y$ to be queried in step S2-1-1 are $\chi' = h_1(\xi_Y)$ and $\gamma' = \chi' \oplus h_1(f_Y)$, where $\oplus$ is the exclusive-or operation and $h_1(\cdot)$ is a hash function.
9. The data processing method based on the discrete double-fingerprint storage cuckoo filter as claimed in claim 1, wherein the specific method for deleting data in step S2 comprises the following steps:
S2-2-1, calculating the fingerprint information of the member ξ_del to be deleted, and obtaining the pre-fingerprint and the post-fingerprint of the member;
S2-2-2, calculating the two candidate bucket positions χ_del and γ_del of the member ξ_del to be deleted;
S2-2-3, judging whether the member ξ_del to be deleted exists in candidate bucket χ_del or γ_del; if yes, proceeding to step S2-2-4; otherwise, ending the data deletion and proceeding to step S14;
S2-2-4, judging whether the storage unit holding the member ξ_del to be deleted is a main index storage unit; if yes, proceeding to step S2-2-5; otherwise, proceeding to step S2-2-6;
S2-2-5, releasing the main index storage unit and setting the address position of the candidate bucket in the main index table to NULL; ending the data deletion and proceeding to step S14;
S2-2-6, releasing the secondary index storage unit and setting the values of the corresponding storage position of the secondary index table in the current index table to NULL;
S2-2-7, judging whether the number of remaining addresses of the current index table is n-1, i.e. the table carries no load; if yes, proceeding to step S2-2-8; otherwise, proceeding to step S2-2-9;
S2-2-8, setting address position 0 of the index table one level above the current index table to NULL, releasing the current index table, ending the data deletion and proceeding to step S14;
S2-2-9, locating the last-level index table, migrating the fingerprint data in the last-level index table to the current index table, and incrementing the remaining-address count of the last-level index table by 1;
S2-2-10, taking the last-level index table as the current index table and judging whether its number of remaining addresses is n, i.e. the table carries no load; if yes, proceeding to step S2-2-8; otherwise, ending the data deletion and proceeding to step S14.
10. The data processing method based on the discrete double-fingerprint storage cuckoo filter as claimed in claim 9, wherein: the two candidate bucket positions χ_del and γ_del of the data member ξ_del to be deleted in step S2-2-1 are χ_del = h1(ξ_del) and γ_del as given by formula image FDA0003297525270000071 (not reproduced here), wherein ⊕ is an exclusive-or operation and h1(·) is a hash function.
CN202111181649.4A 2021-10-11 2021-10-11 Data processing method of double-fingerprint storage cuckoo filter based on discrete type Active CN113886391B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111181649.4A CN113886391B (en) 2021-10-11 2021-10-11 Data processing method of double-fingerprint storage cuckoo filter based on discrete type

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111181649.4A CN113886391B (en) 2021-10-11 2021-10-11 Data processing method of double-fingerprint storage cuckoo filter based on discrete type

Publications (2)

Publication Number Publication Date
CN113886391A (en) 2022-01-04
CN113886391B (en) 2023-03-28

Family

ID=79006040

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111181649.4A Active CN113886391B (en) 2021-10-11 2021-10-11 Data processing method of double-fingerprint storage cuckoo filter based on discrete type

Country Status (1)

Country Link
CN (1) CN113886391B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105630955A (en) * 2015-12-24 2016-06-01 华中科技大学 Method for efficiently managing members of dynamic data set
US20180011852A1 (en) * 2016-07-11 2018-01-11 Microsoft Technology Licensing, Llc Key-Value Storage System including a Resource-Efficient Index
CN113360516A (en) * 2021-08-11 2021-09-07 成都信息工程大学 Set member management method based on first-in first-out and minimum active number strategy

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BRESLOW, A.D., JAYASENA, N.S.: "Morton filters: fast, compressed sparse cuckoo filters" *
K. HUANG AND T. YANG: "Tagged Cuckoo Filters" *

Also Published As

Publication number Publication date
CN113886391B (en) 2023-03-28

Similar Documents

Publication Publication Date Title
US7831626B1 (en) Integrated search engine devices having a plurality of multi-way trees of search keys therein that share a common root node
US7603346B1 (en) Integrated search engine devices having pipelined search and b-tree maintenance sub-engines therein
US8086641B1 (en) Integrated search engine devices that utilize SPM-linked bit maps to reduce handle memory duplication and methods of operating same
US8190591B2 (en) Bit string searching apparatus, searching method, and program
CN110147204B (en) Metadata disk-dropping method, device and system and computer-readable storage medium
US6415375B2 (en) Information storage and retrieval system
US7653619B1 (en) Integrated search engine devices having pipelined search and tree maintenance sub-engines therein that support variable tree height
CN116450656B (en) Data processing method, device, equipment and storage medium
US20040019737A1 (en) Multiple-RAM CAM device and method therefor
CN111552692A (en) Plus-minus cuckoo filter
US7987205B1 (en) Integrated search engine devices having pipelined node maintenance sub-engines therein that support database flush operations
US7725450B1 (en) Integrated search engine devices having pipelined search and tree maintenance sub-engines therein that maintain search coherence during multi-cycle update operations
US7953721B1 (en) Integrated search engine devices that support database key dumping and methods of operating same
US20120239664A1 (en) Bit string search apparatus, search method, and program
CN112817530B (en) Method for efficiently reading and writing fully ordered data with multi-thread safety
CN103714121A (en) Index record management method and device
CN112434085B (en) Roaring Bitmap-based user data statistical method
CN113886391B (en) Data processing method of double-fingerprint storage cuckoo filter based on discrete type
CN111541617B (en) Data flow table processing method and device for high-speed large-scale concurrent data flow
CN116701440B (en) Cuckoo filter and data insertion, query and deletion method
CN112527196B (en) Cache read-write method and device, computer readable storage medium and electronic equipment
CN111581440B (en) Hardware acceleration B + tree operation device and method thereof
CN113495901A (en) Variable-length data block oriented quick retrieval method
CN110825747B (en) Information access method, device and medium
CN114238226A (en) NVM (non volatile memory) local file management system and method based on SIMD (single instruction multiple data) instruction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant