CN112199333A - Storage method and device supporting multi-value index file - Google Patents


Info

Publication number
CN112199333A
Authority
CN
China
Prior art keywords
data
hash bucket
data block
index
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011014922.XA
Other languages
Chinese (zh)
Other versions
CN112199333B (en)
Inventor
牛晨光
王梦来
李竞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Greenet Information Service Co Ltd
Original Assignee
Wuhan Greenet Information Service Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Greenet Information Service Co Ltd
Priority to CN202011014922.XA
Publication of CN112199333A
Application granted
Publication of CN112199333B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G06F16/137 Hash-based
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0625 Power saving in storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G06F3/0643 Management of files
    • G06F3/0644 Management of space entities, e.g. partitions, extents, pools

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of data storage and search, and provides a storage method and a storage device supporting a multi-value index file. The method comprises: calculating the sequence number j of a hash bucket through a preset hash algorithm; when hash bucket j has a conflict, matching starting from the first data block in hash bucket j until the ArrayCount data blocks recorded by the data management structure of hash bucket j have all been matched; if a consistent data block is found during matching, directly using the record item of that data block and ending the matching process; and if no data block matches, applying for a new data block and entering the data block allocation process. The invention provides a method by which a hash bucket borrows data blocks from its adjacent hash bucket, and solves the prior-art problem that data blocks applied for in batches are scattered across non-contiguous memory space, which wastes computing resources during access.

Description

Storage method and device supporting multi-value index file
[ technical field ]
The invention relates to the technical field of data storage and search, in particular to a storage method and a storage device supporting a multi-value index file.
[ background of the invention ]
In order to export and view the raw control-plane and service-plane data of a single user in the network through an expert subsystem of a telecom operator's OSS system, a system such as DPI needs to be built that supports storing and querying users' raw signaling data by user number.
At present, a DPI system built on a per-province basis carries over 1000 thousand users, and raw signaling packets are generated in real time at a rate of up to 6000000 pps. An indexing scheme is therefore needed that is more efficient, and that saves more storage and hardware, than common distributed storage solutions such as Hadoop.
Currently, the most common index algorithms are hash and tree, and hash is among the algorithms with the most stable performance. A good hash algorithm distributes keys well, but hash collisions can never be completely avoided, so any system that uses hashing as a fast indexing algorithm needs to resolve hash collisions.
Considering memory overhead and index performance, an online system running 7x24 hours can tolerate only a limited number of conflicts; that is, it cannot resolve all conflicts without limit. Usually, the number of collisions under one hash is limited to N, and collisions beyond this value are discarded directly without being stored, as shown in fig. 1.
Another conflict resolution method is also common in memory index implementations: a data pool able to resolve M conflicts is applied for in advance; conflicts are resolved as long as the total number of conflicts in the whole index system does not exceed M, and are discarded once the total exceeds this value. Compared with resolving only N collisions under each hash, this scheme increases the utilization of the data blocks reserved for collision resolution, as shown in fig. 2.
However, this scheme is suitable only for memory indexing: the data blocks applied for in batches end up scattered across non-contiguous memory space, and for an index file they end up scattered across non-contiguous file space, so the file is read and written randomly, which costs a great deal of performance.
In view of the above, overcoming the drawbacks of the prior art is an urgent problem in the art.
[ summary of the invention ]
The technical problem to be solved by the invention is that the current scheme is suitable only for memory indexing: the data blocks applied for in batches end up scattered across non-contiguous memory space, and for an index file they end up scattered across non-contiguous file space, so the file is read and written randomly, which costs a great deal of performance.
The invention adopts the following technical scheme:
In a first aspect, the invention provides a storage method supporting a multi-value index file. A conflict array that is contiguous in global memory is applied for, and N data blocks for storing conflicting data are allocated to each hash bucket; for each hash bucket, the maximum number of data blocks it is allowed to access is set to X, where X satisfies X ≥ N and X is always counted from the initial data block of the corresponding hash bucket; the data block identifier Index within the range of use of the i-th hash bucket satisfies N × i ≤ Index < N × (i + 1); a field ArrayCount is set in the data management structure of each hash bucket to record the number of data blocks actually used in that hash bucket; when a record of the keyword K is to be stored, the method comprises the following steps:
calculating the sequence number j of the hash bucket through a preset hash algorithm;
when hash bucket j has a conflict, matching starting from the first data block in hash bucket j until the ArrayCount data blocks recorded by the data management structure of hash bucket j have all been matched;
if a consistent data block is found during matching, directly using the record item of that data block and ending the matching process; if none is found, a new data block needs to be applied for, and the data block allocation process is entered:
in the data block allocation process, searching for an idle data block from ArrayCount to X; and if N + j + ArrayCount > N, borrowing storage from hash bucket j + 1, the next hash bucket adjacent to hash bucket j, and synchronously updating the ArrayCount value in the data management structure of hash bucket j + 1.
Preferably, the data stored in the data blocks establishes a data index chain in a reverse index manner, wherein each data block comprises an address pointer pPrev pointing to the previously stored conflicting data and a Value storing the content of its own conflicting data.
Preferably, in a hash bucket, an address pointer of a data block at the end of a data index chain is stored in a data management structure of the hash bucket, and, each time collision data is newly added to a corresponding hash bucket, pPrev in the data block for carrying the newly added collision data is assigned as the address pointer stored in the data management structure, and Value in the data block for carrying the newly added collision data is assigned as the content of the collision data; and updating the pointer stored in the data management structure to be the address pointer of the data block for bearing the newly increased conflict data.
Preferably, the collision rate and the memory occupation can be guaranteed to be optimal when the total number H of the hash buckets is 2-5 times of the total number of the stored keywords.
Preferably, the value of N is generally associated with the maximum number of collisions C, and N satisfies the following condition: N = MAX(C/5, 2).
Preferably, the value of X is set to N × 2 to N × 3.
Preferably, when a record of the keyword K is to be stored, if it is determined that the size of the ArrayCount value in the jth hash bucket is equal to X, after the content of each piece of collision data recorded in the jth hash bucket is matched, if a consistent result is not matched, the corresponding keyword K is directly discarded.
Preferably, the conflict array is further recorded in an assigned index document, where the assigned index document performs file handle division according to a time interval, specifically:
index documents containing different conflict arrays are loaded into the memory when the current time is matched with the time interval associated in the file handle;
and if the current time reaches the other end of the association time of the index document loaded in the memory at present, searching the file handle matched with the current time, and then loading the index document corresponding to the corresponding file handle.
In a second aspect, the present invention further provides a storage apparatus supporting a multi-valued index file, for implementing the storage method supporting the multi-valued index file in the first aspect, where the apparatus includes:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executable by the processor for performing the storing method supporting a multi-valued index file of the first aspect.
In a third aspect, the present invention further provides a non-volatile computer storage medium, where the computer storage medium stores computer-executable instructions, which are executed by one or more processors, for implementing the storage method for supporting a multi-valued index file according to the first aspect.
The invention provides a method by which a hash bucket borrows data blocks from its adjacent hash bucket. It solves two problems of the prior art: data blocks applied for in batches are scattered across non-contiguous memory space, which wastes computing resources during access; and the memory space of many hash buckets is applied for in one go, so the space applied for may not match the actual requirement, which wastes storage resources.
[ description of the drawings ]
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the embodiments of the present invention will be briefly described below. It is obvious that the drawings described below are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
Fig. 1 is a schematic diagram illustrating a memory overhead and index performance presentation architecture according to an embodiment of the present invention;
Fig. 2 is a schematic diagram illustrating an effect of resolving a usage rate of a data block reserved by a conflict according to an embodiment of the present invention;
Fig. 3 is a flowchart illustrating a storage method supporting a multi-value index file according to an embodiment of the present invention;
Fig. 4 is a block diagram of a reverse index scheme according to an embodiment of the present invention;
Fig. 5 is a schematic diagram illustrating an effect of a hash bucket data structure according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a storage device supporting a multi-value index file according to an embodiment of the present invention.
[ detailed description ]
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In the description of the present invention, the terms "inner", "outer", "longitudinal", "lateral", "upper", "lower", "top", "bottom", and the like indicate orientations or positional relationships based on those shown in the drawings, and are for convenience only to describe the present invention without requiring the present invention to be necessarily constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention.
In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Example 1:
Embodiment 1 of the invention provides a storage method supporting a multi-value index file. A conflict array that is contiguous in global memory is applied for, and N data blocks for storing conflicting data are allocated to each hash bucket; for each hash bucket, the maximum number of data blocks it is allowed to access is set to X, where X satisfies X ≥ N and X is always counted from the initial data block of the corresponding hash bucket; the data block identifier Index within the range of use of the i-th hash bucket satisfies N × i ≤ Index < N × (i + 1); a field ArrayCount is set in the data management structure of each hash bucket to record the number of data blocks actually used in that hash bucket, where the number recorded by ArrayCount includes both the data blocks used by the hash bucket itself and the data blocks occupied through borrowing. To store a record of the keyword K, as shown in fig. 3, the method includes:
In step 201, the sequence number j of the hash bucket is calculated through a preset hash algorithm.
In step 202, when hash bucket j has a conflict, matching starts from the first data block in hash bucket j and continues until the ArrayCount data blocks recorded by the data management structure of hash bucket j have all been matched.
In step 203, if a consistent data block is found during matching, the record item of that data block is used directly and the matching process ends; if none is found, a new data block needs to be applied for, and the data block allocation process is entered:
In step 204, in the data block allocation process, an idle data block is searched for from ArrayCount to X; and if N + j + ArrayCount > N, storage is borrowed from hash bucket j + 1, the next hash bucket adjacent to hash bucket j, and the ArrayCount value in the data management structure of hash bucket j + 1 is updated synchronously.
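Steps 201 to 204 can be sketched as a small in-memory model. This is only an illustration under stated assumptions, not the patented implementation: the class name `ConflictIndex`, the byte-sum hash, the treatment of ArrayCount as a high-water mark, and the rule that borrowing begins once the chosen block lies outside bucket j's own N blocks are all assumptions made here, because the text gives the borrowing condition only in the form quoted in step 204.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ConflictIndex:
    """Toy model of the contiguous conflict array (hypothetical names throughout)."""
    num_buckets: int                                   # H, number of hash buckets
    n: int                                             # N, blocks allocated per bucket
    x: int                                             # X, max blocks a bucket may access
    blocks: List[Optional[str]] = field(init=False)    # None marks an idle block
    array_count: List[int] = field(init=False)         # ArrayCount per bucket

    def __post_init__(self) -> None:
        if self.x < self.n:
            raise ValueError("X must satisfy X >= N")
        self.blocks = [None] * (self.num_buckets * self.n)
        self.array_count = [0] * self.num_buckets

    def bucket_of(self, key: str) -> int:
        # Step 201: stand-in for the "preset hash algorithm" (assumption).
        return sum(key.encode()) % self.num_buckets

    def store(self, key: str) -> Optional[int]:
        j = self.bucket_of(key)
        base = self.n * j                              # Index of bucket j's first block
        # Steps 202/203: match the ArrayCount blocks, starting at the first one.
        for off in range(self.array_count[j]):
            if self.blocks[base + off] == key:
                return base + off                      # consistent block found
        # Step 204: search for an idle block from ArrayCount up to X.
        for off in range(self.array_count[j], self.x):
            idx = base + off
            if idx >= len(self.blocks):
                break                                  # assumption: no wrap-around
            if self.blocks[idx] is None:
                self.blocks[idx] = key
                self.array_count[j] = off + 1
                owner = idx // self.n
                if owner != j:
                    # Borrowed block: also update the lender's ArrayCount so the
                    # lender skips this block when it allocates (assumption).
                    local = idx % self.n
                    self.array_count[owner] = max(self.array_count[owner], local + 1)
                return idx
        return None                                    # nothing idle within X: discard
```

With `ConflictIndex(num_buckets=4, n=2, x=4)`, storing three keys that all hash to bucket 0 fills blocks 0 and 1 and then borrows block 2 from bucket 1, whose ArrayCount becomes 1 as a result. Because every block of a bucket and of its neighbour sits in one contiguous array, the matching loop reads adjacent blocks rather than jumping to a separately allocated pool.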
The invention provides a method by which a hash bucket borrows data blocks from its adjacent hash bucket. It solves two problems of the prior art: data blocks applied for in batches are scattered across non-contiguous memory space, which wastes computing resources during access; and the memory space of many hash buckets is applied for in one go, so the space applied for may not match the actual requirement, which wastes storage resources.
In combination with the embodiment of the present invention, a storage manner of conflicting data in the data block in the hash bucket is preferably performed in a reverse index manner, as shown in fig. 4, the data stored in the data block establishes a data index chain in the reverse index manner, where each data block includes an address pointer pPrev pointing to previously stored conflicting data and a Value storing content of conflicting data of its own.
In order to avoid the random read and write problem caused by a forward index, the embodiment of the invention provides a reverse file index. The core idea of the reverse index is as follows: instead of the last node pointing to the newly inserted node, the newly inserted node points to the last node. As shown in fig. 4, where pTail stores the file offset of the last node, the following operations are performed when the pNode4 node needs to be inserted:
1) The pPrev field of pNode4 is set to the value of pTail.
2) The pNode4 node is written to the tail of the index file. (The write can be batched with other nodes waiting to be written through a buffer mechanism, to achieve optimal I/O performance.)
3) The file offset at which pNode4 is located is assigned to the pTail field.
With this method of operation, the historical data already in the index file never needs to be modified; it is only necessary to set pPrev of pNode4 correctly before pNode4 is written, and then to update the pTail value.
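The three operations above can be sketched as an append-only writer. This is a minimal sketch under assumptions: the node layout (an 8-byte pPrev offset followed by a fixed 16-byte Value), the sentinel `NO_PREV`, and the helper names `append_node` and `walk` are hypothetical and are not specified in the text; an `io.BytesIO` object stands in for the index file.

```python
import io
import struct

# Assumed node layout: pPrev (signed 64-bit file offset) + fixed 16-byte Value.
NODE = struct.Struct("<q16s")
NO_PREV = -1  # assumed sentinel meaning "no earlier node in this chain"


def append_node(f, p_tail: int, value: bytes) -> int:
    """Append one node whose pPrev is the current pTail; return the new pTail."""
    f.seek(0, io.SEEK_END)
    offset = f.tell()                       # offset the new node will occupy
    # 1) pPrev = pTail, 2) write the node at the tail of the file.
    f.write(NODE.pack(p_tail, value.ljust(16, b"\0")))
    return offset                           # 3) the caller keeps this as pTail


def walk(f, p_tail: int):
    """Yield the stored values from newest to oldest by following pPrev."""
    while p_tail != NO_PREV:
        f.seek(p_tail)
        p_prev, raw = NODE.unpack(f.read(NODE.size))
        yield raw.rstrip(b"\0")
        p_tail = p_prev
```

Appending three values in sequence and then calling `walk` with the final pTail returns them newest first. Each append only adds bytes at the end of the file; no earlier node is rewritten, which is what keeps the writes sequential.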
Every linked list in the index file has its own pTail value, which is the access entry of that linked list and therefore has to be stored. The pTail values can be synchronized periodically to an additional Entry area or file; because the pTail values of many linked lists are written to the Entry file in batches, sequentially and by overwriting, optimal I/O performance can also be achieved. The hash bucket data structure formed by combining the method of the embodiment of the present invention with the reverse file index is shown in fig. 5. In a hash bucket, the address pointer of the data block at the tail of the data index chain is stored in the data management structure of the hash bucket. Each time conflict data is newly added to the corresponding hash bucket, pPrev in the data block carrying the newly added conflict data is assigned the address pointer stored in the data management structure, and Value in that data block is assigned the content of the conflict data; the pointer stored in the data management structure is then updated to the address pointer of the data block carrying the newly added conflict data.
When the total number H of hash buckets is 2 to 5 times the total number of stored keywords, the collision rate and the memory occupation can be guaranteed to be optimal. It should be noted that, purely for convenience in presenting the linked list relationship, the hash bucket shown in fig. 5 involves at least 4 linked list spaces (i.e. the total number N of data blocks); in a practical application scenario of the present invention, however, the value of N is usually related to the maximum number of collisions C, and N satisfies the following condition: N = MAX(C/5, 2), i.e., the larger of C/5 and 2. In the embodiment of the present invention, the value of X is usually set to between N × 2 and N × 3.
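As a worked example of these sizing rules, the following sketch derives H, N and X from a keyword count and a maximum collision count. The function name `size_parameters`, its parameters, and the use of integer division for C/5 are assumptions made for illustration; the text does not say how a fractional C/5 is rounded.

```python
def size_parameters(num_keys: int, max_collisions: int,
                    bucket_factor: int = 2, x_factor: int = 2):
    """Return (H, N, X) following the ranges given in the description."""
    if not 2 <= bucket_factor <= 5:
        raise ValueError("H should be 2 to 5 times the number of keywords")
    if not 2 <= x_factor <= 3:
        raise ValueError("X should be between N * 2 and N * 3")
    h = bucket_factor * num_keys          # H: total number of hash buckets
    n = max(max_collisions // 5, 2)       # N = MAX(C/5, 2); rounding is an assumption
    x = x_factor * n                      # X: blocks a bucket may access
    return h, n, x
```

For 1000 keywords and C = 20 with the default factors this gives H = 2000, N = 4 and X = 8, so each bucket owns 4 blocks and can reach up to 4 more in the adjacent range.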
When the record of the key word K is to be stored, if the ArrayCount value in the jth Hash bucket is determined to be equal to X, after the contents of each conflict data recorded in the jth Hash bucket are matched, if a consistent result is not matched, directly discarding the corresponding key word K.
In combination with the embodiments of the present invention, there is a further preferred implementation that can improve data indexing efficiency. It is generally applicable where the conflict data to be analyzed has a clear time limit. The conflict array is also recorded in a designated index document, where the designated index document performs file handle division according to time intervals, specifically:
index documents containing different conflict arrays are loaded into the memory when the current time is matched with the time interval associated in the file handle;
and if the current time reaches the other end of the association time of the index document loaded in the memory at present, searching the file handle matched with the current time, and then loading the index document corresponding to the corresponding file handle.
The index file is stored in the physical partition according to time, so that the system needs to manage the index file handles of different partitions.
The system of the invention places certain requirements on the time ordering of the raw data to be indexed: packets that are out of order by more than "time zone size/2" are not allowed, and such data is discarded if it occurs.
Based on the premise, the system only needs to keep 2 index file handles. The specific execution logic:
1) Assume that each time zone in the system spans 1 hour.
2) When the current time is 1:00:00, the system maintains the file handle of the 0 o'clock zone and the file handle of the 1 o'clock zone.
3) At this time, raw data with timestamps in the range [0:00:00, 2:00:00) is allowed to be indexed and stored.
4) When the current time is 1:30:00, the file handle of the 0 o'clock zone is closed and the file handle of the 2 o'clock zone is created. At this point the system maintains the file handles of the 1 o'clock and 2 o'clock zones.
5) At this time, raw data with timestamps in the range [1:00:00, 3:00:00) is allowed to be indexed and stored.
6) And so on. The file handle validity sample is as follows:
[Table image not reproduced: file handle validity sample]
Under this logic, the system only ever keeps the handles of 2 time zones open, which reduces the occupation of system resources while still ensuring a high tolerance for out-of-order timestamps. Managing the 2 handles separately in a dual-thread/process mode also ensures that the index file can be initialized at the same time as the handle for the next time zone is created. (Generally, initializing the file is time-consuming; if initialization of the index file of the current time zone were triggered only after data of the current time zone arrived, a large instantaneous index file backlog and loss could result.)
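The two-handle rule in the numbered example can be expressed as simple arithmetic. The sketch below is an assumption-laden illustration: times are seconds since midnight, time zones are numbered from 0, the switch happens exactly half a zone after each zone boundary (as at 1:30:00 in the example), and the function names `open_handles` and `accepts` are hypothetical.

```python
def open_handles(now_s: int, zone_s: int = 3600):
    """Return the two time-zone numbers whose file handles are open at now_s."""
    # The older handle is dropped half a zone after each zone boundary.
    first = (now_s - zone_s // 2) // zone_s
    return first, first + 1


def accepts(now_s: int, ts_s: int, zone_s: int = 3600) -> bool:
    """True if a record stamped ts_s may be indexed at current time now_s."""
    first, second = open_handles(now_s, zone_s)
    return first * zone_s <= ts_s < (second + 1) * zone_s
```

At 1:00:00 (3600 s) this yields zones 0 and 1 and accepts timestamps in [0:00:00, 2:00:00); at 1:30:00 (5400 s) it yields zones 1 and 2 and accepts [1:00:00, 3:00:00), matching items 2) to 5) above. During the first half hour after midnight the first zone number is -1, which under this numbering would denote the last zone of the previous day.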
Example 2:
fig. 6 is a schematic diagram of an architecture of a storage device supporting a multi-value index file according to an embodiment of the present invention. The storage device supporting the multi-value index file of the present embodiment includes one or more processors 21 and a memory 22. In fig. 6, one processor 21 is taken as an example.
The processor 21 and the memory 22 may be connected by a bus or other means, such as the bus connection in fig. 6.
The memory 22, which is a nonvolatile computer-readable storage medium, may be used to store a nonvolatile software program and a nonvolatile computer-executable program, such as the storage method supporting the multi-value index file in embodiment 1. The processor 21 executes the storage method supporting the multi-value index file by executing the nonvolatile software program and instructions stored in the memory 22.
The memory 22 may include high speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the memory 22 may optionally include memory located remotely from the processor 21, and these remote memories may be connected to the processor 21 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The program instructions/modules are stored in the memory 22, and when executed by the one or more processors 21, perform the storage method supporting the multi-value index file in the above-described embodiment 1, for example, perform the respective steps shown in fig. 3 described above.
It should be noted that, because the information interaction and execution processes between the modules and units in the apparatus and system are based on the same concept as the method embodiment of the present invention, reference may be made to the description of the method embodiment for the specific contents, which are not repeated here.
Those of ordinary skill in the art will appreciate that all or part of the steps of the methods in the embodiments may be carried out by a program instructing the associated hardware; the program may be stored in a computer-readable storage medium, which may include Read-Only Memory (ROM), Random Access Memory (RAM), magnetic disks, optical disks, and the like.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (9)

1. A storage method supporting multi-valued index files is characterized in that a conflict array with continuous global memory is applied, and N data blocks for storing data conflicts are distributed to each hash bucket; setting the maximum number of data blocks allowed to be accessed as X for each hash bucket; wherein, the value of X satisfies: X ≥ N and the statistics of X all start with the initial data block of the corresponding hash bucket; the data block identification Index in the corresponding use range corresponding to the ith hash bucket satisfies the following conditions: N × i ≤ Index < N × (i + 1); setting a field ArrayCount in the data management structure corresponding to each hash bucket, wherein the field ArrayCount is used for recording the number of data blocks actually used in the hash bucket; when a record of the keyword K is to be stored, the method comprises the following steps:
calculating the serial number value of the hash bucket to be j through a preset hash algorithm;
when the hash bucket j conflicts, matching is started from the first data block in the hash bucket j, and matching of ArrayCount data blocks recorded by a data management structure in the hash bucket j is completed;
if a consistent data block is found in the matching process, directly using the record item of the data block, and ending the matching process; if not, a new data block is required to be applied, and then a data block allocation flow is entered:
in the data block distribution process, searching for an idle data block from ArrayCount to X in the data block distribution process; and if N + j + ArrayCount > N, borrowing and storing the next adjacent hash bucket j +1 of the hash bucket j, and synchronously updating the ArrayCount value in the hash bucket j +1 data management structure.
2. The storage method supporting the multi-Value index file according to claim 1, wherein the data stored in the data blocks establishes a data index chain in a reverse index manner, wherein each data block comprises an address pointer pPrev pointing to the previously stored conflict data and a Value storing the content of the conflict data of the data block.
3. The storage method supporting the multi-Value index file according to claim 2, wherein an address pointer of a data block at the end of a data index chain is stored in a hash bucket, and, each time collision data is newly added to the corresponding hash bucket, pPrev in the data block for carrying newly added collision data is assigned as the address pointer stored in the data management structure, and Value in the data block for carrying newly added collision data is assigned as the collision data content; and updating the pointer stored in the data management structure to be the address pointer of the data block for bearing the newly increased conflict data.
4. The storage method supporting the multi-valued index file according to any one of claims 1 to 3, wherein the conflict rate and the memory occupation are guaranteed to be optimal when the total number H of hash buckets is 2 to 5 times the total number of stored keywords.
5. The storage method supporting the multi-value index file according to any one of claims 1 to 3, wherein the value of N is generally related to the maximum conflict number C, and N satisfies the following condition: N = MAX(C/5, 2).
6. The storage method supporting the multi-value index file according to any one of claims 1 to 3, wherein the value of X is set to N × 2 to N × 3.
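The sizing rules of claims 4 to 6 amount to simple arithmetic, sketched below. The function name suggest_parameters is hypothetical, and rounding C/5 up to an integer is an assumption, since the claims do not state how a fractional result is handled.

```python
import math
from typing import Dict, Tuple


def suggest_parameters(num_keys: int, max_conflicts: int) -> Dict[str, Tuple[int, int]]:
    """Return (low, high) ranges for H, N and X from the stated ratios."""
    # N = MAX(C/5, 2); rounding up is an assumption, not stated in the claims.
    n = max(math.ceil(max_conflicts / 5), 2)
    return {
        "H": (2 * num_keys, 5 * num_keys),  # 2 to 5 times the number of stored keywords
        "N": (n, n),
        "X": (2 * n, 3 * n),                # N x 2 to N x 3
    }
```

With 1000 keywords and a maximum conflict number of 20, this gives H between 2000 and 5000, N = 4, and X between 8 and 12.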
7. The method according to claim 1, wherein, when a record of a keyword K is to be stored, if the ArrayCount value of the j-th hash bucket is determined to be equal to X, then, after matching against each item of conflict data recorded in the j-th hash bucket is completed, the keyword K is directly discarded if no match is obtained.
8. The storage method supporting the multi-value index file according to claim 1, wherein the conflict array is further recorded in a designated index document, wherein the designated index document performs file handle division according to time intervals, specifically:
index documents containing different conflict arrays are loaded into the memory when the current time is matched with the time interval associated in the file handle;
and if the current time reaches the end of the time interval associated with the index document currently loaded in the memory, searching for the file handle matching the current time, and then loading the index document corresponding to that file handle.
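The time-partitioned loading of claim 8 can be sketched as follows. This is an assumption-laden illustration: FileHandle, IndexLoader, tick and the loads list are hypothetical names, intervals are treated as half-open [start, end) in seconds, and appending a path to loads stands in for actually reading an index document into memory.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FileHandle:
    start: int   # interval start, inclusive
    end: int     # interval end, exclusive
    path: str    # index document holding one conflict array


class IndexLoader:
    def __init__(self, handles: List[FileHandle]) -> None:
        self.handles = sorted(handles, key=lambda h: h.start)
        self.current: Optional[FileHandle] = None
        self.loads: List[str] = []   # stand-in for loading documents into memory

    def _find(self, now: int) -> Optional[FileHandle]:
        for handle in self.handles:
            if handle.start <= now < handle.end:
                return handle
        return None

    def tick(self, now: int) -> Optional[FileHandle]:
        """Switch documents only when `now` leaves the loaded handle's interval."""
        cur = self.current
        if cur is None or not (cur.start <= now < cur.end):
            self.current = self._find(now)
            if self.current is not None:
                self.loads.append(self.current.path)
        return self.current
```

Calling tick repeatedly inside one interval keeps the same document loaded; the first call at or after the interval end looks up the next handle and loads its document.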
9. A storage apparatus supporting a multi-valued index file, the apparatus comprising:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor for performing the storage method supporting multi-valued index files of any one of claims 1-8.
CN202011014922.XA 2020-09-24 2020-09-24 Storage method and device supporting multi-valued index file Active CN112199333B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011014922.XA CN112199333B (en) 2020-09-24 2020-09-24 Storage method and device supporting multi-valued index file

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011014922.XA CN112199333B (en) 2020-09-24 2020-09-24 Storage method and device supporting multi-valued index file

Publications (2)

Publication Number Publication Date
CN112199333A (en) 2021-01-08
CN112199333B (en) 2022-11-22

Family

ID=74016133

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011014922.XA Active CN112199333B (en) 2020-09-24 2020-09-24 Storage method and device supporting multi-valued index file

Country Status (1)

Country Link
CN (1) CN112199333B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112800057A (en) * 2021-01-22 2021-05-14 新华三大数据技术有限公司 Fingerprint table management method and device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003030040A (en) * 2001-07-12 2003-01-31 Nec Commun Syst Ltd Hush indexes of object database system and non-unique index management system
CN101753445A (en) * 2009-12-23 2010-06-23 重庆邮电大学 Fast flow classification method based on keyword decomposition hash algorithm
CN102541968A (en) * 2010-12-31 2012-07-04 百度在线网络技术(北京)有限公司 Indexing method
CN102609487A (en) * 2012-01-20 2012-07-25 东华大学 Column-storage-oriented Hash joint method for indexes in barrels
CN103064906A (en) * 2012-12-18 2013-04-24 华为技术有限公司 File management method and device
CN107515901A (en) * 2017-07-24 2017-12-26 中国科学院信息工程研究所 A kind of chain type daily record storage organization and its Hash Index Structure, data manipulation method and server, medium
US20200226099A1 (en) * 2019-01-11 2020-07-16 Jyothi Vemulapalli Method and apparatus for improving hash searching throughput in the event of hash collisions

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
林朝晖 等: "高维分布式局部敏感哈希索引方法", 《计算机科学与探索》 *
英昌甜 等: "内存计算环境下基于索引结构的内存优化策略", 《新疆大学学报(自然科学版)》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112800057A (en) * 2021-01-22 2021-05-14 新华三大数据技术有限公司 Fingerprint table management method and device
CN112800057B (en) * 2021-01-22 2023-06-09 新华三大数据技术有限公司 Fingerprint table management method and device

Also Published As

Publication number Publication date
CN112199333B (en) 2022-11-22

Similar Documents

Publication Publication Date Title
CN103425756B (en) The replication strategy of data block in a kind of HDFS
CN109213772A (en) Date storage method and NVMe storage system
US10740006B2 (en) System and method for enabling high read rates to data element lists
CN102446139B (en) Method and device for data storage
US11210228B2 (en) Method, device and computer program product for cache management
US7111289B2 (en) Method for implementing dual link list structure to enable fast link-list pointer updates
US8190857B2 (en) Deleting a shared resource node after reserving its identifier in delete pending queue until deletion condition is met to allow continued access for currently accessing processor
US20210019257A1 (en) Persistent memory storage engine device based on log structure and control method thereof
CN109766318B (en) File reading method and device
CN109240607B (en) File reading method and device
CN106599091B (en) RDF graph structure storage and index method based on key value storage
CN103425435A (en) Disk storage method and disk storage system
CN112199333B (en) Storage method and device supporting multi-valued index file
CN115129621A (en) Memory management method, device, medium and memory management module
CN114327642A (en) Data read-write control method and electronic equipment
WO2019174206A1 (en) Data reading method and apparatus of storage device, terminal device, and storage medium
CN110008030A (en) A kind of method of metadata access, system and equipment
WO2016187975A1 (en) Internal memory defragmentation method and apparatus
US7953721B1 (en) Integrated search engine devices that support database key dumping and methods of operating same
CN111124313A (en) Data reading and writing method and device for power acquisition terminal and electronic equipment
CN110334251B (en) Element sequence generation method for effectively solving rehash conflict
US10067690B1 (en) System and methods for flexible data access containers
CN112433672A (en) Solid state disk reading method and device
CN117632953B (en) Data cycle storage method, device, server and storage medium
CN102439947B (en) Method and device for processing address request

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant