CN112199333B - Storage method and device supporting multi-valued index file - Google Patents

Info

Publication number
CN112199333B
Authority
CN
China
Prior art keywords
data
hash bucket
data block
index
hash
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011014922.XA
Other languages
Chinese (zh)
Other versions
CN112199333A (en
Inventor
牛晨光
王梦来
李竞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Greenet Information Service Co Ltd
Original Assignee
Wuhan Greenet Information Service Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Greenet Information Service Co Ltd
Priority to CN202011014922.XA
Publication of CN112199333A
Application granted
Publication of CN112199333B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G06F16/137 Hash-based
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0625 Power saving in storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0643 Management of files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0644 Management of space entities, e.g. partitions, extents, pools

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of data storage and search, and provides a storage method and a storage device supporting a multi-valued index file. The method comprises: calculating the number j of the hash bucket through a preset hash algorithm; when hash bucket j conflicts, matching from the first data block in hash bucket j until the ArrayCount data blocks recorded by the data management structure of hash bucket j have been matched; if a consistent data block is found during matching, directly using the record item of that data block and ending the matching process; if not, a new data block must be allocated, and a data block allocation flow is entered. The invention provides a method by which a hash bucket borrows data blocks from its adjacent hash bucket, which solves the prior-art problem that data blocks allocated in batches are scattered across discontinuous memory space, wasting computing resources during access.

Description

Storage method and device supporting multi-valued index file
[ technical field ]
The invention relates to the technical field of data storage and search, in particular to a storage method and a storage device supporting a multi-value index file.
[ background ]
To export and view the raw control-plane and service-plane data of a single user in the network through an expert subsystem of a telecom operator's OSS system, a system such as DPI must be built that supports storing and querying raw user signaling data by user number.
At present, a DPI system built at the provincial level carries more than 1000 thousand users, and raw signaling packets are generated in real time at rates of up to 6000000 pps. An indexing scheme is therefore needed that is more efficient, and more economical in storage and hardware, than common distributed storage solutions such as Hadoop.
Currently, the most common indexing algorithms are hash-based and tree-based, and hashing is the one with the most stable performance. A good hash algorithm spreads keys well, but hash collisions can never be completely avoided, so any system that uses hashing as a fast indexing algorithm needs to resolve hash collisions.
Considering memory overhead and index performance, a 7x24-hour online system can only tolerate conflicts to a limited degree; that is, conflicts cannot be resolved without limit. The number of collisions handled under one hash value is usually limited to N, and collisions beyond this value are discarded directly without being stored, as shown in fig. 1.
Another conflict resolution method is also common in in-memory index implementations: a pool of M conflict-resolution data blocks is allocated in advance; conflicts are resolved while the total number of conflicts in the whole index system does not exceed M, and are discarded once the total exceeds that value. This scheme improves on resolving only N collisions per hash value and raises the utilization of the data blocks reserved for collision resolution, as shown in fig. 2.
However, this scheme is only suitable for in-memory indexing. The data blocks allocated in batches end up scattered across discontinuous memory space; for an index file, they are scattered across discontinuous file space, so the file is read and written randomly and performance suffers greatly.
In view of the above, overcoming the drawbacks of the prior art is an urgent problem in the art.
[ summary of the invention ]
The technical problem to be solved by the invention is that the current scheme is only suitable for in-memory indexing: the data blocks allocated in batches end up scattered across discontinuous memory space, and for an index file they are scattered across discontinuous file space, so the file is read and written randomly and performance suffers greatly.
The invention adopts the following technical scheme:
In a first aspect, the invention provides a storage method supporting a multi-valued index file. A conflict array that is contiguous in global memory is allocated, and N data blocks for storing conflicting data are assigned to each hash bucket; the maximum number of data blocks that may be accessed for each hash bucket is set to X, wherein X satisfies X >= N, and X is always counted from the first data block of the corresponding hash bucket; the data block identifier Index within the range belonging to the i-th hash bucket satisfies (N × i) <= Index < (N × (i + 1)); a field ArrayCount is set in the data management structure of each hash bucket to record the number of data blocks actually used in that hash bucket; when a record of keyword K is to be stored, the method comprises:
calculating the number j of the hash bucket through a preset hash algorithm;
when hash bucket j conflicts, matching from the first data block in hash bucket j until the ArrayCount data blocks recorded by the data management structure of hash bucket j have been matched;
if a consistent data block is found during matching, directly using the record item of that data block and ending the matching process; if not, a new data block must be allocated, and a data block allocation flow is entered:
in the data block allocation flow, a free data block is searched for in the range from ArrayCount to X; and if N + j + ArrayCount > N, storage is borrowed from the next adjacent hash bucket j+1 of hash bucket j, and the ArrayCount value in the data management structure of hash bucket j+1 is updated synchronously.
Preferably, the data stored in the data blocks forms a data index chain in a reverse index manner, wherein each data block includes an address pointer pPrev pointing to the previously stored conflicting data, and a Value storing the content of its own conflicting data.
Preferably, in a hash bucket, the address pointer of the data block at the tail of the data index chain is stored in the data management structure of the hash bucket; each time conflict data is newly added to the hash bucket, the pPrev in the data block that carries the new conflict data is assigned the address pointer stored in the data management structure, and the Value in that data block is assigned the content of the conflict data; the pointer stored in the data management structure is then updated to the address pointer of the data block that carries the new conflict data.
Preferably, the collision rate and the memory occupation can be guaranteed to be optimal when the total number H of the hash buckets is 2-5 times of the total number of the stored keywords.
Preferably, the value of N is generally associated with a maximum number of collisions C, and N satisfies the following condition: N = MAX(C/5, 2).
Preferably, the value of X is set to N × 2 to N × 3.
Preferably, when a record of the keyword K is to be stored, if the ArrayCount value in the j-th hash bucket is determined to be equal to X, and no consistent result is found after the content of every piece of conflict data recorded in the j-th hash bucket has been matched, the corresponding keyword K is directly discarded.
Preferably, the conflict array is further recorded in a designated index document, and the designated index documents are divided among file handles according to time intervals, specifically:
index documents containing different conflict arrays are loaded into memory when the current time matches the time interval associated with their file handle;
and if the current time reaches the end of the time interval associated with the index document currently loaded in memory, the file handle matching the current time is looked up, and the index document corresponding to that file handle is then loaded.
In a second aspect, the present invention further provides a storage apparatus supporting a multi-valued index file, for implementing the storage method supporting the multi-valued index file in the first aspect, where the apparatus includes:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executable by the processor to perform the method for supporting multi-valued index file storage of the first aspect.
In a third aspect, the present invention further provides a non-volatile computer storage medium, where the computer storage medium stores computer-executable instructions, and the computer-executable instructions are executed by one or more processors, and are used for completing the storage method supporting the multi-value index file according to the first aspect.
The invention provides a method by which a hash bucket borrows data blocks from its adjacent hash bucket. It solves the prior-art problem that data blocks allocated in batches are scattered across discontinuous memory space, wasting computing resources during access, as well as the problem that, when the memory space of many hash buckets is allocated at one time, the allocated space may not match actual demand, wasting storage resources.
[ description of the drawings ]
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the embodiments of the present invention will be briefly described below. It is obvious that the drawings described below are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
Fig. 1 is a schematic diagram illustrating a memory overhead and index performance presentation architecture according to an embodiment of the present invention;
Fig. 2 is a schematic diagram illustrating an effect of resolving a utilization rate of a data block reserved by a conflict according to an embodiment of the present invention;
Fig. 3 is a flowchart illustrating a storage method supporting a multi-valued index file according to an embodiment of the present invention;
Fig. 4 is a block diagram of an inverted index scheme according to an embodiment of the present invention;
Fig. 5 is a schematic diagram illustrating an effect of a hash bucket data structure according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a storage device supporting a multi-valued index file according to an embodiment of the present invention.
[ detailed description ]
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In the description of the present invention, the terms "inner", "outer", "longitudinal", "lateral", "upper", "lower", "top", "bottom", and the like indicate orientations or positional relationships based on orientations or positional relationships shown in the drawings, and are for convenience in describing the present invention only and do not require that the present invention be constructed and operated in a particular orientation, and therefore should not be construed as limiting the present invention.
In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Example 1:
Embodiment 1 of the invention provides a storage method supporting a multi-valued index file. A conflict array that is contiguous in global memory is allocated, and N data blocks for storing conflicting data are assigned to each hash bucket; the maximum number of data blocks that may be accessed for each hash bucket is set to X, wherein X satisfies X >= N, and X is always counted from the first data block of the corresponding hash bucket; the data block identifier Index within the range belonging to the i-th hash bucket satisfies (N × i) <= Index < (N × (i + 1)); a field ArrayCount is set in the data management structure of each hash bucket to record the number of data blocks actually used in that hash bucket, and this number includes both the data blocks used by the hash bucket itself and the data blocks occupied through borrowing; when a record of keyword K is to be stored, as shown in fig. 3, the method includes:
In step 201, the number j of the hash bucket is calculated through a preset hash algorithm.
In step 202, when hash bucket j conflicts, matching starts from the first data block in hash bucket j and continues until the ArrayCount data blocks recorded by the data management structure of hash bucket j have been matched.
In step 203, if a consistent data block is found during matching, the record item of that data block is used directly and the matching process ends; if not, a new data block must be allocated, and a data block allocation flow is entered:
In step 204, in the data block allocation flow, a free data block is searched for in the range from ArrayCount to X; and if N + j + ArrayCount > N, storage is borrowed from the next adjacent hash bucket j+1 of hash bucket j, and the ArrayCount value in the data management structure of hash bucket j+1 is updated synchronously.
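Steps 201 to 204 can be illustrated with a minimal Python sketch. It is not the patented implementation, and several points are assumptions made only for illustration: the class name ConflictArray, integer keys with `key % H` standing in for the preset hash algorithm, ArrayCount treated as a high-water mark counted from the bucket's first block, and the borrowing condition read as "the free block found lies at or beyond the bucket's own N blocks". Records that find no free block within X are discarded.

```python
from typing import List, Optional


class ConflictArray:
    """Illustrative sketch: one contiguous array, N blocks per bucket, window of X."""

    def __init__(self, num_buckets: int, n: int, x: int) -> None:
        if x < n:
            raise ValueError("X must satisfy X >= N")
        self.h, self.n, self.x = num_buckets, n, x
        # Bucket i owns the blocks with N*i <= Index < N*(i+1).
        self.blocks: List[Optional[int]] = [None] * (num_buckets * n)
        # ArrayCount per bucket (assumed here to be a high-water mark).
        self.array_count: List[int] = [0] * num_buckets

    def bucket_of(self, key: int) -> int:
        # Stand-in for the "preset hash algorithm" (assumption).
        return key % self.h

    def store(self, key: int) -> Optional[int]:
        j = self.bucket_of(key)                      # step 201
        base = self.n * j
        used = self.array_count[j]
        for off in range(used):                      # steps 202/203: match
            if self.blocks[base + off] == key:
                return base + off
        for off in range(used, self.x):              # step 204: allocate
            idx = base + off
            if idx >= len(self.blocks):
                break                                # no neighbour left to borrow from
            if self.blocks[idx] is None:
                self.blocks[idx] = key
                self.array_count[j] = off + 1
                owner = idx // self.n
                if owner != j:                       # borrowed from a following bucket
                    local = idx - self.n * owner + 1
                    self.array_count[owner] = max(self.array_count[owner], local)
                return idx
        return None                                  # window exhausted: discard


arr = ConflictArray(num_buckets=4, n=2, x=4)
for k in (0, 4, 8):            # all three keys fall into bucket 0
    arr.store(k)
print(arr.array_count)         # bucket 0 has borrowed one block from bucket 1
```

Because a borrowed block is always the one physically following the bucket's own blocks, a lookup for bucket j still scans one contiguous range of the array.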
The invention provides a method by which a hash bucket borrows data blocks from its adjacent hash bucket. It solves the prior-art problem that data blocks allocated in batches are scattered across discontinuous memory space, wasting computing resources during access, as well as the problem that, when the memory space of many hash buckets is allocated at one time, the allocated space may not match actual demand, wasting storage resources.
In combination with the embodiment of the present invention, the conflicting data in the data blocks of a hash bucket is preferably stored in a reverse index manner. As shown in fig. 4, the data stored in the data blocks forms a data index chain in the reverse index manner, where each data block includes an address pointer pPrev pointing to the previously stored conflicting data and a Value storing the content of its own conflicting data.
To avoid the random read and write problem caused by a forward index, the embodiment of the invention provides a reverse file index. The core idea of the reverse index is that, instead of making the last node point to the newly inserted node, the newly inserted node points to the last node. As shown in fig. 4, where pTail stores the file offset of the last node, the following operations are performed when a pNode4 node needs to be inserted:
1) Set the pPrev field of pNode4 to the value of pTail.
2) Write the pNode4 node to the tail of the index file. (The write can be batched with other pending nodes through a buffer mechanism to achieve optimal I/O performance.)
3) Assign the file offset at which pNode4 is located to the pTail field.
With this method, historical data in the current index file never needs to be modified: it is only necessary to set pPrev of pNode4 correctly before pNode4 is written, and to update the pTail value afterwards.
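The three operations above can be sketched in Python as an append-only chain. This is an illustration under assumptions rather than the patent's file format: the node layout (an 8-byte pPrev offset, a 4-byte length, then the value bytes), the sentinel NO_PREV = -1 for "no previous node", and the function names are all hypothetical.

```python
import io
import struct
from typing import BinaryIO, Iterator

# Assumed node layout: pPrev (int64 file offset) + value length (uint32) + value.
HEADER = struct.Struct("<qI")
NO_PREV = -1  # assumed sentinel meaning "this is the first node of the chain"


def append_node(f: BinaryIO, p_tail: int, value: bytes) -> int:
    """Append a node whose pPrev is the old pTail; return the new pTail."""
    f.seek(0, io.SEEK_END)                           # always write at the file tail
    offset = f.tell()
    f.write(HEADER.pack(p_tail, len(value)) + value) # operations 1) and 2)
    return offset                                    # operation 3): the new pTail


def walk_chain(f: BinaryIO, p_tail: int) -> Iterator[bytes]:
    """Yield the values of one chain from newest to oldest by following pPrev."""
    offset = p_tail
    while offset != NO_PREV:
        f.seek(offset)
        p_prev, length = HEADER.unpack(f.read(HEADER.size))
        yield f.read(length)
        offset = p_prev


index_file = io.BytesIO()       # stands in for the index file
p_tail = NO_PREV
for value in (b"node1", b"node2", b"node3"):
    p_tail = append_node(index_file, p_tail, value)
print(list(walk_chain(index_file, p_tail)))
```

Nodes already written are never touched again, so all writes go to the end of the file; the cost is that a chain can only be read from newest to oldest.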
Every linked list in the index file has its own pTail value, and pTail is the access entry of the list, so the pTail values must themselves be stored. They can be synchronized to storage periodically through an additional Entry area or file; because the pTail values of many linked lists are written to the Entry file in batches, sequentially and by overwriting, this also achieves optimal I/O performance. The hash bucket data structure formed by combining the method of the embodiment of the present invention with the reverse file index described above is shown in fig. 5. In a hash bucket, the address pointer of the data block at the tail of the data index chain is stored in the data management structure of the hash bucket. Each time conflict data is newly added to the hash bucket, pPrev in the data block that carries the new conflict data is assigned the address pointer stored in the data management structure, and Value in that data block is assigned the content of the conflict data; the pointer stored in the data management structure is then updated to the address pointer of the data block that carries the new conflict data.
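As a rough illustration of the periodic, batched Entry write described above, the following Python sketch keeps one pTail per chain in memory and flushes all of them as a single sequential overwrite. The class name EntryTable, the fixed-width int64 layout, and the -1 "empty chain" marker are assumptions for the example; the text does not specify the Entry format.

```python
import io
import struct
from typing import BinaryIO, List


class EntryTable:
    """In-memory pTail values, flushed to an Entry file in one sequential write."""

    def __init__(self, num_chains: int) -> None:
        self.p_tail: List[int] = [-1] * num_chains   # -1: chain is empty (assumption)
        self.dirty = False

    def update(self, chain: int, offset: int) -> None:
        # Called after a node has been appended to the index file.
        self.p_tail[chain] = offset
        self.dirty = True

    def flush(self, entry_file: BinaryIO) -> bool:
        """Overwrite the Entry file with all pTail values; meant to run periodically."""
        if not self.dirty:
            return False
        entry_file.seek(0)
        entry_file.write(struct.pack(f"<{len(self.p_tail)}q", *self.p_tail))
        entry_file.flush()
        self.dirty = False
        return True

    @classmethod
    def load(cls, entry_file: BinaryIO, num_chains: int) -> "EntryTable":
        # Rebuild the table from a previously flushed Entry file.
        entry_file.seek(0)
        table = cls(num_chains)
        table.p_tail = list(struct.unpack(f"<{num_chains}q", entry_file.read(8 * num_chains)))
        return table
```

Since the flush is periodic, pTail updates made after the last flush would not survive a crash in this sketch; how the original system handles that case is not described in the text.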
When the total number H of the hash buckets is 2-5 times the total number of the stored keywords, the collision rate and the memory occupation can be guaranteed to be optimal. It should be noted that fig. 5 shows at least 4 linked list spaces (i.e. the total number N of data blocks) in one hash bucket only to make the linked list relationship easier to present; in a practical application scenario of the present invention, the value of N is usually related to the maximum number of collisions C, and N satisfies the following condition: N = MAX(C/5, 2), i.e. the larger of C/5 and 2. In the present embodiment, the value of X is typically set to N × 2 to N × 3.
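The sizing rules in this paragraph can be written out as a short Python helper. The function and field names are hypothetical, and rounding C/5 down to an integer is an assumption, since the text does not say how a fractional C/5 is handled.

```python
from typing import NamedTuple


class Sizing(NamedTuple):
    h_min: int   # lower bound for the number of hash buckets H
    h_max: int   # upper bound for H
    n: int       # data blocks assigned to each hash bucket
    x_min: int   # lower bound for the access window X
    x_max: int   # upper bound for X


def suggest_sizing(total_keys: int, max_conflicts: int) -> Sizing:
    """Apply H = 2-5 x keys, N = MAX(C/5, 2), X = N*2 to N*3."""
    n = max(max_conflicts // 5, 2)   # floor division is an assumption
    return Sizing(2 * total_keys, 5 * total_keys, n, 2 * n, 3 * n)


print(suggest_sizing(total_keys=1000, max_conflicts=20))
```

For 1000 keywords and C = 20 this gives H between 2000 and 5000, N = 4, and X between 8 and 12.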
When the record of the key word K is to be stored, if the ArrayCount value in the jth Hash bucket is determined to be equal to X, after the contents of each conflict data recorded in the jth Hash bucket are matched, if a consistent result is not matched, directly discarding the corresponding key word K.
In combination with the embodiments of the present invention, there is a further preferred implementation that improves data indexing efficiency. It is generally applicable where the analysis of conflict data has a clear time limit. The conflict array is also recorded in a designated index document, and the designated index documents are divided among file handles according to time intervals, specifically:
index documents containing different conflict arrays are loaded into memory when the current time matches the time interval associated with their file handle;
and if the current time reaches the end of the time interval associated with the index document currently loaded in memory, the file handle matching the current time is looked up, and the index document corresponding to that file handle is then loaded.
Because the index files are physically partitioned by time, the system needs to manage the index file handles of different partitions.
The system of the invention places a requirement on the time ordering of the raw data to be indexed: packets that are out of order by more than "time zone size/2" are not allowed, and such data is discarded if it occurs.
On this premise, the system only needs to keep 2 index file handles. The specific execution logic is:
1) Assume that each time zone of the system spans 1 hour.
2) When the current time is 1.
3) This time, it is allowed to index and store the original data whose time stamp is in the range of [0, 00, 2.
4) When the current time is 1. At this point the system maintains file handles of 1 and 2 points.
5) This time, it is allowed to index and store the original data whose time stamp is in the range of [ 100, 3.
6) And so on. The file handle validity sample is as follows:
[Table image: file handle validity sample (not reproduced)]
Under this logic, the system only needs to keep the handles of 2 time zones open, which reduces the occupation of system resources while still tolerating a high degree of time disorder. Managing the 2 handles in separate threads or processes also allows the index file of the next time zone to be initialized at the same time as its handle is created. (File initialization is generally relatively time-consuming; if the initialization of the index file for the current time zone were triggered only after data for that time zone had arrived, a large instantaneous indexing backlog, and even loss of index data, could result.)
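The two-handle rule can be sketched in Python as follows. The exact switching instant is not recoverable from the text above, so this sketch assumes that the older handle is dropped once the current time is half a time zone past its end, which matches the stated out-of-order tolerance of "time zone size/2"; the function names and the use of plain second counts are also assumptions.

```python
from typing import Tuple


def active_zones(now: int, zone_size: int) -> Tuple[int, int]:
    """Return the two time-zone numbers whose handles are kept open at `now`.

    Assumption: the window slides when `now` passes the midpoint of a zone.
    """
    first = (now - zone_size // 2) // zone_size
    return first, first + 1


def accepts(timestamp: int, now: int, zone_size: int) -> bool:
    """True if a record with this timestamp falls into one of the open zones."""
    first, last = active_zones(now, zone_size)
    return first * zone_size <= timestamp < (last + 1) * zone_size


HOUR = 3600
print(active_zones(now=1 * HOUR + 29 * 60, zone_size=HOUR))  # zones 0 and 1
print(active_zones(now=1 * HOUR + 30 * 60, zone_size=HOUR))  # zones 1 and 2
```

With 1-hour zones, any record whose timestamp is within half an hour of the current time falls into one of the two open zones, so no third handle is needed.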
Example 2:
fig. 6 is a schematic diagram of an architecture of a storage apparatus supporting a multi-valued index file according to an embodiment of the present invention. The storage device supporting the multi-value index file of the present embodiment includes one or more processors 21 and a memory 22. In fig. 6, one processor 21 is taken as an example.
The processor 21 and the memory 22 may be connected by a bus or other means, and fig. 6 illustrates the connection by a bus as an example.
The memory 22, which is a nonvolatile computer-readable storage medium, may be used to store a nonvolatile software program and a nonvolatile computer-executable program, such as the storage method supporting the multi-value index file in embodiment 1. The processor 21 executes the storage method supporting the multi-value index file by executing the nonvolatile software program and instructions stored in the memory 22.
The memory 22 may include high speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the memory 22 may optionally include memory located remotely from the processor 21, and these remote memories may be connected to the processor 21 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The program instructions/modules are stored in the memory 22, and when executed by the one or more processors 21, perform the storage method supporting the multi-value index file in the above-described embodiment 1, for example, perform the respective steps shown in fig. 3 described above.
It should be noted that, since the information interaction and execution processes between the modules and units of the apparatus and system are based on the same concept as the method embodiments of the present invention, the details can be found in the description of the method embodiments and are not repeated here.
Those of ordinary skill in the art will appreciate that all or part of the steps of the various methods of the embodiments may be implemented by associated hardware as instructed by a program, which may be stored on a computer-readable storage medium, which may include: read-only memory (ROM), random access memory (RAM), magnetic or optical disks, and the like.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (9)

1. A storage method supporting a multi-valued index file, characterized in that a conflict array that is contiguous in global memory is allocated, and N data blocks for storing conflicting data are assigned to each hash bucket; the maximum number of data blocks that may be accessed for each hash bucket is set to X, wherein X satisfies X >= N, and X is always counted from the first data block of the corresponding hash bucket; the data block identifier Index within the range belonging to the i-th hash bucket satisfies (N × i) <= Index < (N × (i + 1)); a field ArrayCount is set in the data management structure of each hash bucket to record the number of data blocks actually used in that hash bucket; when a record of keyword K is to be stored, the method comprises:
calculating the number j of the hash bucket through a preset hash algorithm;
when hash bucket j conflicts, matching from the first data block in hash bucket j until the ArrayCount data blocks recorded by the data management structure of hash bucket j have been matched;
if a consistent data block is found during matching, directly using the record item of that data block and ending the matching process; if not, a new data block must be allocated, and a data block allocation flow is entered:
in the data block allocation flow, a free data block is searched for in the range from ArrayCount to X; and if N + j + ArrayCount > N, storage is borrowed from the next adjacent hash bucket j+1 of hash bucket j, and the ArrayCount value in the data management structure of hash bucket j+1 is updated synchronously.
2. The storage method supporting the multi-valued index file according to claim 1, wherein the data stored in the data blocks forms a data index chain in a reverse index manner, wherein each data block comprises an address pointer pPrev pointing to the previously stored conflicting data and a Value storing the content of its own conflicting data.
3. The storage method supporting multi-valued index file according to claim 2, characterized in that in a hash bucket, the address pointer of the data block at the end of the data index chain is stored in its data management structure, and every time collision data is newly added in the corresponding hash bucket, the pPrev in the data block for bearing newly added collision data is assigned as the address pointer stored in the data management structure, and the Value in the data block for bearing newly added collision data is assigned as the collision data content; and updating the pointer stored in the data management structure to be the address pointer of the data block for bearing the newly increased conflict data.
4. The storage method supporting the multi-valued index file according to any one of claims 1 to 3, wherein the total number H of hash buckets is 2 to 5 times the total number of stored keywords, so as to balance the collision rate against memory occupation.
5. The storage method supporting the multi-valued index file according to any one of claims 1 to 3, wherein the value of N is related to a maximum conflict number C, and N satisfies the following condition: N = MAX(C/5, 2).
6. The storage method supporting the multi-valued index file according to any one of claims 1 to 3, wherein the value of X is set to be N × 2 to N × 3.
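As a worked illustration of the sizing rules in claims 4 to 6, the helper below derives H, N and X from a keyword count and a maximum conflict number C. The function name `size_parameters`, its parameters, and the use of integer division for C/5 are assumptions; the claims do not state how C/5 is rounded.

```python
from typing import Tuple


def size_parameters(keywords: int, max_conflicts: int,
                    bucket_factor: int = 2, x_factor: int = 2) -> Tuple[int, int, int]:
    """Return (H, N, X) under the ranges given in claims 4 to 6."""
    if not 2 <= bucket_factor <= 5:
        raise ValueError("H should be 2 to 5 times the keyword count")
    if not 2 <= x_factor <= 3:
        raise ValueError("X should be N x 2 to N x 3")
    h = keywords * bucket_factor          # claim 4: total number of hash buckets
    n = max(max_conflicts // 5, 2)        # claim 5: N = MAX(C/5, 2), rounded down here
    x = n * x_factor                      # claim 6: X between 2N and 3N
    return h, n, x
```

For example, 1000 keywords with C = 20 and the smallest factors give H = 2000, N = 4 and X = 8.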
7. The method according to claim 1, wherein when a record of a keyword K is to be stored, if it is determined that the size of the ArrayCount value in the jth hash bucket is equal to X, after the matching of each content of conflicting data recorded in the jth hash bucket is completed, if no matching result is obtained, the corresponding keyword K is directly discarded.
8. The storage method supporting the multi-value index file according to claim 1, wherein the conflict array is further recorded in a designated index document, wherein the designated index document performs file handle division according to time intervals, specifically:
index documents containing different conflict arrays are loaded into the memory when the current time is matched with the time interval associated in the file handle;
and if the current time reaches the end of the time interval associated with the index document currently loaded in the memory, the file handle matching the current time is searched for, and the index document corresponding to that file handle is then loaded.
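The time-partitioned loading of claim 8 might look like the following sketch. It assumes half-open intervals [start, end) expressed as plain numbers such as epoch seconds; `FileHandle`, `find_handle`, `IndexLoader` and the `load` callback are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass(frozen=True)
class FileHandle:
    start: float   # inclusive start of the associated time interval
    end: float     # exclusive end of the associated time interval
    path: str      # index document holding one conflict array


def find_handle(handles: List[FileHandle], now: float) -> Optional[FileHandle]:
    for handle in handles:
        if handle.start <= now < handle.end:
            return handle
    return None


class IndexLoader:
    def __init__(self, handles: List[FileHandle], load: Callable[[str], Any]):
        self.handles = handles
        self.load = load                  # caller-supplied loader (assumption)
        self.current: Optional[FileHandle] = None
        self.index: Any = None

    def refresh(self, now: float) -> bool:
        """Load the matching index document; return True if a swap happened."""
        cur = self.current
        if cur is not None and cur.start <= now < cur.end:
            return False                  # still inside the loaded interval
        handle = find_handle(self.handles, now)
        if handle is None or handle == cur:
            return False                  # nothing to switch to
        self.index = self.load(handle.path)
        self.current = handle
        return True
```

Calling `refresh` on each store or lookup keeps exactly one conflict array resident, the one whose interval covers the current time.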
9. A storage apparatus supporting a multi-valued index file, the apparatus comprising:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor for performing the method of supporting multi-valued index file storage of any of claims 1-8.
CN202011014922.XA 2020-09-24 2020-09-24 Storage method and device supporting multi-valued index file Active CN112199333B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011014922.XA CN112199333B (en) 2020-09-24 2020-09-24 Storage method and device supporting multi-valued index file

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011014922.XA CN112199333B (en) 2020-09-24 2020-09-24 Storage method and device supporting multi-valued index file

Publications (2)

Publication Number Publication Date
CN112199333A CN112199333A (en) 2021-01-08
CN112199333B true CN112199333B (en) 2022-11-22

Family

ID=74016133

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011014922.XA Active CN112199333B (en) 2020-09-24 2020-09-24 Storage method and device supporting multi-valued index file

Country Status (1)

Country Link
CN (1) CN112199333B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112800057B (en) * 2021-01-22 2023-06-09 新华三大数据技术有限公司 Fingerprint table management method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101753445A (en) * 2009-12-23 2010-06-23 重庆邮电大学 Fast flow classification method based on keyword decomposition hash algorithm
CN102541968A (en) * 2010-12-31 2012-07-04 百度在线网络技术(北京)有限公司 Indexing method
CN102609487A (en) * 2012-01-20 2012-07-25 东华大学 Column-storage-oriented Hash joint method for indexes in barrels
CN103064906A (en) * 2012-12-18 2013-04-24 华为技术有限公司 File management method and device
CN107515901A (en) * 2017-07-24 2017-12-26 中国科学院信息工程研究所 A kind of chain type daily record storage organization and its Hash Index Structure, data manipulation method and server, medium
US20200226099A1 (en) * 2019-01-11 2020-07-16 Jyothi Vemulapalli Method and apparatus for improving hash searching throughput in the event of hash collisions

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003030040A (en) * 2001-07-12 2003-01-31 Nec Commun Syst Ltd Hush indexes of object database system and non-unique index management system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
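Memory optimization strategy based on index structure in an in-memory computing environment; Ying Changtian et al.; Journal of Xinjiang University (Natural Science Edition); 2018-01-29; Vol. 35, No. 1; pp. 13-21 *
High-dimensional distributed locality-sensitive hashing index method; Lin Zhaohui et al.; Journal of Frontiers of Computer Science and Technology; 2013-05-28; Vol. 7, No. 9; pp. 811-817 *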

Also Published As

Publication number Publication date
CN112199333A (en) 2021-01-08

Similar Documents

Publication Publication Date Title
CN101217571B (en) Write/read document operation method applied in multi-copy data grid system
CN109213772A (en) Date storage method and NVMe storage system
CN102446139B (en) Method and device for data storage
US11210228B2 (en) Method, device and computer program product for cache management
US8190857B2 (en) Deleting a shared resource node after reserving its identifier in delete pending queue until deletion condition is met to allow continued access for currently accessing processor
CN110704214B (en) Inter-process communication method and device
US20210019257A1 (en) Persistent memory storage engine device based on log structure and control method thereof
CN112632069B (en) Hash table data storage management method, device, medium and electronic equipment
US20030121030A1 (en) Method for implementing dual link list structure to enable fast link-list pointer updates
CN109766318B (en) File reading method and device
CN109240607B (en) File reading method and device
CN103425435A (en) Disk storage method and disk storage system
CN112199333B (en) Storage method and device supporting multi-valued index file
CN115129621A (en) Memory management method, device, medium and memory management module
CN116166690A (en) Mixed vector retrieval method and device for high concurrency scene
CN102724301B (en) Cloud database system and method and equipment for reading and writing cloud data
CN103207866A (en) File storing method and system based on partitioning strategies
CN114327642A (en) Data read-write control method and electronic equipment
WO2016187975A1 (en) Internal memory defragmentation method and apparatus
CN117271531A (en) Data storage method, system, equipment and medium
CN116662019A (en) Request distribution method and device, storage medium and electronic device
CN111124313A (en) Data reading and writing method and device for power acquisition terminal and electronic equipment
CN110334251B (en) Element sequence generation method for effectively solving rehash conflict
US10067690B1 (en) System and methods for flexible data access containers
CN117193674B (en) Method and device for improving mass data access efficiency of Internet of things equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant