CA3126012A1 - Method and system for content agnostic file indexing - Google Patents

Method and system for content agnostic file indexing

Info

Publication number
CA3126012A1
Authority
CA
Canada
Prior art keywords
chunks
binary data
chunk
data file
pregenerated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CA3126012A
Other languages
French (fr)
Inventor
Christopher Mcelveen
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lognovations Holdings LLC
Original Assignee
Lognovations Holdings LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
2020-01-08
Publication date
2020-07-16
Priority claimed from US16/244,332 (US11138152B2)
Application filed by Lognovations Holdings LLC
Publication of CA3126012A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/3084 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method
    • H03M7/3088 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method employing the use of a dictionary, e.g. LZ78
    • H03M7/60 General implementation details not specific to a particular type of compression
    • H03M7/6052 Synchronisation of encoder and decoder

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A computer-implemented method for content-agnostic referencing of a binary data file, the method comprising: pregenerating a table of all permutations of data of a particular length; determining a length of the binary data file, the length comprising the number of bits of the binary data file; chunking the binary data into chunks of data of a smaller length; for each chunk, determining if the chunk is in the pregenerated table, and if so using that chunk's index in the pregenerated table, and otherwise chunking the data again until the sub-chunks are located in the pregenerated table; and using the number of chunks and associated indices to indicate the binary data file.

Description

METHOD AND SYSTEM FOR CONTENT AGNOSTIC FILE INDEXING
CROSS REFERENCE TO RELATED APPLICATIONS
[001] This application is a continuation-in-part of application number 15/730,043, filed October 11, 2017, entitled "Method and System for Content Agnostic File Indexing," the contents of which are fully incorporated by reference herein for all purposes.
COMPUTER PROGRAM LISTING - SEQUENCE LISTING
[002] The following Computer Program Listing is submitted herewith and is incorporated by reference. Each of the respective files is incorporated by reference. The Computer Program Listing below is in the format of: <size in bytes> <date created> <file name>:
[003] 3864 May 16 2018 squeeze-master-README-md.txt*
[004] 83675 May 16 2018 squeeze-master-SqueezeReport-ipynb.txt*
[005] 4293 May 16 2018 squeeze-master-demo app-py.txt*
[006] 98 May 16 2018 squeeze-master-gitignore.txt*
[007] 1383 May 16 2018 squeeze-master-requirements.txt*
[008] 2490 May 16 2018 squeeze-master-rpc server-py.txt*
[009] 239 May 16 2018 squeeze-master-scripts-buildprotos.txt*
[010] 942 May 16 2018 squeeze-master-scripts-file test-py.txt*
[011] 1391 May 16 2018 squeeze-master-scripts-generate key-py.txt*
[012] 711 May 16 2018 squeeze-master-scripts-generate keyset-py.txt*
[013] 629 May 16 2018 squeeze-master-scripts-keys from folder-py.txt*
[014] 377 May 16 2018 squeeze-master-scripts-lzw test-py.txt*
[015] 107 May 16 2018 squeeze-master-scripts-runserver.txt*
[016] 3928 May 16 2018 squeeze-master-scripts-squeeze-bytes-report-py.txt*
[017] 63 May 16 2018 squeeze-master-scripts-squeeze file-py.txt*
[018] 1060 May 16 2018 squeeze-master-scripts-squeeze test-py.txt*
[019] 947 May 16 2018 squeeze-master-scripts-string test-py.txt*
[020] 222 May 16 2018 squeeze-master-scripts-test binary-py.txt*
[021] 1799 May 16 2018 squeeze-master-scripts-test rpc-py.txt*
[022] 2736 May 16 2018 squeeze-master-scripts-time-squeeze-string-py.txt*
[023] 211 May 16 2018 squeeze-master-scripts-time keygen-py.txt*
[024] 65 May 16 2018 squeeze-master-scripts-unsqueeze file-py.txt*
[025] 80 May 16 2018 squeeze-master-setup-py.txt*
[026] 10657 May 16 2018 squeeze-master-squeeze- init -py.txt*
[027] 2783 May 16 2018 squeeze-master-squeeze-bitstring-py.txt*
[028] 9191 May 16 2018 squeeze-master-squeeze-keys-py.txt*
[029] 613 May 16 2018 squeeze-master-squeeze-performance-csv.txt*
[030] 22445 May 16 2018 squeeze-master-squeeze-squeeze_pb2-py.txt*
[031] 2232 May 16 2018 squeeze-master-squeeze-squeeze_pb2 grpc-py.txt*
[032] 3366 May 16 2018 squeeze-master-squeeze-proto.txt*
[033] 875 May 16 2018 squeeze-master-templates-layout-html.txt*
[034] 816 May 16 2018 squeeze-master-templates-upload form-html.txt*
[035] 1513 May 16 2018 squeeze-master-templates-uploaded file-html.txt*
[036] 200 May 16 2018 squeezerpc-master-Makefile.txt*
[037] 1131 May 16 2018 squeezerpc-master-README-md.txt*
[038] 7 May 16 2018 squeezerpc-master-gitignore.txt*
[039] 8995 May 16 2018 squeezerpc-master-main-go.txt*
[040] 21292 May 16 2018 squeezerpc-master-squeeze-squeeze-pb-go.txt*
[041] 3366 May 16 2018 squeezerpc-master-squeeze-proto.txt*
TECHNICAL FIELD
[042] This disclosure relates to a method for content agnostic file referencing.
The method may further relate to a method for content agnostic data compression.
BACKGROUND OF THE INVENTION
[043] File referencing techniques generally require knowledge about the kind of data being stored in order to efficiently index the data in a file referencing system. Similarly, knowledge about the data at issue is also generally used in creating improved compression approaches to reduce data size for transmission, storage, and the like.
[044] There exists a need in the industry to improve file referencing and data compression techniques to reduce the amount of data that must be stored and/or transmitted.
SUMMARY OF THE INVENTION
[045] According to one embodiment, this disclosure provides a method for improving computing technology with an enhanced content-agnostic file referencing system.
The method improves the operation of the computer itself.
[046] The disclosed method has several important advantages. For example, the disclosed method permits file referencing of any content type.
[047] The disclosed method additionally permits a significant reduction in the amount of information or data that must be persisted or transmitted, as data may be generated at access time as opposed to persisted.
[048] Various embodiments of the present disclosure may have none, some, or all of these advantages. Other technical advantages of the present disclosure may also be readily apparent to one skilled in the art.
BRIEF DESCRIPTION OF THE DRAWINGS
[049] For a more complete understanding of the present disclosure and its advantages, reference is now made to the following descriptions, taken in conjunction with the accompanying drawings, in which:
[050] FIG. 1 is a flowchart outlining the steps of one embodiment of the present disclosure.
[051] FIG. 2 is another flowchart outlining the steps of another embodiment of the present disclosure.
[052] FIG. 3 is a flowchart outlining the steps of an alternate embodiment of the present disclosure.
[053] Similar reference numerals refer to similar parts or steps throughout the several views of the drawings.
DETAILED DESCRIPTION OF THE EMBODIMENTS
[054] The present disclosure relates to a method for content-agnostic indexing of data. The method may be used for a variety of computer-specific needs, including for example as a file referencing system or a compression system.
[055] The disclosure below describes the invention in connection with compression of binary data as an example, but the teachings work equally well with any type of data, better termed "n-ary" data. For example, the method and system work with qubits as well as bits.
[056] One embodiment of the present invention comprises a method as described in the flowchart depicted in FIG. 1. Binary data (n1) (for instance, a data file) to be persisted or transmitted is analyzed to determine its length in bits (1(0).
Using this information, at step 106, the method calculates all permutations of data of the identified length. For example, if the input data is:
{01}
[057] then the input data is 2 bits long. At step 106, all permutations of 2 bits will be generated, namely:
{00} {01} {10} {11}
[058] At step 108, the method determines the index (n f) of the input binary data file in the generated permutations. Using the example above and counting from 0, the index (n f) returned would be "1". Finally, rather than storing or transmitting the input binary data (i.e., "01"), the system instead stores the length (2) and the index (1).
[059] When the original input data needs to be decoded (for instance, upon a request to retrieve the original binary data from disk, or upon receipt of the transmitted data across a network), the method needs only a length (1(0) and an index (n f) as input. Using the above example, the input provided would be the length (2) and the index (1). As shown in FIG. 2, the system calculates all permutations of the input length. As above, that would generate the following permutations:
{00} {01} {10} {11}
[060] The system would then go to the provided index (1 in the above example) and return the permutation. Again using the above example, this would return "01", the original binary data.
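The FIG. 1 and FIG. 2 flows for this two-bit example can be sketched in Ruby, the language of the listings in this disclosure. This is a minimal sketch, not the listed implementation: the method names permutations, encode, and decode are hypothetical, and the table is assumed to list all n-bit strings in ascending binary order, as in {00} {01} {10} {11}. With that ordering, the index found at step 108 equals the input read as an unsigned binary integer, which the last line checks.

```ruby
# Hypothetical helpers illustrating FIG. 1 (encode) and FIG. 2 (decode).
# Assumption: the table lists all n-bit strings in ascending binary order,
# so the entry at index i is i written as n binary digits.
def permutations(n)
  (0...2**n).map { |i| format("%0#{n}b", i) }
end

# FIG. 1: keep only the length and the index of the input in the table.
def encode(bits)
  n = bits.length
  [n, permutations(n).index(bits)]
end

# FIG. 2: regenerate the table from the length and read the entry back.
def decode(length, index)
  permutations(length)[index]
end

length, index = encode("01")
p([length, index])        # => [2, 1]
p decode(length, index)   # => "01"

# With this ordering the index equals the input read as a binary integer.
p "01".to_i(2) == index   # => true
```

Because the index equals the binary value of the input, writing it down can take up to n bits for an n-bit input.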

[061] The above method has been described, for purposes of example, in terms of a binary system (i.e., the input data is binary data). The method and system work similarly for n-ary systems. While the binary system described above works essentially in the Euclidean plane, with n-ary data Hilbert spaces conceptually provide the same advantages. The method and process can be generalized for n-ary data as follows:
$d^{n} = p(i)$
$(d^{n})^{n} = p(f)$
where d = order of the system, n = length in n-ary units appropriate to the order of the system, p(i) = initial index, and p(f) = final index.

Order of System (d) | Visual Representation | Reference Key | Search Pattern
1 | String | n / x | Left to Right
2 | Plane | n / x/y | Top Left to Bottom Right
3 | 3(fold) | n / x/y/z | Top Back Left to Bottom Front Right
D | D(fold) | n / x/y/z/... | Top Back Left ... to Bottom Front Right

[062] It should be noted that, given two alternative ordered systems with the same input file, the system with the higher order will have a higher n-ary density than the alternative with the lower order.
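For n-ary data in the simpler sense of strings over a fixed number of symbols, the binary permutation table generalizes directly: a table of all length-n strings over k symbols has $k^{n}$ entries. The symbol k (named base in the code) is introduced here for illustration and is not the order d of the table above; the method name nary_permutations is hypothetical, digits 0 through k-1 are assumed to be listed in ascending order, and Ruby's Integer#to_s(base) limits the sketch to bases 2 through 36.

```ruby
# Hypothetical sketch: all strings of the given length over the digits
# 0..base-1, in ascending order (base = 2 gives the binary table).
def nary_permutations(base, length)
  (0...base**length).map { |i| i.to_s(base).rjust(length, "0") }
end

table = nary_permutations(3, 2)
p table.length        # => 9, i.e. 3**2 entries
p table.first(4)      # => ["00", "01", "02", "10"]
p table.index("21")   # => 7, the same as "21".to_i(3)
```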
[063] An example of the method is disclosed in the following Ruby code snippets. The snippet below demonstrates a method as disclosed in FIG. 1:

    require 'securerandom'

    class Input
      # Random input: SecureRandom.hex(k) returns 2*k hexadecimal characters.
      def create(k)
        SecureRandom.hex(k)
      end

      # Converts the input string to a bit string (8 bits per character).
      def clean(k)
        create(k).unpack('B*').first.to_s
      end

      # All permutations of n bits, in ascending order.
      def build(n)
        (0..2**n - 1).map { |i| "%0#{n}b" % i }
      end

      # Knuth-Morris-Pratt search: returns the offset of the first
      # occurrence of substring in string, or nil if there is none.
      def self.kmp_search(string, substring)
        return nil if string.nil? || substring.nil?
        pos = 2
        cnd = 0
        failure_table = [-1, 0]
        while pos < substring.length
          if substring[pos - 1] == substring[cnd]
            failure_table[pos] = cnd + 1
            pos += 1
            cnd += 1
          elsif cnd > 0
            cnd = failure_table[cnd]
          else
            failure_table[pos] = 0
            pos += 1
          end
        end
        m = i = 0
        while m + i < string.length
          if substring[i] == string[m + i]
            i += 1
            return m if i == substring.length
          else
            m = m + i - failure_table[i]
            i = failure_table[i] if i > 0
          end
        end
        nil
      end
    end

    init = Input.new
    input = init.clean(1)
    depth = input.length
    generate = init.build(depth)
    # The permutations are joined into one string, so the value returned
    # by kmp_search is a character offset into that string.
    steps = generate.join
    step = Input.kmp_search(steps, input)
    p input
    p depth
    p step

[064] The snippet below demonstrates a method as disclosed in FIG. 2, using an input length (1(0) of 16 and an index (n f) of 72,629:

    class Output
      # All permutations of n bits, in ascending order.
      def build(n)
        (0..2**n - 1).map { |i| "%0#{n}b" % i }
      end
    end

    depth = 16
    step = 72629
    init = Output.new
    create = init.build(depth)
    interpret = create.join
    # Read depth characters starting at character offset step.
    compute = (depth + step) - 1
    output = interpret[step..compute].gsub(/\s\w+$/, '...')
    p output

[065] In a preferred embodiment, an input byte string is converted into a bit string corresponding to a representation of the input byte string. This bit string is what is then processed through the method described herein.
[066] In an alternative embodiment, rather than generating the table based on the length of the data, a table may be pregenerated with all permutations of data of a particular length. This pregenerated table may be persisted in memory, either non-volatile or volatile. Using the above example, if the predetermined length is 2 bits, the pregenerated table will include all permutations of 2-bit data, such as {00} {01} {10} {11}.
[067] In one embodiment, this table may be stored in an array with corresponding indices as follows:
0: {00}  1: {01}  2: {10}  3: {11}
[068] This pregenerated table may be stored on disk, in RAM, or otherwise. Preferably, this pregenerated table is stored both with the computing system that reduces file size (or squeezes a file) and with the computing system that expands a reduced file (or unsqueezes the data).
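A pregenerated 2-bit table can be held both as an array, for index-to-pattern lookup when unsqueezing, and as a hash, for pattern-to-index lookup when squeezing (compare claims 9 and 10). This is a sketch only; the constant names are hypothetical.

```ruby
# Hypothetical sketch of a pregenerated table for a fixed width of 2 bits,
# built once and shared by the squeezing and unsqueezing sides.
TABLE_WIDTH = 2

# Array form: index -> bit pattern (used when unsqueezing).
PATTERNS = (0...2**TABLE_WIDTH).map { |i| format("%0#{TABLE_WIDTH}b", i) }.freeze

# Hash form: bit pattern -> index (constant-time lookup when squeezing).
INDEX_OF = PATTERNS.each_with_index.to_h.freeze

p PATTERNS           # => ["00", "01", "10", "11"]
p INDEX_OF["10"]     # => 2
p INDEX_OF["0110"]   # => nil, a 4-bit chunk is not in a 2-bit table
```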
[069] Upon receiving input data, the method "chunks" the data into smaller subsets of data. As used herein, "chunk" means to take a data string and create smaller data strings comprising subsets of the larger data string; all chunks together form the original data string. For example, if the input data is:
{011001110001}
[070] It may be chunked into 4-bit chunks as follows:
{0110} {0111} {0001}
[071] Each individual chunk is then compared to the pregenerated table to see if there is a match. Using the above example, with a chunk size of 4 bits, none of the chunks will be found in the table, because the table contains only the 2-bit permutations. Thus, each chunk will be chunked again, resulting in the following:
{01} {10} {01} {11} {00} {01}
[072] The method continues for each chunk until the chunk is located in the pregenerated table. At that point, the chunk is associated with its respective index, and preferably a series of tuples is generated indicating the chunk level and the corresponding index. In the above example, the system chunked twice, so the index association is as follows:
{2, 1} {2, 2} {2, 1} {2, 3} {2, 0} {2, 1}
[073] In this example, the original input data "011001110001" was eventually broken into six (6) chunks, each 2 bits long. As shown, each chunk is represented by a chunk level (2) and the corresponding index into the pregenerated table.
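The fixed-size chunking of [069] through [073] can be sketched recursively. Assumptions, not taken from the disclosure: the method names are hypothetical, the first-level chunk size is passed in (4 bits here), and each deeper level halves the chunk size until chunks match the 2-bit table. Rather than padding, the sketch raises an error if a chunk shorter than the table width cannot be matched.

```ruby
# Hypothetical sketch of fixed-size chunking with re-chunking on a miss.

# All permutations of `width` bits, in ascending order.
def pregenerated_table(width)
  (0...2**width).map { |i| format("%0#{width}b", i) }
end

# Returns [chunk_level, index] tuples for the bit string `bits`.
def squeeze(bits, chunk_size, lookup, level = 1)
  width = lookup.keys.first.length
  bits.scan(/.{1,#{chunk_size}}/).flat_map do |chunk|
    if lookup.key?(chunk)
      [[level, lookup[chunk]]]                           # found: record level and index
    elsif chunk.length <= width
      raise ArgumentError, "no table entry for #{chunk.inspect}"
    else
      squeeze(chunk, chunk_size / 2, lookup, level + 1)  # chunk again, one level deeper
    end
  end
end

lookup = pregenerated_table(2).each_with_index.to_h      # pattern => index
p squeeze("011001110001", 4, lookup)
# => [[2, 1], [2, 2], [2, 1], [2, 3], [2, 0], [2, 1]]
```

A chunk that already matches the table at the first level is recorded with level 1, so chunks of the same input may end up at different levels.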
[074] The data may be chunked in any number of ways. For instance, the data may be chunked based on a predetermined size, as in the above example (where the predetermined size was 4 bits for purposes of example). Alternatively, the input data may be recursively split into two chunks until each chunk can be found in the pregenerated table. Using the same input data as above, splitting the data in this way results in the following first-level chunks:
{011001} {110001}
[075] Here, the data sets are not found in the pregenerated table, so they are chunked again:
{011} {001} {110} {001}
[076] Again, the chunked data is not found in the pregenerated table, so it must be chunked again:
{01} {1} {00} {1} {11} {0} {00} {1}
[077] Notably, some segments are chunked into data smaller than the pregenerated table size (i.e., segments "1", "1", "0", and "1"). These segments may be padded in order to compare them to the pregenerated table. The numbers may be stored using either big endian or little endian byte order, so long as consistency is maintained. Using big endian byte order, for example, the chunked data above would be represented as:
{01} {01} {00} {01} {11} {00} {00} {01}
[078] The method would then continue as it did above.
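The halving strategy of [074] through [077] can be sketched as follows, under stated assumptions: the method name halve_and_index is hypothetical; the first half takes the extra bit when a length is odd (the split that yields the single-bit segments "1", "1", "0", and "1" listed in [077]); short segments are left-padded with zeros, big endian fashion; and the chunk level is counted as the number of splits, which the disclosure does not specify for this embodiment.

```ruby
# Hypothetical sketch of recursive halving with zero padding.
def halve_and_index(bits, lookup, width, level = 0)
  if bits.length <= width
    padded = bits.rjust(width, "0")                 # e.g. "1" -> "01"
    return [[level, lookup.fetch(padded)]]
  end
  mid = (bits.length + 1) / 2                       # first half gets the extra bit
  halve_and_index(bits[0...mid], lookup, width, level + 1) +
    halve_and_index(bits[mid..-1], lookup, width, level + 1)
end

width = 2
lookup = (0...2**width).map { |i| format("%0#{width}b", i) }.each_with_index.to_h
p halve_and_index("011001110001", lookup, width)
# => [[3, 1], [3, 1], [3, 0], [3, 1], [3, 3], [3, 0], [3, 0], [3, 1]]
```

Because padding maps "1" and "01" to the same index, a decoder following this sketch would also need the original input length; given that length, the halving rule fixes every segment length, so the padding can be stripped.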
[079] It is not required that all chunks of data be found in the pregenerated table at the same chunk level. For example, using the above pregenerated table of 2-bit combinations, if the input data is:
{0110011100}
[080] The data may initially be chunked as above, by breaking it into 4-bit sequences:
{0110} {0111} {00}
[081] As above, the first two 4-bit sequences (i.e., "0110" and "0111") must be chunked again into smaller chunks in order to be located in the pregenerated table, resulting in the following chunks:
{01} {10} {01} {11} {00}
[082] And as above, the chunks will be associated with their chunk level and corresponding index as follows:
{2, 1} {2, 2} {2, 1} {2, 3} {1, 0}
[083] Note the last tuple above indicates a chunk level of 1, as that chunk did not require a second round of chunking.
[084] Once the input data is reduced to a series of chunk levels and indices, that series is used to identify the original data. The association may be stored as a series of tuples, as a separate bit string, or otherwise.
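The encoding of the series is left open here. One hypothetical option packs each tuple into fixed-width fields of a single bit string. The 4-bit level field, the 2-bit index field (matching the 2-bit table), and the method names pack_tuples and unpack_tuples are assumptions made for illustration.

```ruby
# Hypothetical fixed-width packing of (chunk level, index) tuples.
def pack_tuples(tuples, level_bits: 4, index_bits: 2)
  tuples.map do |level, index|
    unless level < 2**level_bits && index < 2**index_bits
      raise ArgumentError, "tuple #{[level, index].inspect} does not fit the field widths"
    end
    format("%0#{level_bits}b%0#{index_bits}b", level, index)
  end.join
end

# Reverses pack_tuples by reading fixed-width fields back out.
def unpack_tuples(bitstring, level_bits: 4, index_bits: 2)
  bitstring.scan(/.{#{level_bits + index_bits}}/).map do |field|
    [field[0, level_bits].to_i(2), field[level_bits, index_bits].to_i(2)]
  end
end

packed = pack_tuples([[2, 1], [2, 2], [1, 0]])
p packed                  # => "001001001010000100"
p unpack_tuples(packed)   # => [[2, 1], [2, 2], [1, 0]]
```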
[085] To recreate (or unsqueeze) the data based on a series of chunk levels and indices, the process works in reverse. Again, the system must have the same pregenerated table.
For each tuple of chunk level and index, the system consults the pregenerated table to unpack the squeezed chunk and return it to its original data.
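The reverse process can be sketched as one table lookup per tuple followed by concatenation, which is also the step recited in claim 19. The method name unsqueeze is hypothetical, and the sketch assumes no chunk was padded, as in the fixed-size examples; under that assumption the chunk level is not needed to read the table.

```ruby
# Hypothetical sketch: look up each index in the shared pregenerated
# table and concatenate the pieces in order.
def unsqueeze(tuples, table)
  tuples.map { |_level, index| table.fetch(index) }.join
end

table = (0...4).map { |i| format("%02b", i) }   # the 2-bit table
tuples = [[2, 1], [2, 2], [2, 1], [2, 3], [2, 0], [2, 1]]
p unsqueeze(tuples, table)   # => "011001110001"
```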
[086] This alternate embodiment is shown in the flowchart of FIG. 3. First, a Pregenerated Table comprising all permutations of data of a particular length is created at step 302. As indicated above, that table is preferably persisted in some fashion.
Next, the system receives input data to be squeezed at step 304. The process then chunks the data into smaller segments until each segment has a length that can be located in the Pregenerated Table, at steps 306 and 308. As indicated above, the process maintains the chunk level so that the system knows how many times an input data set has been chunked. Each chunk is then located in the Pregenerated Table at step 310. Finally, the chunk, its chunk level, and its respective index in the Pregenerated Table are associated, resulting in the squeezed data at step 312.
[087] Although this disclosure has been described in terms of certain embodiments and generally associated methods, alterations and permutations of these embodiments and methods will be apparent to those skilled in the art.
Accordingly, the above description of example embodiments does not constrain this disclosure. Other changes, substitutions, and alterations are also possible without departing from the spirit and scope of this disclosure.

Claims (19)

WHAT IS CLAIMED IS:
1. A computer-implemented method for content-agnostic referencing of a binary data file, the method comprising:
pregenerating a table using an input seed wherein the table comprises all permutations of bits of a predetermined length;
determining a length of the binary data file, the length comprising the number of bits of the binary data file;
chunking the binary data file into substrings wherein each substring is of a length smaller than the length of the binary data file;
for each chunk of the binary data file, determining if the chunk is in the pregenerated table, wherein if the chunk is in the pregenerated table, associating the chunk with an index of the location of the chunk in the pregenerated table, and wherein if the chunk is not in the pregenerated table, further chunking the chunked binary data into smaller chunks; and using the number of chunks and associated indices of all chunks to indicate the binary data file.
2. The method of Claim 1, wherein using the number of chunks and associated indices of all chunks to indicate the binary data file comprises:
persisting on a storage device the number of chunks and associated indices of all chunks instead of the binary data file.
3. The method of Claim 1, wherein using the number of chunks and associated indices of all chunks to indicate the binary data file comprises:
transmitting the number of chunks and associated indices of all chunks instead of the data file.
4. The method of Claim 3 wherein transmitting transmits the number of chunks and associated indices of all chunks on a network.
5. The method of Claim 3 wherein transmitting transmits the number of chunks and associated indices of all chunks on a bus.
6. The method of Claim 1 wherein using the number of chunks and associated indices of all chunks to indicate the binary data file comprises:
creating a tuple of ordered pairs wherein each ordered pair indicates a chunk level and an associated index.
7. The method of Claim 1 wherein using the number of chunks and associated indices of all chunks to indicate the binary data file comprises persisting the number of chunks and associated indices of all chunks on a storage device.
8. The method of Claim 7 wherein the storage device is a disk.
9. The method of Claim 1 wherein the pregenerated table is a hash table.
10. The method of Claim 1 wherein the pregenerated table is an array.
11. The method of Claim 1 wherein the pregenerated table is persisted in volatile memory.
12. The method of Claim 1 wherein the pregenerated table is persisted in non-volatile memory.
13. The method of Claim 1 wherein chunking the binary data file into substrings further comprises:
chunking the binary data into chunks of a predetermined length.
14. The method of Claim 13 wherein the predetermined length is 2 megabytes.
15. The method of Claim 13 wherein the predetermined length is smaller than megabytes.
16. The method of Claim 13 wherein the predetermined length is larger than megabytes.
17. The method of Claim 1 wherein chunking the binary data file into substrings further comprises:
recursively splitting the binary data file into 2 chunks of equal size.
18. A method of retrieving data based on a number of chunks and associated indices of all chunks, the method comprising pregenerating a table using an input seed wherein the table comprises all permutations of bits of a predetermined length, wherein the pregenerated table was used to generate the number of chunks and associated indices; and for each chunk, locating data in the table at the index associated with the chunk; and returning the data associated with each chunk.
19. The method of Claim 18 wherein returning the data associated with each chunk comprises:
concatenating the data associated with each chunk into a single bitstream.
CA3126012A 2019-01-10 2020-01-08 Method and system for content agnostic file indexing Pending CA3126012A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US16/244,332 2019-01-10
US16/244,332 US11138152B2 (en) 2017-10-11 2019-01-10 Method and system for content agnostic file indexing
PCT/US2020/012661 WO2020146448A1 (en) 2019-01-10 2020-01-08 Method and system for content agnostic file indexing

Publications (1)

Publication Number Publication Date
CA3126012A1 (en) 2020-07-16

Family

ID=71520909

Family Applications (1)

Application Number Title Priority Date Filing Date
CA3126012A Pending CA3126012A1 (en) 2019-01-10 2020-01-08 Method and system for content agnostic file indexing

Country Status (6)

Country Link
EP (1) EP3908937A4 (en)
JP (1) JP2022518194A (en)
KR (1) KR20210110875A (en)
AU (1) AU2020205970A1 (en)
CA (1) CA3126012A1 (en)
WO (1) WO2020146448A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11138152B2 (en) 2017-10-11 2021-10-05 Lognovations Holdings, Llc Method and system for content agnostic file indexing

Family Cites Families (8)

Publication number Priority date Publication date Assignee Title
US5594435A (en) * 1995-09-13 1997-01-14 Philosophers' Stone Llc Permutation-based data compression
US7882139B2 (en) * 2003-09-29 2011-02-01 Xunlei Networking Technologies, Ltd Content oriented index and search method and system
US20050071151A1 (en) * 2003-09-30 2005-03-31 Ali-Reza Adl-Tabatabai Compression-decompression mechanism
CN1868127B (en) * 2003-10-17 2011-06-22 佩茨拜特软件有限公司 Data compression system and method
US8510459B2 (en) * 2006-09-01 2013-08-13 Pacbyte Software Pty Limited Method and system for transmitting a data file over a data network
US8533166B1 (en) * 2010-08-20 2013-09-10 Brevity Ventures LLC Methods and systems for encoding/decoding files and transmission thereof
US9639543B2 (en) * 2010-12-28 2017-05-02 Microsoft Technology Licensing, Llc Adaptive index for data deduplication
US11138152B2 (en) * 2017-10-11 2021-10-05 Lognovations Holdings, Llc Method and system for content agnostic file indexing

Also Published As

Publication number Publication date
KR20210110875A (en) 2021-09-09
WO2020146448A1 (en) 2020-07-16
EP3908937A1 (en) 2021-11-17
AU2020205970A1 (en) 2021-08-05
JP2022518194A (en) 2022-03-14
EP3908937A4 (en) 2022-09-28

Similar Documents

Publication Publication Date Title
US11138152B2 (en) Method and system for content agnostic file indexing
US11899641B2 (en) Trie-based indices for databases
US20220093210A1 (en) System and method for characterizing biological sequence data through a probabilistic data structure
US8554561B2 (en) Efficient indexing of documents with similar content
US10680645B2 (en) System and method for data storage, transfer, synchronization, and security using codeword probability estimation
KR20130062889A (en) Method and system for data compression
US20050187898A1 (en) Data Lookup architecture
US10146817B2 (en) Inverted index and inverted list process for storing and retrieving information
US11899624B2 (en) System and method for random-access manipulation of compacted data files
US11544225B2 (en) Method and system for content agnostic file indexing
CA3126012A1 (en) Method and system for content agnostic file indexing
Lou et al. Data deduplication with random substitutions
CN112416879B (en) NTFS file system-based block-level data deduplication method
US20220245097A1 (en) Hashing with differing hash size and compression size
JP6291435B2 (en) Program and cluster system
US11995060B2 (en) Hashing a data set with multiple hash engines
US20220245104A1 (en) Hashing for deduplication through skipping selected data
Vaddeman et al. Data formats
US20240202166A1 (en) Generating compressed column slabs for storage in a database system
Нікітін et al. Modification of hashing algorithm to increase rate of operations in nosql databases
Nikitin et al. Modification of hashing algorithm to increase rate of operations in NOSQL databases

Legal Events

Date Code Title Description
EEER Examination request

Effective date: 20231229