WO2006063057A2 - Applying multiple compression algorithms in a database system

Applying multiple compression algorithms in a database system

Info

Publication number
WO2006063057A2
WO2006063057A2 PCT/US2005/044275 US2005044275W WO2006063057A2 WO 2006063057 A2 WO2006063057 A2 WO 2006063057A2 US 2005044275 W US2005044275 W US 2005044275W WO 2006063057 A2 WO2006063057 A2 WO 2006063057A2
Authority
WO
WIPO (PCT)
Prior art keywords
page
data
compression
storage
sub
Prior art date
Application number
PCT/US2005/044275
Other languages
English (en)
Other versions
WO2006063057A3 (fr)
Inventor
James Ivie
Original Assignee
Agilix Labs
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agilix Labs
Publication of WO2006063057A2
Publication of WO2006063057A3

Classifications

    • H: ELECTRICITY
    • H03: ELECTRONIC CIRCUITRY
    • H03M: CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00: Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30: Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28: Databases characterised by their database models, e.g. relational or object models
    • G06F16/284: Relational databases

Definitions

  • This invention relates to systems, methods, and computer program products for compressing data in a database system.
  • B-Trees or some other similar "page"-based structure to store collections of structured data.
  • B-Tree systems generally provide efficient methods to store and access large amounts of dynamic data on slow media, such as tape or hard disk ("sub-storage"). Data such as this is typically more data than would ordinarily fit in Random Access Memory ("RAM").
  • B-Tree systems make no assumption about what type of data is being stored, allowing the B-Tree systems to be flexible enough for most kinds of data.
  • B-Tree systems generally limit the data to "tables" where each item is stored as a row, with its elements stored in columns (the set of columns being the same for all items in the table).
  • Each column is defined to contain a fixed size number or a string (either of a fixed size or of variable size).
  • compression algorithms remove redundancy in data, thus making the data smaller. This is generally desirable since storing the original version of the data on disk often takes longer than it takes to both compress the data and store the smaller or compressed version of the data on disk.
  • a number of different types of compression have been implemented to remove redundancy in data to provide such storage efficiencies.
  • compression can shrink the column data before it is put into the columns (i.e., "intra-row" compression), such as if the table system supports the type of data that is being put in, and is still able to sort the rows.
  • compression is utilized to shrink the size of the resulting pages (i.e., "inter-row" compression).
  • intra-row compression involves applying compression before the values are entered into columns, since the compression works within a single row.
  • the storage savings from intra-row compression are minimal since most of the data redundancy in a page-based database is between the rows in a table, not within the rows.
  • inter-row compression, or compression of several rows in a table, results in much better compression, but results in chunks of data that are of different sizes (i.e., each page started out the same size but the compression works differently on each one, resulting in different sizes). Since inter-row compression results in chunks of data that vary in size, inter-row compression is generally used with a sub-storage that supports storage and retrieval of variable-sized data.
  • variable-sized data chunks such as with inter-row compression
  • inter-row compression can result in significant performance degradation.
  • much of the space savings offered by inter-row compression is wasted by the sub-storage system as the sub-storage tries to compensate for having to support variable-sized chunks.
  • One example of a conventional inter-row compression system in a database is a database that uses a "symbol table".
  • a database system such as this looks for common values for each column, and only stores one version of that value in the symbol table, which is also stored in the same page.
  • the symbol table refers back to that value whenever the value occurs again in columns stored in the same page.
  • this type of compression is an example of inter-row compression, since the compression works by looking at values common in more than one row in the table.
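  • As a minimal sketch of the symbol-table idea described above, the following Python stores each distinct column value once per page and replaces the per-row values with references. The function names and the sample column are hypothetical illustrations and do not come from the patent text.

```python
from typing import Dict, List, Tuple


def build_symbol_table(column: List[str]) -> Tuple[List[str], List[int]]:
    """Store each distinct column value once and replace rows with indexes."""
    symbols: List[str] = []          # one copy of each distinct value
    index_of: Dict[str, int] = {}    # value -> position in `symbols`
    refs: List[int] = []             # per-row reference into `symbols`
    for value in column:
        if value not in index_of:
            index_of[value] = len(symbols)
            symbols.append(value)
        refs.append(index_of[value])
    return symbols, refs


def expand_symbol_table(symbols: List[str], refs: List[int]) -> List[str]:
    """Rebuild the original column from the symbol table and references."""
    return [symbols[i] for i in refs]


# Hypothetical column values taken from several rows of one page.
column = ["zoo", "zoo", "zebra", "zoo", "zebra"]
symbols, refs = build_symbol_table(column)
print(symbols, refs)
```

  Because the table is built from values shared by several rows, this is inter-row compression: the saving grows with the number of repeated values in the page.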
  • the problem of variable-sized chunks of data is solved by applying the compression as the items are placed into the pages.
  • An example of intra-row compression includes one type of a full-text indexing system.
  • a full-text indexing system that uses gamma encoding may assume that smaller numbers are used much more frequently than large ones. The system then stores numbers with a variable number of bytes, where small numbers only take a small number of bytes, and large numbers take more bytes (even more than their corresponding normal fixed-width representation). Where the smaller numbers are represented more frequently in the indexing system, the gamma encoding can provide measurable space savings.
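  • The byte-oriented scheme described above can be sketched as follows. This is an assumption-laden illustration: it uses a common 7-bits-per-byte variable-length integer format as a stand-in, which may differ from the encoding the patent has in mind, and the function names are hypothetical.

```python
from typing import Tuple


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer using 7 payload bits per byte."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)   # high bit set: more bytes follow
        else:
            out.append(low)          # high bit clear: last byte
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one integer starting at `pos`; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


for value in (10, 5789, 2**32 - 1):
    print(value, len(encode_varint(value)))
```

  With this stand-in, small values take one byte, while the largest 32-bit value takes five bytes, one more than its fixed-width form, matching the trade-off the text describes.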
  • Another type of compression is delta (i.e., difference) compression. This type of compression is sometimes used to store databases, such as one used to store a dictionary in as small a space as possible, where many of the data terms have at least some similarity.
  • a delta compression algorithm takes advantage of the fact that words in the dictionary, when stored in order, frequently start with a sequence of letters identical to the previous word in the list. For example, after "rabbi", the next word in the dictionary might be "rabbit". The word "rabbit" could be stored represented as "5t", indicating that the first 5 letters of this word are the same as in the previous word, but then adds the letter "t" to the end.
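  • The "rabbi" / "rabbit" example can be sketched in Python as shown here. The function names are hypothetical, and each entry is kept as a (shared prefix length, suffix) pair rather than the literal "5t" string, which is a simplification for illustration.

```python
from typing import List, Tuple


def front_encode(words: List[str]) -> List[Tuple[int, str]]:
    """Encode each word as (letters shared with previous word, new suffix)."""
    encoded = []
    prev = ""
    for word in words:
        shared = 0
        limit = min(len(prev), len(word))
        while shared < limit and prev[shared] == word[shared]:
            shared += 1
        encoded.append((shared, word[shared:]))
        prev = word
    return encoded


def front_decode(encoded: List[Tuple[int, str]]) -> List[str]:
    """Rebuild the word list by reusing the prefix of the previous word."""
    words = []
    prev = ""
    for shared, suffix in encoded:
        word = prev[:shared] + suffix
        words.append(word)
        prev = word
    return words


encoded = front_encode(["rabbi", "rabbit", "rabble"])
print(encoded)  # prints [(0, 'rabbi'), (5, 't'), (4, 'le')]
```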
  • a database index using a B-Tree system might store the word "zoo", but then use a separate (non-page-based) data stream to store the corresponding list of rows that the word "zoo" exists in, using delta compression (storing only the difference between numbers in an increasing sequence) and gamma compression (storing smaller numbers using fewer bytes).
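  • Combining the two techniques for a list of row numbers might look like the sketch below. It assumes the same 7-bits-per-byte variable-length integers as a stand-in for gamma compression; the row numbers and function names are hypothetical.

```python
from typing import List


def encode_postings(rows: List[int]) -> bytes:
    """Delta-encode an increasing list, then store each gap in few bytes."""
    out = bytearray()
    prev = 0
    for row in rows:
        gap = row - prev              # delta step: keep only the increase
        if gap < 0:
            raise ValueError("rows must be in increasing order")
        prev = row
        while gap >= 0x80:            # variable-length step: 7 bits per byte
            out.append((gap & 0x7F) | 0x80)
            gap >>= 7
        out.append(gap)
    return bytes(out)


def decode_postings(data: bytes) -> List[int]:
    """Reverse both steps: rebuild each gap, then accumulate the gaps."""
    rows = []
    prev = 0
    gap = 0
    shift = 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += gap
            rows.append(prev)
            gap = 0
            shift = 0
    return rows


rows = [5789, 5800, 5900, 88764]
packed = encode_postings(rows)
print(len(packed), "bytes instead of", 4 * len(rows))  # prints 7 bytes instead of 16
```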
  • an advantage in the art can be realized with systems, methods, and computer program products that efficiently combine the benefits of several compression algorithms into a single database system, while retaining the system's ability to efficiently make incremental changes to the data.
  • the present invention solves one or more of the foregoing problems in the prior art with database systems and methods that provide for the efficient use of multiple compression algorithms in a way that data can be compressed for significant space savings, and can be easily retrieved and read when needed.
  • implementations of the present invention provide for the efficient use of both intra-row and inter-row compression techniques in a database system using a page-based structure and a compression plug-in which facilitates access to data from the page-based structure and writing of new data into sub-storage in an efficient manner.
  • a request is received to access (i.e., add, delete, modify) data contained within a database page.
  • a compression plug-in retrieves the database page from sub-storage, allocates a page buffer based on a stored value indicating the page size when inter-row decompressed, and then inter-row decompresses the page into that page buffer.
  • the page data remains in intra-row compressed form within the page buffer; and any data added to the page buffer is added using intra-row compression techniques, such as gamma encoding.
  • the compression plug-in begins by compressing the data in the page buffer using inter-row compression.
  • the compression plug-in identifies if there is sufficient space in the page in sub-storage to store the data in the page buffer. If there is sufficient space to store the intra-row and inter-row compressed data from the page buffer to the page in sub-storage, the compressed data from the page buffer is saved into the page in the sub-storage. If there is too much data to fit into the page in the sub-storage, the page buffer is split into one or more additional page buffers, as appropriate, and one or more corresponding fixed-size pages are also created in the sub-storage.
  • the compression plug-in then inter-row compresses each page buffer and writes the compressed data into the corresponding fixed-size pages in the sub-storage.
  • the compression plug-in is utilized to allocate the page buffer, access data from sub-storage, manage compression of data to and from sub-storage, and allocate new page buffers and pages in sub-storage as required, and inform the B-Tree or other row management system of the addition of new pages as a result of a page buffer split.
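  • The load, modify, and store cycle summarized above can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: `zlib` stands in for the inter-row compressor, rows are treated as already intra-row compressed byte strings, and `PAGE_SIZE`, the header layout, and all function names are hypothetical.

```python
import struct
import zlib
from typing import List, Optional

PAGE_SIZE = 256                      # hypothetical fixed page size in sub-storage
HEADER = struct.Struct("<II")        # (decompressed size, compressed size)


def serialize(rows: List[bytes]) -> bytes:
    """Lay rows out in a page buffer, each prefixed with a 2-byte length."""
    return b"".join(struct.pack("<H", len(r)) + r for r in rows)


def deserialize(raw: bytes) -> List[bytes]:
    rows, pos = [], 0
    while pos < len(raw):
        (n,) = struct.unpack_from("<H", raw, pos)
        rows.append(raw[pos + 2:pos + 2 + n])
        pos += 2 + n
    return rows


def store_page(rows: List[bytes]) -> Optional[bytes]:
    """Inter-row compress a buffer; return None if it cannot fit one page."""
    raw = serialize(rows)
    packed = zlib.compress(raw)
    if HEADER.size + len(packed) > PAGE_SIZE:
        return None                  # size is checked before any write is attempted
    page = HEADER.pack(len(raw), len(packed)) + packed
    return page.ljust(PAGE_SIZE, b"\0")


def load_page(page: bytes) -> List[bytes]:
    """Inter-row decompress a page; rows stay in intra-row compressed form."""
    raw_size, packed_size = HEADER.unpack_from(page)
    # raw_size is the stored value a lower-level system would use to
    # allocate the page buffer before decompressing into it.
    raw = zlib.decompress(page[HEADER.size:HEADER.size + packed_size])
    assert len(raw) == raw_size
    return deserialize(raw)


rows = [b"zoological" + bytes([i]) for i in range(20)]
page = store_page(rows)
buffer_rows = load_page(page)
buffer_rows.append(b"zoological\xff")   # new row added to the page buffer
page = store_page(buffer_rows)          # None would signal that a split is needed
```

  Returning `None` instead of writing an oversized page mirrors the point made above: the decision to split is made from the compressed size alone, before touching sub-storage.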
  • Utilizing the compression plug-in for such functionality provides a number of benefits.
  • the compression format can be changed, altered, or dynamically customized according to the type of underlying data to be stored without affecting the underlying storage format or row management system.
  • the compression plug-in facilitates the determination of the need to create additional pages in sub-storage without first attempting to write the data into sub-storage.
  • the use of a compression plug-in also allows an underlying B-Tree or other data storage structure to maintain the data in fixed-size pages in sub-storage. By utilizing fixed-size pages in sub-storage, optimal efficiency of the underlying storage format is maintained as new pages in sub-storage are created to accommodate additional data being written from page buffers.
  • implementations such as these in accordance with the present invention provide the ability to custom-tailor multiple types of compression for each data type being stored, while retaining fixed-size pages in sub-storage. Furthermore, implementations in accordance with the present invention provide these advantages without necessarily requiring any changes to the B-Tree (or other row management) system. Furthermore, such implementations provide the ability to maintain an acceptable level of accessibility and modifiability in the database system.
  • Figure 1 is a block diagram of an illustrative system utilizing a compression plug-in to control access and storage of new data into sub-storage according to one embodiment of the present invention.
  • Figure 2 illustrates the manner in which the compression plug-in of Figure 1 is utilized to access data from sub-storage and utilize a page buffer to add new data to the page data.
  • Figure 3 illustrates the manner in which data is transferred from the page buffer to sub-storage utilizing the compression plug-in of Figure 1.
  • Figure 4 is a flow diagram illustrating the manner in which a compression plug-in transfers new data for storage in sub-storage according to one embodiment of the present invention.
  • Figure 5 is a flow diagram illustrating the manner in which the compression plug-in determines whether to create additional pages within sub-storage for transferring data to sub-storage.
  • Figure 6 is a block diagram illustrating the manner in which the compression plug-in utilizes additional page buffers to more efficiently write data to additional pages in sub-storage.
  • the present invention extends to systems and methods that provide for the efficient use of multiple compression algorithms in a way that data can be compressed for significant space savings, and can be easily retrieved and read when needed.
  • implementations of the present invention provide for the efficient use of both intra-row and inter-row compression techniques in a database system using a page-based structure and a compression plug-in which facilitates access to data from the page-based structure and writing of new data into sub-storage in an efficient manner.
  • the present invention can separate compression from both the sub-storage and the row management system, and thus balance saving space with accessibility and modifiability.
  • the compression plug-in facilitates the determination of the need to create additional pages in sub-storage without first attempting to write the data into sub-storage. Due to the inherent inefficiencies of transferring data to sub-storage, utilizing a compression plug-in to determine whether there is sufficient space in the page before transferring data to sub-storage results in substantial performance efficiencies.
  • Figure 1 is a block diagram of an illustrative system utilizing a compression plug-in to control access and transfer of data to and from sub-storage according to one embodiment of the present invention.
  • a system 10 is provided between sub-storage 12 and a buffer 14.
  • system 10 is operably linked to sub-storage 12 and buffer 14 such that page data can be accessed from sub-storage 12 for the addition of data into the page in sub-storage 12.
  • information in a page in sub-storage 12 corresponds with information stored in a database. In the event that additional data needs to be added to the database, the page corresponding with the information to be stored is accessed from sub-storage 12.
  • the page data in sub-storage 12 is compressed for efficient storage of the page data in the underlying storage format (i.e. B-Tree data structures).
  • the page data accessed from sub-storage 12 is at least partially decompressed and sent to buffer 14.
  • the data is stored in buffer 14 allowing the new data to be added to the page data as appropriate.
  • a compression plug-in 16 is provided in connection with system 10.
  • Compression plug-in 16 provides compression and decompression of data.
  • Compression plug-in 16 controls access of page data from sub-storage 12 including providing decompression of page data being accessed from sub-storage 12. Additionally, compression plug-in 16 allocates buffer 14 for data transferred from sub-storage 12 including providing transmission of decompressed data to page buffer 14.
  • Compression plug-in 16 also facilitates management of system 10, including the row manager, allowing for compression of new data being added to buffer 14.
  • decompressed page data 18 accessed from sub-storage 12 is provided to buffer 14 utilizing compression plug-in 16. Subsequent to the addition of new data from system 10 to buffer 14, compression plug-in 16 facilitates compression of the data in buffer 14 for storage in sub-storage 12. Compression plug-in 16 then transmits compressed page data 20 from buffer 14 into sub-storage 12.
  • By applying compression using compression plug-in 16 while the data is being added to the pages used by the row management system, the present invention can separate the compression from both sub-storage 12 and the row management system of system 10, and thus balance saving space with accessibility and modifiability. The balance needed for each data type may be different.
  • compression plug-in 16 allows for changing of the compression algorithm without changing the underlying row management or sub-storage systems, providing the ideal balance between compression and modifiability for each type of data that the system stores, without necessarily requiring modification of the underlying systems.
  • compression plug-in 16 can be configured for use with any traditional B-Trees, B+Trees, B*Trees, Binary Trees, N-way Trees, Database Tables, Hash-Trees, or any other page-based storage system, with little modification, and without affecting the system's ability to decide on what pages data should be stored.
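  • One way to picture this separation is a small interface that the page store calls without knowing which algorithm sits behind it. The class and method names below are hypothetical, and `zlib` is only a stand-in compressor; the patent does not prescribe this interface.

```python
import zlib
from abc import ABC, abstractmethod
from typing import Dict


class CompressionPlugin(ABC):
    """Hypothetical plug-in boundary between row management and sub-storage."""

    @abstractmethod
    def compress_page(self, raw: bytes) -> bytes: ...

    @abstractmethod
    def decompress_page(self, packed: bytes) -> bytes: ...


class ZlibPlugin(CompressionPlugin):
    def compress_page(self, raw: bytes) -> bytes:
        return zlib.compress(raw)

    def decompress_page(self, packed: bytes) -> bytes:
        return zlib.decompress(packed)


class IdentityPlugin(CompressionPlugin):
    """No compression; useful for data that does not benefit from it."""

    def compress_page(self, raw: bytes) -> bytes:
        return raw

    def decompress_page(self, packed: bytes) -> bytes:
        return packed


class PageStore:
    """Stores pages by id; the algorithm is chosen by the injected plug-in."""

    def __init__(self, plugin: CompressionPlugin) -> None:
        self.plugin = plugin
        self.pages: Dict[int, bytes] = {}

    def write(self, page_id: int, raw: bytes) -> None:
        self.pages[page_id] = self.plugin.compress_page(raw)

    def read(self, page_id: int) -> bytes:
        return self.plugin.decompress_page(self.pages[page_id])


store = PageStore(ZlibPlugin())
store.write(1, b"zoological" * 50)
```

  Swapping `ZlibPlugin()` for `IdentityPlugin()` changes the stored bytes but not the `PageStore` code, which is the property the text attributes to the plug-in.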
  • Figure 2 illustrates the manner in which compression plug-in 16 is utilized to access data from sub-storage 12 and utilize Page A Buffer 14a to add new data.
  • system 10 loads a page (i.e. Page A) and the corresponding data (i.e. Page A Data) from the sub-storage 12.
  • Compression plug-in 16 creates a corresponding Page A Buffer 14a that is larger than the data loaded from sub-storage 12.
  • the data in sub-storage 12 i.e. Page A Data
  • the compression plug-in 16 provides inter-row decompression of the data from sub-storage 12 while leaving the data in intra-row compression.
  • the larger size of Page A Buffer 14a is used to store a version of the intra-row compressed data (i.e. Page A data in Page A Buffer 14a).
  • New data 22 to be added to Page A data in Page A Buffer 14a is provided in connection with compression plug-in 16.
  • compression plug-in 16 applies intra-row compression to the new data, resulting in intra-row compressed new data 24.
  • Compression plug-in 16 operates in connection with a row manager to determine the position of the new data 24 relative to the existing page data in Page A Buffer 14a.
  • Compression plug-in 16 then inserts the new data into Page A Buffer 14a (expanding Page A Buffer 14a if needed).
  • the compression plug-in operates in connection with the row manager before intra-row compression of the data.
  • the compression plug-in compresses the page data independent of the row manager and subsequently the row manager adds the intra-row compressed data in the page buffer without the use of the compression plug-in.
  • the data from sub-storage is completely decompressed before addition to the page buffer.
  • the data in sub-storage is compressed with a single compression algorithm (such as index compression) and the compression plug-in is utilized to control the addition of new data into the sub-storage in the single compression format.
  • Figure 3 illustrates the manner in which data is written from page buffer 14 to sub-storage 12 utilizing compression plug-in 16.
  • page buffer 14 contains the original data inter-row decompressed from sub-storage 12 (i.e. Page A Data) plus the new data intra-row compressed from compression plug-in 16 (i.e. New Page A Data).
  • compression plug-in 16 identifies that the data in page buffer 14 is ready to be sent to sub-storage 12.
  • compression plug-in 16 then applies inter-row compression to both the Page A Data and the New Page A Data.
  • inter-row compression can include delta-row compression or other known inter-row compression algorithms. Because the data in page buffer 14 was stored in intra-row compression, the additional inter-row compression provided by compression plug-in 16 results in both intra-row and inter-row compression of the data from page buffer 14. The compressed page data from page buffer 14, including the new page data, is then sent to a page in sub-storage 12 corresponding with page buffer 14. Subsequent to transmission of the data from compression plug-in 16 to sub-storage 12, the data is stored in sub-storage 12 in both an intra-row compressed and inter-row compressed format. This provides the compression benefits of inter-row compression while maintaining fixed-size chunks of data, which allows for optimized accessibility, modifiability, and overall system performance.
  • index data can also benefit greatly.
  • traditional SQL databases use B-Trees to index data stored in tables.
  • One of the most complicated (and space-consuming) indexes in a database is a full-text index.
  • every word from every document in the table is indexed so that by looking up the word in the B-Tree, one can quickly find which documents have that particular word in them.
  • this data is enormous since each entry in the index stores the word, a document identifier, and a position within the document where that word occurs.
  • a typical full-text index might be represented as follows:
  • Each of the entries in this table accounts for the fact that there might be many documents and some documents may be very long. As such, the fields used to store the document identifier and the position information must be large enough to indicate the last possible word in the last possible document in the system. Thus, in the example given above, 32 bits would be needed to store the document identifier, and 16 bits would be needed to store the position (though this would limit the documents to 65536 words). As such, a total of (at least) 17 bytes would be needed for each row of data (11 bytes for the string "zoological" and a terminator or length indicator, 4 bytes for the document identifier, and 2 bytes for the position information), for a total of 136 bytes across the eight rows.
  • any values used more than once in the page (i.e., the string "zoological" and the document identifiers 5789 and 88764) could be reduced to a single instance, plus one byte (or more) per instance. This would reduce the total size to 17 (first row) + 8 (each instance of "zoological") + 4 (5789) + 4 (each instance of 5789) + 4 (88764) + 3 (each instance of 88764) + 4 (9947852) + 8 * 2 (positions), for a total of 60 bytes.
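  • The two totals quoted above can be checked with a few lines of Python. The byte counts are taken directly from the text; the row count of eight is inferred from 136 / 17 and from the "8 * 2" positions term, and the variable names are hypothetical.

```python
WORD_BYTES = 11       # "zoological" plus a terminator or length indicator
DOC_ID_BYTES = 4      # 32-bit document identifier
POSITION_BYTES = 2    # 16-bit position
ROW_COUNT = 8         # inferred: 136 / 17, and the "8 * 2" positions term

row_bytes = WORD_BYTES + DOC_ID_BYTES + POSITION_BYTES
uncompressed_total = ROW_COUNT * row_bytes

# Terms of the symbol-table estimate, in the order the text lists them.
symbol_table_terms = [
    ("first row", 17),
    ('each instance of "zoological"', 8),
    ("5789", 4),
    ("each instance of 5789", 4),
    ("88764", 4),
    ("each instance of 88764", 3),
    ("9947852", 4),
    ("positions", 8 * 2),
]
symbol_table_total = sum(size for _, size in symbol_table_terms)

print(row_bytes, uncompressed_total, symbol_table_total)  # prints 17 136 60
```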
  • implementations of the present invention provide for storage of all of the page data in the B-Tree database system while using three compression algorithms.
  • "zoological" is only stored once
  • each unique document identifier is only stored once
  • both the document identifiers and the positions of the page data are stored using only the increase from the previous item.
  • the present system uses gamma encoding to store small numbers with fewer bytes.
  • this sample data might require only 11 ("zoological") + 2 (5789 gamma encoded) + 2 (2652 gamma encoded) + 1 ("zoological" and 5789 repeat indicator) + 1 (2752 - 2725 gamma encoded) + 1 ("zoological" and 5789 repeat indicator) + 1 (2731 - 2652 gamma encoded) + 1 ("zoological" and 5789 repeat indicator) + 1 (2788 - 2731 gamma encoded) + 1 ("zoological" repeat indicator) + 3 (88764 - 5789 gamma encoded) + 1 (10 gamma encoded) + 1 ("zoological" and 88764 repeat indicator) + 1 (66 - 10 gamma encoded) + 1 ("zoological" and 88764 repeat indicator) + 1 (82 - 66 gamma encoded) + 4 (9947852 - 88764 gamm
  • Figure 4 is a flow diagram illustrating the manner in which a compression plug-in is utilized to insert new data into sub-storage according to one embodiment of the present invention.
  • new data is received in step 26.
  • a page in sub-storage having data corresponding with the new data is identified in step 26.
  • the page and corresponding data are then accessed from sub-storage in step 28.
  • the page data is decompressed using inter-row decompression and sent to a page buffer corresponding with the page in step 32.
  • the new data is compressed using intra-row compression in step 34.
  • the new data compressed using intra-row compression is then added to the inter-row decompressed data in the page buffer in step 36.
  • the data in the page buffer is compressed using inter-row compression in step 38. It is then determined whether the compressed data can be stored in the corresponding page in sub-storage in step 40. In the event that it is determined that the compressed data can be stored in the corresponding page in sub-storage, the inter-row compressed data is then stored in the corresponding page in sub-storage in step 42.
  • the compression plug-in is configured to determine, before attempting to write the data from the page buffer to the page in the sub-storage corresponding with the page buffer, whether there is sufficient space in the page in sub-storage to accommodate the data from the page buffer. In the event that there is sufficient space in the page in sub-storage corresponding with the page buffer, the data is stored in the page in sub-storage. In the event that there is insufficient space in the page in sub-storage, additional space is allocated to store the information in sub-storage before attempting to store the data in sub-storage.
  • Figure 5 illustrates a method utilized to allocate additional space for the storage of the data from the page buffer before attempting to store the data in sub-storage according to one embodiment of the present invention.
  • a request is received to enter data from a page buffer in which additional data has been added into a page in sub-storage in step 44.
  • the data from the page buffer is compressed using inter-row compression in step 46.
  • the amount of space provided by the page in sub-storage is then determined in step 48. Once the amount of space provided by the page in sub-storage is determined, the size of the intra-row and inter-row compressed data is determined in step 50.
  • It is then identified whether compressed data from the page buffer will fit into the corresponding page in sub-storage in step 52. If there is sufficient space in the page in sub-storage corresponding with the page buffer, data is saved in a page of sub-storage in step 60. If there is insufficient space in the page in sub-storage corresponding with the page buffer, additional buffers and pages in sub-storage are created to accommodate the amount of compressed data in step 54. The compressed data is decompressed using inter-row decompression and then allocated to the page buffers in step 56. Once the data has been allocated to the additional page buffers, the data from each individual page buffer is compressed using inter-row compression and sent to the respective pages in sub-storage such that each page receives inter-row compressed page data from its respective page buffer in step 58.
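  • The fit check and split of steps 46 through 58 can be sketched as shown below. Several things here are assumptions made for illustration: `zlib` stands in for inter-row compression, the buffer is split in half (the text leaves the allocation policy to the row manager), and `PAGE_SIZE` and the function names are hypothetical.

```python
import hashlib
import zlib
from typing import List

PAGE_SIZE = 64   # hypothetical fixed page size in sub-storage, in bytes


def inter_row_compress(rows: List[bytes]) -> bytes:
    """Stand-in inter-row compressor: compress all rows of a buffer together."""
    return zlib.compress(b"".join(rows))


def split_until_fit(rows: List[bytes]) -> List[List[bytes]]:
    """Return page buffers whose compressed form each fits one fixed-size page."""
    if len(inter_row_compress(rows)) <= PAGE_SIZE:
        return [rows]                     # step 52: the data fits, no split needed
    if len(rows) == 1:
        raise ValueError("a single row is too large for one page")
    mid = len(rows) // 2                  # assumed policy: split the buffer in half
    return split_until_fit(rows[:mid]) + split_until_fit(rows[mid:])


# Deterministic, poorly compressible 16-byte rows for the demonstration.
rows = [hashlib.sha256(bytes([i])).digest()[:16] for i in range(12)]
buffers = split_until_fit(rows)
pages = [inter_row_compress(b) for b in buffers]   # sizes are known before any write
print(len(buffers), [len(p) for p in pages])
```

  Every size decision happens in memory, so sub-storage only ever receives pages that are already known to fit.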
  • the determination of the sufficiency of space on the page(s) in sub-storage performed by the compression plug-in provides significant performance savings in the data storage system.
  • the attempt to write that data to the page in sub-storage results in significant consumption of system operating time.
  • the data is retrieved from sub-storage, decompressed, split into additional page buffers, and then re-written to storage.
  • the compression plug-in utilizes the row-management system to allocate data into multiple page buffers once it is determined that there is insufficient space in the page(s) in sub-storage to accommodate the data in a particular page buffer.
  • the compressed data is not decompressed when additional page buffers are allocated and the data is inserted into the individual page buffers.
  • the page data is completely decompressed before being allocated to individual page buffers.
  • the size of the pages in the sub-storage is fixed and the compression plug-in determines whether the size of the compressed data is larger than the size of the fixed-size pages.
  • Figure 6 is a block diagram illustrating the manner in which compression plug-in 16 utilizes additional page buffers to more efficiently transfer data to additional pages in sub-storage.
  • compression plug-in 16 has identified that the size of Page A 66 in sub-storage was insufficient to accommodate the data originally retrieved from Page A 66 in combination with the new data added to the data retrieved from Page A 66.
  • compression plug-in 16 has allocated an additional page buffer 64 in addition to page buffer 14.
  • An additional page (i.e. Page B 68) has been allocated which corresponds with page buffer 64.
  • Page B 68 and Page A 66 provide sufficient space for the compressed data which needs to be stored.
  • Once page buffer 64 has been allocated, the data is allocated to Page A Buffer 14a and Page B Buffer 64 using row manager 62. Utilizing row manager 62 allows for the organized and efficient storage of the data in individual page buffers (i.e. Page A Buffer 14a and Page B Buffer 64). Once the data has been allocated to Page A Buffer 14a and Page B Buffer 64, the data is individually retrieved from each of Page A Buffer 14a and Page B Buffer 64, compressed using inter-row compression, and sent for storage to Page A 66 and Page B 68. For example, according to one embodiment of the present invention, subsequent to allocation of the inter-row decompressed data to Page A Buffer 14a and Page B Buffer 64, compression plug-in 16 accesses data from Page A Buffer 14a.
  • Compression plug-in 16 then compresses the data from Page A Buffer 14a utilizing inter-row compression. Once the data from Page A Buffer 14a is intra-row and inter-row compressed, compression plug-in 16 confirms that there is sufficient space in Page A 66 to store the compressed data. The compressed data from Page A Buffer 14a is then sent to Page A 66 in sub-storage. Compression plug-in 16 then accesses the data from Page B Buffer 64, compresses the data using inter-row compression, confirms that there is sufficient storage space in Page B 68, and sends the compressed data to Page B 68 in sub-storage.
  • if the compression plug-in cannot fit the data from the page buffer into the corresponding page in sub-storage, the compression plug-in indicates the condition to the row manager.
  • the row manager system handles the condition by assigning one or more additional page buffers in the sub-storage, updating the relevant information in the row manager system, and then telling the compression plug-in to "split" the data in the page buffer into multiple page buffers.
  • the compression plug-in may try to balance the data relatively equally in each page buffer, as appropriate. Notwithstanding the allocation system used to store data, each page buffer contains the rows of assigned data having intra-row compression applied thereto.
  • more than one additional page buffer and/or page in sub-storage is allocated based on the size of the compressed data that needs to be stored.
  • only a single additional page buffer and sub-storage page set is initially provided. After splitting the compressed data into the page buffers and recompressing the data from the individual pages, it is then determined whether additional page buffers and pages in sub-storage are needed.
  • the manner in which data is allocated to individual page buffers is tailored to the type of data to be stored.
  • systems in accordance with the present invention can provide benefits to many commercial database systems.
  • one benefit provided by the present invention allows the user of those systems to more specifically identify what type of data is being stored so that the database system could compress the rows more effectively.
  • Another benefit is allowing the user to directly specify the compression format to use when storing the rows.
  • some frequently used data types can be tailored by the database system itself, and can greatly improve performance and storage requirements for indexes, for example full-text indexes, while retaining their flexibility for storing large amounts of dynamic data.
  • The page buffer may be split into just two page buffers to accommodate extra data, or it may be split more flexibly into additional page buffers, as appropriate.
  • Data can also be allocated relatively unevenly across the one, two, three, or more additional buffers.
  • The compression plug-in can distribute the items in the specified page buffers into the corresponding specified pages in the proportions specified, for example such that 15% of the data is allocated to the first page, 70% to the next page, and 15% to the last page.
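
The overflow handling and proportional split described in the excerpts above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the names `PAGE_SIZE`, `compressed_size`, `split_by_percent`, and `store` are hypothetical, `zlib` merely stands in for whatever algorithm a compression plug-in would apply, and the halving strategy in `store` is just one way to "add a page, recompress, then check again".

```python
import zlib
from typing import List, Sequence

PAGE_SIZE = 256  # hypothetical fixed page size in sub-storage, in bytes


def compressed_size(rows: Sequence[bytes]) -> int:
    """Size of the rows once compressed (zlib is only a stand-in algorithm)."""
    return len(zlib.compress(b"".join(rows)))


def split_by_percent(rows: Sequence[bytes], percents: Sequence[int]) -> List[List[bytes]]:
    """Distribute rows, in order, across len(percents) page buffers by byte share."""
    if not percents or sum(percents) != 100:
        raise ValueError("percents must sum to 100")
    total = sum(len(r) for r in rows)
    limits, acc = [], 0
    for p in percents:
        acc += p
        limits.append(acc * total)  # cumulative limit, scaled by 100 to stay in integers
    pages: List[List[bytes]] = [[] for _ in percents]
    page, used = 0, 0
    for row in rows:
        # Move to the next buffer once the current buffer's cumulative share is used up.
        while page < len(pages) - 1 and used * 100 >= limits[page]:
            page += 1
        pages[page].append(row)
        used += len(row)
    return pages


def store(rows: Sequence[bytes]) -> List[List[bytes]]:
    """Return one row list per fixed-size page, splitting whenever a page overflows."""
    if compressed_size(rows) <= PAGE_SIZE or len(rows) <= 1:
        return [list(rows)]  # fits (or cannot be split any further)
    mid = len(rows) // 2     # add one page, then re-check each half after recompression
    return store(rows[:mid]) + store(rows[mid:])
```

With 100 rows of 10 bytes each, `split_by_percent(rows, [15, 70, 15])` yields buffers of 15, 70, and 15 rows, matching the 15% / 70% / 15% example in the text. Working in integer percentages avoids floating-point rounding at the buffer boundaries.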

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a database compression system that includes a compression facility allowing a database to be compressed using multiple compression algorithms. Embodiments of the invention also allow inter-row compression to be used with fixed page sizes in a page-based database. For example, the inter-row compression facility decompresses a requested page from sub-storage and allocates a page buffer that is at least as large as the page data once it has been inter-row decompressed. The compression facility then adds data to the page buffer using inter-row compression, such as gamma compression. When the page data is no longer needed, the compression facility compresses the page data using inter-row compression and passes the compressed page data from the page buffer to the corresponding fixed-size page in sub-storage.
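
The abstract names gamma compression as an example but does not define it in this excerpt. Assuming it refers to Elias gamma coding, the following sketch shows how such a code could shrink a sorted list of integer keys (for instance, row or document identifiers in a full-text index) by encoding the gaps between neighbouring values. All function names here are hypothetical, and bit strings are used instead of packed bytes purely for readability.

```python
from typing import Iterable, List


def gamma_encode(n: int) -> str:
    """Elias gamma code of a positive integer, returned as a string of '0'/'1'."""
    if n < 1:
        raise ValueError("Elias gamma is defined for positive integers only")
    binary = bin(n)[2:]                        # e.g. 5 -> "101"
    return "0" * (len(binary) - 1) + binary    # length prefix in unary, then the value


def gamma_decode(bits: str) -> List[int]:
    """Decode a concatenation of Elias gamma codes (assumes well-formed input)."""
    values, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":                  # count the unary length prefix
            zeros += 1
            i += 1
        values.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return values


def encode_sorted_ids(ids: Iterable[int]) -> str:
    """Gap-encode a strictly increasing sequence of positive integers."""
    out, prev = [], 0
    for value in ids:
        out.append(gamma_encode(value - prev))  # small gaps yield short codes
        prev = value
    return "".join(out)


def decode_sorted_ids(bits: str) -> List[int]:
    """Inverse of encode_sorted_ids: rebuild the values from their gaps."""
    ids, total = [], 0
    for gap in gamma_decode(bits):
        total += gap
        ids.append(total)
    return ids
```

For example, `encode_sorted_ids([3, 4, 9, 20])` encodes the gaps 3, 1, 5, and 11 in 16 bits, and `decode_sorted_ids` restores the original list. Because each value depends on its predecessor, a page compressed this way has to be decoded as a unit, which is consistent with decompressing a whole page into a larger buffer before working on it.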
PCT/US2005/044275 2004-12-06 2005-12-06 Applying multiple compression algorithms in a database system WO2006063057A2 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US63385904P 2004-12-06 2004-12-06
US60/633,859 2004-12-06

Publications (2)

Publication Number Publication Date
WO2006063057A2 WO2006063057A2 (fr) 2006-06-15
WO2006063057A3 WO2006063057A3 (fr) 2007-04-26

Family

ID=36578522

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2005/044275 WO2006063057A2 (fr) 2004-12-06 2005-12-06 Applying multiple compression algorithms in a database system

Country Status (2)

Country Link
US (1) US7769728B2 (fr)
WO (1) WO2006063057A2 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7200603B1 (en) * 2004-01-08 2007-04-03 Network Appliance, Inc. In a data storage server, for each subsets which does not contain compressed data after the compression, a predetermined value is stored in the corresponding entry of the corresponding compression group to indicate that corresponding data is compressed
US7769728B2 (en) * 2004-12-06 2010-08-03 Ivie James R Method and system for intra-row, inter-row compression and decompression of data items in a database using a page-based structure where allocating a page-buffer based on a stored value indicating the page size
US10348897B2 (en) 2017-06-27 2019-07-09 Avaya Inc. System and method for reducing storage space in a contact center

Families Citing this family (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007005829A2 (fr) * 2005-07-01 2007-01-11 Nec Laboratories America, Inc. Operating-system-based memory compression for embedded systems
US7496589B1 (en) * 2005-07-09 2009-02-24 Google Inc. Highly compressed randomly accessed storage of large tables with arbitrary columns
US7548928B1 (en) 2005-08-05 2009-06-16 Google Inc. Data compression of large scale data stored in sparse tables
US7668846B1 (en) 2005-08-05 2010-02-23 Google Inc. Data reconstruction from shared update log
US8077059B2 (en) * 2006-07-21 2011-12-13 Eric John Davies Database adapter for relational datasets
US9195695B2 (en) * 2006-09-15 2015-11-24 Ibm International Group B.V. Technique for compressing columns of data
US8266147B2 (en) * 2006-09-18 2012-09-11 Infobright, Inc. Methods and systems for database organization
WO2008034213A1 (fr) * 2006-09-18 2008-03-27 Infobright Inc. Method and system for data compression in a relational database
US8386444B2 (en) * 2006-12-29 2013-02-26 Teradata Us, Inc. Techniques for selective compression of database information
US7962638B2 (en) * 2007-03-26 2011-06-14 International Business Machines Corporation Data stream filters and plug-ins for storage managers
US20090043792A1 (en) * 2007-08-07 2009-02-12 Eric Lawrence Barsness Partial Compression of a Database Table Based on Historical Information
US8805799B2 (en) * 2007-08-07 2014-08-12 International Business Machines Corporation Dynamic partial uncompression of a database table
US7747585B2 (en) * 2007-08-07 2010-06-29 International Business Machines Corporation Parallel uncompression of a partially compressed database table determines a count of uncompression tasks that satisfies the query
US20090204967A1 (en) * 2008-02-08 2009-08-13 Unisys Corporation Reporting of information pertaining to queuing of requests
US20090282064A1 (en) * 2008-05-07 2009-11-12 Veeramanikandan Raju On the fly compression and storage device, system and method
US20090287986A1 (en) * 2008-05-14 2009-11-19 Ab Initio Software Corporation Managing storage of individually accessible data units
CN102239472B (zh) 2008-09-05 2017-04-12 Hewlett-Packard Development Company, L.P. Storing log data efficiently while supporting querying
US8484351B1 (en) 2008-10-08 2013-07-09 Google Inc. Associating application-specific methods with tables used for data storage
US8285691B2 (en) * 2010-03-30 2012-10-09 Ca, Inc. Binary method for locating data rows in a compressed data block
US8521748B2 (en) 2010-06-14 2013-08-27 Infobright Inc. System and method for managing metadata in a relational database
US8417727B2 (en) 2010-06-14 2013-04-09 Infobright Inc. System and method for storing data in a relational database
US8327070B2 (en) * 2010-06-24 2012-12-04 International Business Machines Corporation Method for optimizing sequential data fetches in a computer system
GB2483282B (en) * 2010-09-03 2017-09-13 Advanced Risc Mach Ltd Data compression and decompression using relative and absolute delta values
US8694474B2 (en) * 2011-07-06 2014-04-08 Microsoft Corporation Block entropy encoding for word compression
US8988444B2 (en) * 2011-12-16 2015-03-24 Institute For Information Industry System and method for configuring graphics register data and recording medium
US20130179409A1 (en) * 2012-01-06 2013-07-11 International Business Machines Corporation Separation of data chunks into multiple streams for compression
US8838577B2 (en) * 2012-07-24 2014-09-16 International Business Machines Corporation Accelerated row decompression
US10841405B1 (en) * 2013-03-15 2020-11-17 Teradata Us, Inc. Data compression of table rows
US9069660B2 (en) * 2013-03-15 2015-06-30 Apple Inc. Systems and methods for writing to high-capacity memory
US9569441B2 (en) 2013-10-09 2017-02-14 Sap Se Archival of objects and dynamic search
US9606769B2 (en) * 2014-04-05 2017-03-28 Qualcomm Incorporated System and method for adaptive compression mode selection for buffers in a portable computing device
US9952771B1 (en) * 2016-03-31 2018-04-24 EMC IP Holding Company LLC Method and system for choosing an optimal compression algorithm
US11288257B2 (en) * 2016-05-30 2022-03-29 Sap Se Memory optimization using data aging in full text indexes
US10432484B2 (en) * 2016-06-13 2019-10-01 Silver Peak Systems, Inc. Aggregating select network traffic statistics
CN106980541B (zh) * 2017-03-10 2019-11-19 Zhejiang University Huge-page memory compression and reclamation system and method
US20230325101A1 (en) * 2022-04-12 2023-10-12 Samsung Electronics Co., Ltd. Systems and methods for hybrid storage

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5918225A (en) * 1993-04-16 1999-06-29 Sybase, Inc. SQL-based database system with improved indexing methodology
US6202136B1 (en) * 1994-12-15 2001-03-13 Bmc Software, Inc. Method of creating an internally consistent copy of an actively updated data set without specialized caching hardware

Family Cites Families (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2987206B2 (ja) * 1991-12-13 1999-12-06 Avid Technology, Inc. Buffer and frame indexing
US6092070A (en) * 1992-02-11 2000-07-18 Telcordia Technologies, Inc. Method and system for lossless date compression and fast recursive expansion
US5794228A (en) * 1993-04-16 1998-08-11 Sybase, Inc. Database system with buffer manager providing per page native data compression and decompression
US5794229A (en) * 1993-04-16 1998-08-11 Sybase, Inc. Database system with methodology for storing a database table by vertically partitioning all columns of the table
US5668897A (en) * 1994-03-15 1997-09-16 Stolfo; Salvatore J. Method and apparatus for imaging, image processing and data compression merge/purge techniques for document image databases
US5805086A (en) * 1995-10-10 1998-09-08 International Business Machines Corporation Method and system for compressing data that facilitates high-speed data decompression
US5696927A (en) * 1995-12-21 1997-12-09 Advanced Micro Devices, Inc. Memory paging system and method including compressed page mapping hierarchy
US6618728B1 (en) * 1996-01-31 2003-09-09 Electronic Data Systems Corporation Multi-process compression
US6301394B1 (en) * 1998-09-25 2001-10-09 Anzus, Inc. Method and apparatus for compressing data
JP2000305822A (ja) * 1999-04-26 2000-11-02 Denso Corp Database management device, database record extraction device, database management method, and database record extraction method
US6886098B1 (en) * 1999-08-13 2005-04-26 Microsoft Corporation Systems and methods for compression of key sets having multiple keys
US6411295B1 (en) * 1999-11-29 2002-06-25 S3 Graphics Co., Ltd. Apparatus and method for Z-buffer compression
US6523102B1 (en) * 2000-04-14 2003-02-18 Interactive Silicon, Inc. Parallel compression/decompression system and method for implementation of in-memory compressed cache improving storage density and access speed for industry standard memory subsystems and in-line memory modules
US6782136B1 (en) * 2001-04-12 2004-08-24 Kt-Tech, Inc. Method and apparatus for encoding and decoding subband decompositions of signals
US6857045B2 (en) * 2002-01-25 2005-02-15 International Business Machines Corporation Method and system for updating data in a compressed read cache
US6694323B2 (en) * 2002-04-25 2004-02-17 Sybase, Inc. System and methodology for providing compact B-Tree
US7171427B2 (en) * 2002-04-26 2007-01-30 Oracle International Corporation Methods of navigating a cube that is implemented as a relational object
US9195699B2 (en) * 2003-08-08 2015-11-24 Oracle International Corporation Method and apparatus for storage and retrieval of information in compressed cubes
US20060005047A1 (en) * 2004-06-16 2006-01-05 Nec Laboratories America, Inc. Memory encryption architecture
US7769728B2 (en) * 2004-12-06 2010-08-03 Ivie James R Method and system for intra-row, inter-row compression and decompression of data items in a database using a page-based structure where allocating a page-buffer based on a stored value indicating the page size
EP1958072A4 (fr) * 2005-12-08 2012-05-02 Intel Corp Header compression/decompression software

Also Published As

Publication number Publication date
US20060123035A1 (en) 2006-06-08
US7769728B2 (en) 2010-08-03
WO2006063057A3 (fr) 2007-04-26

Similar Documents

Publication Publication Date Title
US7769728B2 (en) Method and system for intra-row, inter-row compression and decompression of data items in a database using a page-based structure where allocating a page-buffer based on a stored value indicating the page size
US6725223B2 (en) Storage format for encoded vector indexes
US7840774B2 (en) Compressibility checking avoidance
US8255398B2 (en) Compression of sorted value indexes using common prefixes
AU2009246432B2 (en) Managing storage of individually accessible data units
US6349372B1 (en) Virtual uncompressed cache for compressed main memory
US8538936B2 (en) System and method for data compression using compression hardware
US11520743B2 (en) Storing compression units in relational tables
US5761536A (en) System and method for reducing memory fragmentation by assigning remainders to share memory blocks on a best fit basis
US7103608B1 (en) Method and mechanism for storing and accessing data
EP1866776B1 (fr) Method for detecting the presence of subblocks in a reduced-redundancy storage system
Zezula et al. Dynamic partitioning of signature files
US5603022A (en) Data compression system and method representing records as differences between sorted domain ordinals representing field values
US5678043A (en) Data compression and encryption system and method representing records as differences between sorted domain ordinals that represent field values
US6654868B2 (en) Information storage and retrieval system
US5999936A (en) Method and apparatus for compressing and decompressing sequential records in a computer system
EP1265160A2 (fr) Data structure
JP2001511563A (ja) Structure for a database
CN101916228A (zh) Flash translation layer with data compression function and implementation method
EP1934700A2 (fr) Database heap management system with variable page format and fixed instruction set address resolution
US11886401B2 (en) Database key compression
US5815096A (en) Method for compressing sequential data into compression symbols using double-indirect indexing into a dictionary data structure
US6965897B1 (en) Data compression method and apparatus
CN1287316C (zh) Method and system for compressing variable-length columns during index high-key generation
Zobel et al. Storage Management for Files of Dynamic Records.

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KN KP KR KZ LC LK LR LS LT LU LV LY MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU LV MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 EP: the EPO has been informed by WIPO that EP was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 05853242

Country of ref document: EP

Kind code of ref document: A2