CN115454941A - Method and system for realizing saving of storage space of log system - Google Patents

Method and system for realizing saving of storage space of log system

Publication number
CN115454941A
Authority
CN
China
Prior art keywords
record
value
lsn
user
updating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211052393.1A
Other languages
Chinese (zh)
Inventor
徐奇
付新
王学海
姜久文
张东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dameng Data Technology Jiangsu Co ltd
Original Assignee
Dameng Data Technology Jiangsu Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dameng Data Technology Jiangsu Co ltd filed Critical Dameng Data Technology Jiangsu Co ltd
Priority to CN202211052393.1A priority Critical patent/CN115454941A/en
Publication of CN115454941A publication Critical patent/CN115454941A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815 Journaling file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/1737 Details of further file system functions for reducing power consumption or coping with limited storage space, e.g. in mobile devices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a system for saving storage space in a value log system.

Description

Method and system for realizing saving of storage space of log system
Technical Field
The invention relates to the technical field of databases, and in particular to a method and a system for saving storage space in a value log system.
Background
Data processing systems built on value log technology, such as databases and various Key-Value systems, convert update and delete operations into insert operations and retain the full modification history of the data. In practice, when there are many update and delete operations, the value log consumes a large amount of additional disk space because every historical version is kept. The reason is that each record inserted or updated in the value log system stores all Key-Value data for the corresponding Key. When a user executes an update, the system locates the pre-modification data in the value log file through the index, applies the change in memory, and converts the update into two inserts: one deletion mark and one record holding the updated data. Both are appended sequentially to the tail of the value log file. This approach exploits the high performance of sequential log reads and writes and greatly improves write performance. However, although it keeps writes fast, it wastes considerable space: no matter how little an update changes, the unchanged content of the Key-Value data is saved again in full. If a record is updated N times, each field that was never updated is stored N + 1 times in the worst case.
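The N + 1 figure can be checked with a small calculation. The sketch below is illustrative only: the function names and the 4-byte column sizes are assumptions, deletion marks and record headers are ignored, and the "changed only" figure is an idealized lower bound rather than the record format the invention actually writes.

```python
from typing import Dict, List


def bytes_written_full_copy(column_sizes: Dict[str, int], updates: List[List[str]]) -> int:
    """Bytes appended when every update rewrites the whole record."""
    record_size = sum(column_sizes.values())
    # One initial insert plus one full copy per update: N + 1 copies in total.
    return record_size * (len(updates) + 1)


def bytes_written_changed_only(column_sizes: Dict[str, int], updates: List[List[str]]) -> int:
    """Bytes appended if each update stored only the columns it changed."""
    total = sum(column_sizes.values())  # the initial insert is always complete
    for changed in updates:
        total += sum(column_sizes[col] for col in changed)
    return total


# Hypothetical record with three 4-byte columns, updated 10 times on one column.
sizes = {"id": 4, "name": 4, "balance": 4}
updates = [["name"]] * 10
print(bytes_written_full_copy(sizes, updates))
print(bytes_written_changed_only(sizes, updates))
```

With these assumed sizes, the full-copy scheme writes the unchanged `id` and `balance` columns 11 times each, which is the redundancy the invention targets.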
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a method and a system for saving storage space in a value log system, which reduce the storage space occupied while limiting the added time cost of reading from the value log file.
In order to solve the above technical problem, the present invention provides a method for saving storage space in a value log system, comprising the following steps:
Step 1, when a user updates data, the value log system first uses the index to find the LSN value of the record to be updated and reads the corresponding physical record according to that LSN value; it then repeatedly reads earlier physical records by following the LSN of the previous operation recorded in the REC, until the values of all columns to be updated have been found.
Step 2, two judgments are then made to determine how the current update operation is written to disk.
Step 3, when the user performs further update operations, the above steps are repeated.
Step 4, once updating is complete and the user needs to read data from the log, the value log system obtains from the index the latest LSN value of the log record corresponding to the queried Key.
Step 5, the value log system reads the latest record according to the LSN value and repeatedly reads earlier records by following the PREV_LSN in each record, until all Key-Value data required by the user has been read.
Step 6, the values read from all of these operations are merged to obtain the value the user needs, which is returned to the user.
Preferably, in step 1, REC denotes the information recorded at the tail of the physical record: the physical length of the complete record, the number of updates, and the LSN of the previous update, denoted PREV_LSN.
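The REC information can be pictured as a fixed-size footer on each physical record. The sketch below is illustrative only: the patent does not specify field widths, byte order, or field order, so the `<IIQ` layout and the names `RecTrailer`, `full_len`, and `update_count` are assumptions.

```python
import struct
from dataclasses import dataclass

# Assumed layout: complete-record length (u32), update count (u32), PREV_LSN (u64),
# little-endian with no padding.
_TRAILER = struct.Struct("<IIQ")


@dataclass(frozen=True)
class RecTrailer:
    full_len: int      # physical length of the complete record
    update_count: int  # number of partial updates since the last complete record
    prev_lsn: int      # LSN of the previous update (PREV_LSN)

    def pack(self) -> bytes:
        """Serialize the trailer so it can be appended after the record body."""
        return _TRAILER.pack(self.full_len, self.update_count, self.prev_lsn)

    @classmethod
    def unpack_from_record(cls, record: bytes) -> "RecTrailer":
        """Read the trailer from the last bytes of a physical record."""
        return cls(*_TRAILER.unpack(record[-_TRAILER.size:]))


trailer = RecTrailer(full_len=12, update_count=1, prev_lsn=7)
physical_record = b"column-bytes" + trailer.pack()
print(RecTrailer.unpack_from_record(physical_record))
```

Placing the fields at the tail means a reader that knows the end offset of a record can recover PREV_LSN without parsing the column data first.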
Preferably, in step 2, two judgments determine how the current update operation is written to disk. Judgment one: add 1 to the update count recorded in the REC of the first record read in step 1, and check whether the result exceeds the maximum number of partial updates. Judgment two: sum the actual lengths of the distinct partially updated columns in the physical records read so far and the actual length of the columns updated by the current operation (if the same column has been updated several times, only the current update is counted), compute the ratio of this sum to the complete physical record length recorded in the REC, and check whether the ratio exceeds the maximum ratio of partially updated length to complete physical record length. When either judgment is satisfied, the latest values of all columns obtained by the repeated reads are combined with the values modified by the current update to form a complete insert record: the update count in the record tail is set to 0 because this is a complete record, the complete physical record length is set to the physical length of this record, PREV_LSN is set to the LSN of the record before the update, and the record is appended to the tail of the value log file. When neither judgment is satisfied, the update operation saves to the value log file only the Key of the record being updated and the updated partial values (such as the column number and column value of each updated column); at the tail of the record, the update count is incremented by 1, the complete physical record length is set to the physical length of the last complete record, and PREV_LSN is set to the LSN of the physical record before the update.
Preferably, in step 5, in the best case the user needs only one I/O operation to read the required data; in the worst case the reads continue until the most recent complete record storing all Key-Value data is found.
Correspondingly, a system for saving storage space in a value log system comprises a scheduling module, an index module, and a value log storage module. The scheduling module receives update and query requests from the user, calls the index module to update and query the index, stores data through the value log storage module, and returns the execution result to the user.
Preferably, when a user initiates a record update request, the scheduling module receives the request and first calls the index module to look up, by the Key given by the user, the latest LSN value of the record to be modified. It then finds the latest record in the value log storage module according to that LSN value and, depending on whether the record contains all fields to be modified, decides whether to search further back through the value log storage module using the PREV_LSN in the record. After all required records have been read, it makes the two judgments, assembles in memory the record that will actually be written, and stores it through the value log storage module. It then updates the index module with the record's Key and the latest LSN value produced by the write, and finally returns the execution result to the user.
Preferably, when a user initiates a query request, the scheduling module receives the request and first calls the index module to look up, by the Key given by the user, the latest LSN value of the record to be queried. It then finds the latest record in the value log storage module according to that LSN value and, depending on whether the record contains all fields to be queried, decides whether to search further back through the value log storage module using the PREV_LSN in the record. After all required records have been read, it merges them in memory and returns the result to the user.
The beneficial effects of the invention are as follows. By improving the update operation of the value log system, the invention replaces the complete record containing both modified and unmodified data, which was originally written on every update, with a record containing only the modified data. When a user creates records with many fields and frequently modifies several of them, this greatly reduces the amount of data stored. In addition, the invention limits the number and size of the incomplete records that may be written, in order to protect query speed. Without such a limit, the only complete record would be the original insert, so a query for an unmodified field would have to follow PREV_LSN back through every modification record until it reached that insert, which would slow queries considerably. With the limit, a complete record is written whenever a limiting condition is met, so in the worst case a query obtains the values of all fields once it reaches the most recent complete record, which bounds the impact on query time.
Drawings
FIG. 1 is a schematic diagram of the data writing flow of the method for saving storage space in a value log system according to the present invention.
FIG. 2 is a schematic diagram of the data reading flow of the method for saving storage space in a value log system according to the present invention.
Detailed Description
FIG. 1 is the data writing flow chart of the method and system for saving storage space in a value log system according to embodiment 1 of the present invention. Embodiment 1 implements the data writing flow of the method and system. As shown in FIG. 1, the flow comprises the following steps:
Step 1, when a user updates data, the value log system first uses the index to find the LSN value of the record to be updated and reads the corresponding physical record according to that LSN value (REC denotes the information recorded at the tail of the physical record: the physical length of the complete record, the number of updates, and the LSN of the previous update, denoted PREV_LSN). It then repeatedly reads earlier physical records by following the LSN of the previous operation recorded in the REC, until the values of all columns to be updated have been found.
Step 2, add 1 to the update count recorded in the REC of the first record read in step 1, and check whether the result exceeds the maximum number of partial updates. If it does, the latest values of all columns obtained by the repeated reads are combined with the values modified by the current update to form a complete insert record (the update count in the record tail is set to 0 because this is a complete record, the complete physical record length is set to the physical length of this record, and PREV_LSN is set to the LSN of the record before the update), which is appended to the tail of the value log file. If the maximum number of partial updates is not exceeded, proceed to step 3.
Step 3, sum the actual lengths of the distinct partially updated columns in the physical records read so far and the actual length of the columns updated by the current operation (if the same column has been updated several times, only the current update is counted), compute the ratio of this sum to the complete physical record length recorded in the REC, and check whether it exceeds the maximum ratio of partially updated length to complete physical record length. If it does, the latest values of all columns obtained by the repeated reads are combined with the values modified by the current update to form a complete insert record (the update count in the record tail is set to 0 because this is a complete record, the complete physical record length is set to the physical length of this record, and PREV_LSN is set to the LSN of the record before the update), which is appended to the tail of the value log file. If the maximum ratio is not exceeded, proceed to step 4.
Step 4, the update operation saves to the value log file only the Key of the record being updated and the updated partial values (such as the column number and column value of each updated column); at the tail of the record, the update count is incremented by 1, the complete physical record length is set to the physical length of the last complete record, and PREV_LSN is set to the LSN of the physical record before the update.
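The four write steps above can be sketched as a small in-memory model. This is a minimal illustration under stated assumptions, not the patented implementation: the names `Rec` and `ValueLog` are hypothetical, the LSN is simply a list position, every column is assumed to occupy 4 bytes, the two limits are borrowed from Example 1, and step 1 is simplified to always walk back to the last complete record.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

MAX_PARTIAL_UPDATES = 2   # assumed setting, taken from Example 1
MAX_PARTIAL_RATIO = 0.5   # assumed setting, taken from Example 1
COLUMN_SIZE = 4           # assumed fixed physical size of one column value


@dataclass
class Rec:
    key: str
    cols: Dict[str, str]      # every column for a complete record, else only the changed ones
    full_len: int             # physical length of the last complete record
    update_count: int         # 0 marks a complete record
    prev_lsn: Optional[int]   # LSN of the record before this update


class ValueLog:
    def __init__(self) -> None:
        self.log: List[Rec] = []          # the LSN is the list position (assumption)
        self.index: Dict[str, int] = {}   # Key -> latest LSN

    def _append(self, rec: Rec) -> int:
        self.log.append(rec)
        lsn = len(self.log) - 1
        self.index[rec.key] = lsn
        return lsn

    def insert(self, key: str, cols: Dict[str, str]) -> int:
        return self._append(Rec(key, dict(cols), COLUMN_SIZE * len(cols), 0, None))

    def update(self, key: str, changes: Dict[str, str]) -> int:
        head_lsn = self.index[key]
        head = self.log[head_lsn]
        # Step 1 (simplified): walk PREV_LSN back to the last complete record,
        # keeping the newest value of every partially updated column.
        partial: Dict[str, str] = {}
        rec = head
        while rec.update_count != 0:
            for col, val in rec.cols.items():
                partial.setdefault(col, val)
            rec = self.log[rec.prev_lsn]
        base = rec
        # Step 2: judgment one, on the number of partial updates.
        count = head.update_count + 1
        # Step 3: judgment two; a column updated several times is counted once.
        touched = {**partial, **changes}
        ratio = COLUMN_SIZE * len(touched) / head.full_len
        if count > MAX_PARTIAL_UPDATES or ratio > MAX_PARTIAL_RATIO:
            full = {**base.cols, **touched}
            return self._append(Rec(key, full, COLUMN_SIZE * len(full), 0, head_lsn))
        # Step 4: store only the Key and the changed columns.
        return self._append(Rec(key, dict(changes), head.full_len, count, head_lsn))
```

A short usage run: inserting a three-column record and then updating one column yields a partial record with update count 1, while a second update on a different column pushes the accumulated ratio past 50% and produces a complete record with update count 0.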
Example 1:
The whole process is explained using UPDATE statements in a database as an example. Assume that a table T1 exists with three columns, ID, NAME and BALANCE, where ID is the primary key column:
Table 1. Column information of table T1
Column name | Column type | Column meaning
id          | varchar(20) | Account ID, primary key
name        | varchar(20) | Account name
balance     | int         | Account balance
At this point three rows have been inserted into the table:
Table 2. Data in table T1
id   | name      | balance
A001 | Zhang San | 100
A002 | Li Si     | 200
A003 | Wang Wu   | 300
Execute the UPDATE statement:
UPDATE T1 SET NAME = 'Zhao Liu' WHERE ID = 'A003';
S101, the LSN value stored for the record with ID = A003 is found through the index file, and the log record is read using that LSN value. Because it is a complete data record and contains the fields the user is modifying, no further backward search is made.
S102, assume that the maximum number of modifications is set to 2 and the maximum modification ratio is set to 50%. This is the 1st modification, and the current modification ratio is 4/(4 + 4 + 4) × 100% ≈ 33%. Neither value meets its judgment condition, so the record is partially updated: a new record is assembled in memory with the Key, the update position set to 1, and the value set to 'Zhao Liu'; at the tail of the record the update count is set to 1, the complete record length is set to 12, and PREV_LSN is set to the LSN value of the previous record. The record is written to disk, and finally the index is updated to the LSN value of this record.
Execute the UPDATE statement:
UPDATE T1 SET BALANCE = '400' WHERE ID = 'A003';
S103, the LSN value stored for the record with ID = A003 is found through the index file, and the log record is read using that LSN value. Because this record does not contain all of the fields needed for the user's modification, the original INSERT log record is located through the PREV_LSN value at the record tail. That record contains all of the required fields, so no further backward search is made.
S104, assume that the maximum number of modifications is set to 2 and the maximum modification ratio is set to 50%. This is the 2nd modification, and the current cumulative modification ratio is (4 + 4)/(4 + 4 + 4) × 100% ≈ 66%. One of the two values meets its judgment condition, so the latest values of all columns obtained by the repeated reads are combined with the values modified by the current update to form a complete insert record: the update count in the record tail is set to 0, the complete record length is set to 12, and PREV_LSN is set to the LSN of the record before the update. The record is written to disk, and finally the index is updated to the LSN value of this record.
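The arithmetic in S102 and S104 can be replayed directly. The snippet below is a sketch of the two judgments only; the function name `judge` is hypothetical, and the 4-byte column size is an assumption inferred from the complete record length of 12 for three columns.

```python
from typing import Set, Tuple

COLUMN_SIZE = 4                      # assumed bytes per column (3 columns -> length 12)
FULL_RECORD_LEN = 3 * COLUMN_SIZE    # complete record length used in Example 1
MAX_UPDATES = 2                      # maximum number of modifications
MAX_RATIO = 0.50                     # maximum modification ratio


def judge(update_no: int, changed_columns: Set[str]) -> Tuple[float, bool]:
    """Return (cumulative ratio, whether a complete record must be written)."""
    # Using a set means a column updated several times is counted only once.
    ratio = len(changed_columns) * COLUMN_SIZE / FULL_RECORD_LEN
    write_full = update_no > MAX_UPDATES or ratio > MAX_RATIO
    return ratio, write_full


# S102: first update, only NAME has been changed so far.
print(judge(1, {"name"}))
# S104: second update, NAME and BALANCE have now both been changed.
print(judge(2, {"name", "balance"}))
```

In the first call neither limit is exceeded, so a partial record is written. In the second call the count of 2 does not exceed the limit of 2, but the cumulative ratio of two thirds exceeds 50%, which is why S104 writes a complete record.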
FIG. 2 is the data reading flow chart of the method and system for saving storage space in a value log system according to embodiment 2 of the present invention. Embodiment 2 implements the data reading flow of the method and system. As shown in FIG. 2, the flow comprises the following steps:
Step 1, a user needs to read data from the log, and the value log system obtains from the index the latest LSN value of the log record corresponding to the queried Key.
Step 2, the value log system reads the latest record according to the LSN value and repeatedly reads earlier records by following the PREV_LSN in each record, until all Key-Value data required by the user has been read. In the best case the user needs only one I/O operation to read the required data; in the worst case the reads continue until the most recent complete record storing all Key-Value data is found.
Step 3, the values read from all of these operations are merged to obtain the value the user needs, which is returned to the user.
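The three read steps above can be sketched as a loop over PREV_LSN. This is an illustrative model, not the patented implementation: the tuple-based `Record` type, the list-position LSN, and the returned read count are assumptions made to keep the example self-contained.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# (columns stored in this record, is it a complete record, PREV_LSN)
Record = Tuple[Dict[str, str], bool, Optional[int]]


def read(log: List[Record], index: Dict[str, int], key: str,
         wanted: Iterable[str]) -> Tuple[Dict[str, str], int]:
    """Return the newest values of the wanted columns and the number of records read."""
    wanted_set = set(wanted)
    result: Dict[str, str] = {}
    reads = 0
    lsn: Optional[int] = index[key]          # step 1: latest LSN from the index
    while lsn is not None:
        cols, complete, prev_lsn = log[lsn]  # step 2: one record per iteration
        reads += 1
        for col in wanted_set:
            if col in cols and col not in result:
                result[col] = cols[col]      # the newest value wins
        # Stop once everything is found, or at the most recent complete record.
        if len(result) == len(wanted_set) or complete:
            break
        lsn = prev_lsn
    return result, reads                     # step 3: merged values


# State after the first UPDATE in Example 1 (values taken from Tables 1 and 2).
log: List[Record] = [
    ({"id": "A003", "name": "Wang Wu", "balance": "300"}, True, None),
    ({"name": "Zhao Liu"}, False, 0),
]
index = {"A003": 1}
print(read(log, index, "A003", ["id", "name", "balance"]))
print(read(log, index, "A003", ["name"]))
```

The first call needs two reads because the partial record holds only NAME; the second is the best case, where a single read satisfies the query.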
Example 2:
Example 2 is explained by executing a query statement after each of the two UPDATE commands in Example 1.
After the first UPDATE command in Example 1 has been executed, execute the query statement:
SELECT * FROM T1 WHERE ID = 'A003';
S201, the value log system obtains from the index the latest LSN value of the record with ID = A003.
S202, the value log system finds that this record cannot satisfy all of the fields queried by the user, so it reads the previous record according to the PREV_LSN value in the record. Because the previous record is a complete record and satisfies all queried fields, the backward search stops.
S203, the value log system applies NAME = 'Zhao Liu' to the complete record in memory, and finally returns the merged record to the user.
After the second UPDATE command in Example 1 has been executed, execute the query statement:
SELECT * FROM T1 WHERE ID = 'A003';
S204, the value log system obtains from the index the latest LSN value of the record with ID = A003.
S205, the value log system finds that this record is a complete record that satisfies all fields queried by the user, and returns it to the user directly.
Correspondingly, a system for saving storage space in a value log system comprises a scheduling module, an index module, and a value log storage module. The scheduling module receives update and query requests from the user, calls the index module to update and query the index, stores data through the value log storage module, and returns the execution result to the user.
When a user initiates a record update request, the scheduling module receives the request and first calls the index module to look up, by the Key given by the user, the latest LSN value of the record to be modified. It then finds the latest record in the value log storage module according to that LSN value and, depending on whether the record contains all fields to be modified, decides whether to search further back through the value log storage module using the PREV_LSN in the record. After all required records have been read, it makes the two judgments, assembles in memory the record that will actually be written, and stores it through the value log storage module. It then updates the index module with the record's Key and the latest LSN value produced by the write, and finally returns the execution result to the user.
When a user initiates a query request, the scheduling module receives the request and first calls the index module to look up, by the Key given by the user, the latest LSN value of the record to be queried. It then finds the latest record in the value log storage module according to that LSN value and, depending on whether the record contains all fields to be queried, decides whether to search further back through the value log storage module using the PREV_LSN in the record. After all required records have been read, it merges them in memory and returns the result to the user.
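The division of work among the three modules can be sketched as follows. This is a hedged illustration of the call sequence only: the class and method names are hypothetical, the LSN is a list position, the two judgments are replaced by a caller-supplied `write_full` callback, and queries always merge back to the most recent complete record instead of stopping as soon as the requested fields are found.

```python
from typing import Callable, Dict, List, Optional


class IndexModule:
    """Maps each Key to the LSN of its newest log record."""
    def __init__(self) -> None:
        self._latest: Dict[str, int] = {}

    def lookup(self, key: str) -> int:
        return self._latest[key]

    def set(self, key: str, lsn: int) -> None:
        self._latest[key] = lsn


class ValueLogStorageModule:
    """Append-only record store; the LSN is the list position (assumption)."""
    def __init__(self) -> None:
        self._records: List[dict] = []

    def append(self, cols: Dict[str, str], complete: bool, prev_lsn: Optional[int]) -> int:
        self._records.append({"cols": dict(cols), "complete": complete, "prev_lsn": prev_lsn})
        return len(self._records) - 1

    def read(self, lsn: int) -> dict:
        return self._records[lsn]


class SchedulingModule:
    """Receives requests and coordinates the index and storage modules."""
    def __init__(self, index: IndexModule, storage: ValueLogStorageModule,
                 write_full: Callable[[Dict[str, str]], bool]) -> None:
        self.index, self.storage, self.write_full = index, storage, write_full

    def _merge_back(self, lsn: Optional[int]) -> Dict[str, str]:
        # Follow PREV_LSN to the most recent complete record; newest values win.
        merged: Dict[str, str] = {}
        while lsn is not None:
            rec = self.storage.read(lsn)
            for col, val in rec["cols"].items():
                merged.setdefault(col, val)
            if rec["complete"]:
                break
            lsn = rec["prev_lsn"]
        return merged

    def insert(self, key: str, cols: Dict[str, str]) -> None:
        self.index.set(key, self.storage.append(cols, True, None))

    def update(self, key: str, changes: Dict[str, str]) -> str:
        head = self.index.lookup(key)                 # 1. ask the index for the latest LSN
        if self.write_full(changes):                  # 2. stand-in for the two judgments
            cols, complete = {**self._merge_back(head), **changes}, True
        else:
            cols, complete = changes, False
        lsn = self.storage.append(cols, complete, head)  # 3. store the assembled record
        self.index.set(key, lsn)                         # 4. publish the new LSN
        return "ok"                                      # 5. return the result

    def query(self, key: str, fields: List[str]) -> Dict[str, str]:
        merged = self._merge_back(self.index.lookup(key))
        return {f: merged[f] for f in fields}
```

For example, constructing `SchedulingModule(IndexModule(), ValueLogStorageModule(), write_full=lambda changes: len(changes) > 1)` gives a system that writes partial records for single-column updates and complete records otherwise; a real policy would apply the update-count and length-ratio judgments described above.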
When executing update operations, the value log system can thus greatly reduce the storage redundancy of data, while adding only a small amount of time to reads of records in the value log file.

Claims (7)

1. A method for saving storage space in a value log system, characterized by comprising the following steps:
step 1, when a user updates data, the value log system first uses an index to find the LSN value of the record to be updated, reads the corresponding physical record according to the LSN value, and then repeatedly reads earlier physical records by following the LSN of the previous operation recorded in the REC until the values of all columns to be updated are found;
step 2, two judgments are made to determine how the current update operation is written to disk;
step 3, when the user performs further update operations, the above steps are repeated;
step 4, once updating is complete and the user needs to read data from the log, the value log system obtains from the index the latest LSN value of the log record corresponding to the queried Key;
step 5, the value log system reads the latest record according to the LSN value and repeatedly reads earlier records by following the PREV_LSN in each record until all Key-Value data required by the user has been read;
step 6, the values read from all of these operations are merged to obtain the value the user needs, which is returned to the user.
2. The method for saving storage space in a value log system as claimed in claim 1, wherein in step 1, REC denotes the information recorded at the tail of the physical record: the physical length of the complete record, the number of updates, and the LSN of the previous update, denoted PREV_LSN.
3. The method as claimed in claim 1, wherein in step 2, two judgments determine how the current update operation is written to disk; judgment one: add 1 to the update count recorded in the REC of the first record read in step 1, and check whether the result exceeds the maximum number of partial updates; judgment two: sum the actual lengths of the distinct partially updated columns in the physical records read so far and the actual length of the columns updated by the current operation, counting only the current update if the same column has been updated several times, compute the ratio of this sum to the complete physical record length recorded in the REC, and check whether the ratio exceeds the maximum ratio of partially updated length to complete physical record length; when either judgment is satisfied, the latest values of all columns obtained by the repeated reads are combined with the values modified by the current update to form a complete insert record, the update count in the record tail is set to 0 because this is a complete record, the complete physical record length is set to the physical length of this record, PREV_LSN is set to the LSN of the record before the update, and the record is appended to the tail of the value log file; when neither judgment is satisfied, the update operation saves to the value log file only the Key of the record being updated and the updated partial values, such as the column number and column value of each updated column, and at the tail of the record the update count is incremented by 1, the complete physical record length is set to the physical length of the last complete record, and PREV_LSN is set to the LSN of the physical record before the update.
4. The method of claim 1, wherein in step 5, in the best case the user needs only one I/O operation to read the required data, and in the worst case the reads continue until the most recent complete record storing all Key-Value data is found.
5. A system for implementing the method for saving storage space in a value log system of claim 1, comprising: a scheduling module, an index module, and a value log storage module; the scheduling module receives update and query requests from the user, calls the index module to update and query the index, stores data through the value log storage module, and returns the execution result to the user.
6. The system of claim 5, wherein when a user initiates a record update request, the scheduling module receives the request, first calls the index module to look up, by the Key given by the user, the latest LSN value of the record to be modified, then finds the latest record in the value log storage module according to the LSN value, decides, depending on whether the record contains all fields to be modified, whether to search further back through the value log storage module using the PREV_LSN in the record, makes the two judgments after all required records have been read, assembles in memory the record that will actually be written, stores it through the value log storage module, then updates the index module with the record's Key and the latest LSN value produced by the write, and finally returns the execution result to the user.
7. The system of claim 5, wherein when a user initiates a query request, the scheduling module receives the request, first calls the index module to look up, by the Key given by the user, the latest LSN value of the record to be queried, then finds the latest record in the value log storage module according to the LSN value, decides, depending on whether the record contains all fields to be queried, whether to search further back through the value log storage module using the PREV_LSN in the record, and after all required records have been read, merges them in memory and returns the result to the user.
CN202211052393.1A 2022-08-31 2022-08-31 Method and system for realizing saving of storage space of log system Pending CN115454941A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211052393.1A CN115454941A (en) 2022-08-31 2022-08-31 Method and system for realizing saving of storage space of log system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211052393.1A CN115454941A (en) 2022-08-31 2022-08-31 Method and system for realizing saving of storage space of log system

Publications (1)

Publication Number Publication Date
CN115454941A (en) 2022-12-09

Family

ID=84300396

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211052393.1A Pending CN115454941A (en) 2022-08-31 2022-08-31 Method and system for realizing saving of storage space of log system

Country Status (1)

Country Link
CN (1) CN115454941A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination