CN117033397B

CN117033397B - Management method and system for low-memory-occupation query of historical data

Info

Publication number: CN117033397B
Application number: CN202311289508.3A
Authority: CN
Inventors: 张进昆
Original assignee: Beijing Telisino Information Technology Co ltd
Current assignee: Beijing Telisino Information Technology Co ltd
Priority date: 2023-10-08
Filing date: 2023-10-08
Publication date: 2023-12-26
Anticipated expiration: 2043-10-08
Also published as: CN117033397A

Abstract

The invention discloses a management method and system for low-storage-occupation query of historical data. The method comprises the following steps: S1, creating a data structure table according to the different time sequences of the same field; S2, splitting the data structure table into a dynamic data table and a static data table; S3, when a data field is written, writing it first to the dynamic data table and marking the data write timestamp in the dynamic data table; S4, calculating the overrun time of the dynamic data table from the storage space upper limit for data fields in the dynamic data table and the average rate at which data fields are written; S5, judging in real time whether the retention time of a stored data field in the dynamic data table exceeds the limit, and if so, proceeding to step S6, otherwise returning to step S3; S6, writing the overrun historical data field into the static data table. The invention addresses the problem that existing historical data storage modes reduce the speed and efficiency of historical data queries.

Description

Management method and system for low-memory-occupation query of historical data

Technical Field

The invention belongs to the technical field of data storage management, and particularly relates to a management method and system for low-storage-occupation query of historical data.

Background

With the continuous development of computer technology and the steady increase in informatization, data volumes are growing rapidly, mass data storage and its applications are developing quickly, and big data is used more and more widely. The historical data of large enterprises, especially in the financial industry, contains much important and sensitive information, such as the customer data of banking systems. Because of business or regulatory requirements, such historical data cannot simply be deleted, so the historical data generated by information systems must be stored. The traditional approach generally uses structured storage: in each storage period, a full backup of the structured historical data is saved in a purpose-built database or data table, or in an additional tape library. However, with the advent of the big data era, the amount of structured historical data stored in databases grows rapidly, so databases and tape libraries require ever more storage resources and incur ever higher storage costs.

Chinese invention patent No. CN201410363419.3 discloses a method for storing historical data, which comprises: screening the structured historical data generated by an information system according to a preset screening strategy to obtain the structured historical data to be archived, wherein the structured historical data to be archived comprises at least one type of data table; obtaining the data extraction mode of each type of data table in the structured historical data to be archived according to a preset data extraction strategy, wherein the data extraction mode is either an incremental extraction mode or a full extraction mode; if the extraction mode of a given type of data table is the incremental extraction mode, performing incremental extraction on the data in that data table to obtain incremental data in text file format; if the extraction mode of a given type of data table is the full extraction mode, performing full extraction on the data in that data table to obtain full data in text file format; and saving the incremental data and the full data in text file format to a storage device. This historical data storage method can reduce the consumption of storage resources and the storage cost.

The drawback of the prior art is as follows. Although that storage method extracts the data of the data tables and compresses it into text format for storage, thereby reducing the storage space occupied by historical data, when a user later queries the historical data, the text-format data must first be decompressed and restored into the original data table before it can be queried. This reduces the query speed of historical data and results in poor efficiency.

Disclosure of Invention

Aiming at the problem that the existing historical data storage mode reduces the query speed and efficiency of the historical data, the invention provides a management method and a management system for low-storage-occupation query of the historical data.

In order to achieve the technical purpose, the invention adopts the following technical scheme:

A management method for low-storage-occupation query of historical data comprises the following steps:

S1, creating a data structure table according to the different time sequences of the same field;

S2, splitting the data structure table into a dynamic data table and a static data table;

S3, when a data field is written, writing it first to the dynamic data table and marking the data write timestamp in the dynamic data table;

S4, calculating the overrun time of the dynamic data table from the storage space upper limit for data fields in the dynamic data table and the average rate at which data fields are written;

S5, judging in real time whether the retention time of a stored data field in the dynamic data table exceeds the limit; if so, proceeding to step S6, and if not, repeating the overrun judgment at the next check;

S6, writing the overrun historical data field into the static data table.

Further, the time sequence of a field includes at least a field ID, a write timestamp, a data type, and a data amount.
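The flow of steps S1 to S6 can be illustrated with a minimal in-memory sketch. It is only one possible reading under stated assumptions: the names `Record`, `SplitTable` and `retention_limit` are hypothetical, Python lists stand in for the two data tables, and the retention limit is passed in as a plain number rather than derived as in step S4.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One time-sequence entry holding the four fields named in the text."""
    field_id: str
    write_ts: float      # write timestamp, here in seconds (assumed unit)
    data_type: str
    data_amount: float


@dataclass
class SplitTable:
    """Hypothetical stand-in for the dynamic/static table pair of step S2."""
    dynamic: list = field(default_factory=list)
    static: list = field(default_factory=list)

    def write(self, rec: Record) -> None:
        # S3: new data always lands in the dynamic table first.
        self.dynamic.append(rec)

    def migrate_overrun(self, now: float, retention_limit: float) -> int:
        # S5: find records retained longer than the limit.
        overrun = [r for r in self.dynamic if now - r.write_ts > retention_limit]
        # S6: copy them to the static table and drop them from the dynamic one.
        self.static.extend(overrun)
        self.dynamic = [r for r in self.dynamic if now - r.write_ts <= retention_limit]
        return len(overrun)


table = SplitTable()
table.write(Record("1aa121", 0.0, "Decimal", 12.5))
table.write(Record("1aa121", 90.0, "Decimal", 7.0))
moved = table.migrate_overrun(now=100.0, retention_limit=60.0)
```

In this sketch only the record written at time 0 has been retained longer than the assumed 60-second limit, so it alone is moved to the static list.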

Further, the detailed steps of step S2 include:

S201, when the initial data structure table is split, the user presets two storage spaces in advance;

S202, copying the data structure table and storing one copy in each of the two storage spaces;

S203, setting one data structure table as the dynamic data table, into which received data is written, and setting the other as the static data table, in which historical data is stored.
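Steps S201 to S203 can be pictured as creating two tables with an identical schema. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table names `dynamic_data` and `static_data`, the column names and the sample row are illustrative assumptions, not taken from the patent.

```python
import sqlite3

# Column layout assumed from the four time-sequence fields named in the text.
SCHEMA = "(field_id TEXT, write_ts TEXT, data_type TEXT, data_amount REAL)"


def create_split_tables(conn: sqlite3.Connection) -> None:
    # S201/S202: two storage spaces, each holding a copy of the same structure.
    for name in ("dynamic_data", "static_data"):
        conn.execute(f"CREATE TABLE IF NOT EXISTS {name} {SCHEMA}")


def columns(conn: sqlite3.Connection, table: str) -> list:
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


conn = sqlite3.connect(":memory:")
create_split_tables(conn)
# S203: writes go to dynamic_data; static_data is reserved for historical rows.
conn.execute(
    "INSERT INTO dynamic_data VALUES (?, ?, ?, ?)",
    ("1aa121", "2023-10-08-09", "Decimal", 12.5),
)
```

Keeping both tables on one schema means a row can later be moved between them without any conversion.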

Further, the detailed calculation step of step S4 includes:

S401, calculating the remaining storage space value of the dynamic data table as the difference between the pre-allocated storage upper limit of the dynamic data table's storage space and the storage already occupied. The pre-allocated storage upper limit may also be replaced by the optimal upper limit storage space value for data field query and writing used in the subsequent step S501. This secures the time-to-overrun calculated in the subsequent step S405, so that the dynamic data table is more likely to operate normally in an optimal state;

S402, rounding the difference between the current write time and the first write time of the data field to obtain the total duration over which the data field has been continuously written up to the current time;

S403, recording the total amount of data that the data field has written into the dynamic data table;

S404, dividing the total data amount recorded in step S403 by the total duration calculated in step S402 to obtain the average rate at which the data field is written;

S405, dividing the remaining storage space value of the dynamic data table calculated in step S401 by the average write rate to obtain the time required for the remaining storage space of the current dynamic data table to be filled to overrun. This time-to-overrun is a dynamic value. Data may be inserted at different frequencies at different moments, but in order to estimate the time until the remaining storage space overruns, the overrun time of the dynamic data table can be predicted from the average rate over the period from the first write to the current write. This allows the data fields written to the dynamic data table to be assessed ahead of step S6, and guards against storage faults or loss of stored data field values for overrun data fields.
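The arithmetic of steps S401 to S405 reduces to one subtraction and two divisions. The function below is a sketch under assumptions: the name `time_to_overrun`, its parameters, and the choice to truncate the elapsed time to whole units of `unit_seconds` (one reading of the rounding in S402) are illustrative only, and the numbers in the call are hypothetical.

```python
import math


def time_to_overrun(storage_limit, stored, first_write_ts, now_ts,
                    total_written, unit_seconds=60):
    """Estimate how many time units remain until the dynamic table overruns."""
    remaining = storage_limit - stored                       # S401
    # S402: elapsed write time, truncated to whole units (assumed rounding rule).
    duration = math.floor((now_ts - first_write_ts) / unit_seconds)
    if duration <= 0 or total_written <= 0:
        return math.inf                                      # no rate can be estimated yet
    avg_rate = total_written / duration                      # S403/S404: amount per unit
    return remaining / avg_rate                              # S405


# 400 units written over 630 seconds into a table limited to 1000 units.
estimate = time_to_overrun(1000, 400, first_write_ts=0, now_ts=630, total_written=400)
```

With these numbers the elapsed time truncates to 10 minutes, the average rate is 40 units per minute, and the 600 remaining units give an estimate of 15 minutes.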

Further, in step S3, when a data field is written, a query is first performed to traverse the storage locations of the data field. The detailed traversal query steps include:

S301, first querying the dynamic data table for the data field to be written; if the query finds the data field, proceeding to step S302, and if it does not, proceeding to step S305;

S302, identifying the data amount of the data field to be written and the storage space value required by a single unit of data;

S303, multiplying the data amount of the data field to be written by the storage space value required by a single unit of data to obtain the total storage space value required by the data field currently to be written;

S304, judging whether the remaining storage space value in the dynamic data table is larger than the total storage space value required by the data field to be written; if so, proceeding to step S4, and if not, expanding the dynamic data table or placing the write in the blocking queue;

S305, querying whether the data field currently to be written exists in the static data table; if so, proceeding to step S306, and if not, proceeding to step S307;

S306, copying the data field of the first-row or first-column header of the static data table and mapping it into the dynamic data table;

S307, creating a new dynamic data table according to the time sequence content contained in the data field to be written, and mapping the same static data table data to the corresponding dynamic data table.
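The branching in steps S301 to S307 can be sketched as a single routing function. This is an illustration under assumptions: dictionaries keyed by field ID stand in for the two tables, the function name `route_write` and the returned labels are hypothetical, and S307 is read here as creating a matching entry in both tables.

```python
HEADER = ("field_id", "write_ts", "data_type", "data_amount")


def route_write(field_id, amount, unit_size, dynamic, static, remaining):
    """Return a label for the branch that steps S301 to S307 would take."""
    if field_id in dynamic:                                   # S301
        needed = amount * unit_size                           # S302, S303
        if remaining > needed:                                # S304
            return "write"
        return "expand_or_queue"
    if field_id in static:                                    # S305
        # S306: copy only the header from the static table into the dynamic one.
        dynamic[field_id] = {"header": static[field_id]["header"], "rows": []}
        return "mapped_from_static"
    # S307 (assumed reading): create the field in both tables with one structure.
    dynamic[field_id] = {"header": HEADER, "rows": []}
    static[field_id] = {"header": HEADER, "rows": []}
    return "created"


dynamic = {"a1": {"header": HEADER, "rows": []}}
static = {"b2": {"header": HEADER, "rows": [("b2", "2023-10-08-09", "Decimal", 3.0)]}}
results = [
    route_write("a1", 10, 8, dynamic, static, remaining=100),  # needs 80, fits
    route_write("a1", 20, 8, dynamic, static, remaining=100),  # needs 160, does not fit
    route_write("b2", 1, 8, dynamic, static, remaining=100),   # known only to static
    route_write("c3", 1, 8, dynamic, static, remaining=100),   # unknown everywhere
]
```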

Further, when in step S304 the remaining storage space value of the dynamic data table is insufficient for the data field to be written, the detailed steps of expanding the dynamic data table include:

S3041, automatically traversing the occupied space value and the usage rate of each stored data point of the data field to be written in the dynamic data table (the usage rate being the number of times the data point is referenced, when the data field is queried directly or queried by other data tables over a period of time, as a percentage of the number of references to the whole data field);

S3042, starting from the stored data points with the lowest usage rate, removing them from the dynamic data table one by one and copying them into the static data table, thereby releasing storage space in the dynamic data table and moving the historical stored data points with low usage rates to the static data table for storage;

S3043, after each stored data point is removed, judging whether the remaining storage space of the dynamic data table is enough to accommodate the total storage space value required by the data field to be written; if so, stopping the removal of historical data points, and if not, returning to step S3042 to continue releasing storage space in the dynamic data table.
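The release loop of steps S3041 to S3043 can be sketched as follows. The function name `free_space`, the dictionary keys `size` and `usage`, and the sample values are assumptions made for illustration; plain lists stand in for the two tables.

```python
def free_space(dynamic_points, static_points, remaining, needed):
    """Move lowest-usage points to the static list until `needed` fits."""
    # S3041: rank the stored points by usage rate, lowest first.
    for point in sorted(dynamic_points, key=lambda p: p["usage"]):
        # S3043: stop as soon as the pending write fits.
        if remaining >= needed:
            break
        # S3042: remove from the dynamic table and copy into the static table.
        dynamic_points.remove(point)
        static_points.append(point)
        remaining += point["size"]
    return remaining


dynamic_points = [
    {"id": 1, "size": 50, "usage": 0.5},
    {"id": 2, "size": 30, "usage": 0.1},
    {"id": 3, "size": 40, "usage": 0.2},
]
static_points = []
remaining_after = free_space(dynamic_points, static_points, remaining=10, needed=60)
```

Here points 2 and 3 are released in that order, after which 80 units are free and the most frequently used point stays in the dynamic list.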

Further, the blocking queue is used in step S304 to temporarily store the data field to be written. This facilitates first querying and judging whether the same data field already exists in the data structure table.
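Reading the temporary store of step S304 (the "blocking pair column" of the literal translation) as a blocking queue, which is an assumption about the intended term, the buffering could look like the sketch below. It uses Python's standard `queue.Queue`; the function names, the queue size and the field ID `zz9` are hypothetical.

```python
import queue

# Bounded FIFO; put() blocks when the queue is full, hence "blocking queue".
pending = queue.Queue(maxsize=100)


def defer_write(item: dict) -> None:
    pending.put(item)


def drain(known_fields: set, written: list) -> int:
    """Write queued items whose field is now known; requeue the rest."""
    still_waiting = []
    while True:
        try:
            item = pending.get_nowait()
        except queue.Empty:
            break
        if item["field_id"] in known_fields:
            written.append(item)          # stand-in for the real table write
        else:
            still_waiting.append(item)
    for item in still_waiting:
        pending.put(item)
    return len(still_waiting)


defer_write({"field_id": "1aa121", "data_amount": 5.0})
defer_write({"field_id": "zz9", "data_amount": 1.0})
written = []
left = drain({"1aa121"}, written)
```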

Further, the step in S5 of judging in real time whether the retention time of a stored data field in the dynamic data table exceeds the limit includes:

S501, according to the optimal upper limit storage space value preset for data field query and writing in the dynamic data table, traversing and monitoring in real time the stored data points that have been held longest in the dynamic data table. The optimal upper limit storage space value is set to an upper limit that does not affect the operation of the dynamic data table, rather than to the upper limit of the dynamic data table's storage space, so that the dynamic data table stores and writes data fields in an optimal state;

S502, using an inverted query, deleting from the dynamic data table the stored data points of the data field that have the longest storage time and a low usage rate, and copying the same stored data points into the static data table for storage. Stored data points with the longest storage time but a high usage rate need not be moved to the static data table as long as the optimal upper limit for operating the dynamic data table is not exceeded; if the optimal upper limit is exceeded, some of these stored data points may be selected and moved into the static data table. The inverted query sorts the stored data points by storage time in descending order and traverses the historical stored data points with the longest storage time first.
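The monitoring of steps S501 and S502 can be sketched as a descending traversal by storage time that stops once usage falls back under the optimal limit. The function name, the dictionary keys and the `usage_cutoff` threshold are hypothetical; the 80% default follows the typical preset the description mentions for the optimal upper limit.

```python
def enforce_optimal_limit(dynamic, static, capacity,
                          optimal_ratio=0.8, usage_cutoff=0.5):
    """Move old, rarely used points to `static` while above the optimal limit."""
    limit = capacity * optimal_ratio                 # S501: optimal upper limit
    used = sum(p["size"] for p in dynamic)
    # S502: inverted query, longest-stored points first.
    for point in sorted(dynamic, key=lambda p: p["stored_for"], reverse=True):
        if used <= limit:
            break
        if point["usage"] < usage_cutoff:            # long-stored and rarely used
            dynamic.remove(point)
            static.append(point)
            used -= point["size"]
    return used


dynamic = [
    {"id": 1, "size": 40, "stored_for": 100, "usage": 0.9},
    {"id": 2, "size": 40, "stored_for": 90, "usage": 0.1},
    {"id": 3, "size": 10, "stored_for": 5, "usage": 0.0},
]
static = []
used_after = enforce_optimal_limit(dynamic, static, capacity=100)
```

In this run the oldest point is kept because it is heavily used, the second oldest is moved, and the traversal then stops because usage has dropped below the limit.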

Further, in step S502, a multi-index decision method is applied to the two decision factors, storage time and usage rate, to calculate quantitatively whether a stored data point of the data field should be moved to the static data table. The detailed steps are as follows:

assigning the storage time a storage time decision value percentage B1;

assigning the usage rate a usage rate decision value percentage B2, where B1 + B2 = 1;

quantizing the storage time into a time data value T_L: the longest storage time in the dynamic data table, or the longest storage time found by traversing historical dynamic data tables, is preset by the user as the endpoint value and set to 1, and the ratio of the current data field's storage time to the whole endpoint-value segment is the quantized storage time;

quantizing the usage rate into a usage inverse value Sy = 1 - usage rate, so that the smaller the usage rate, the larger the usage inverse value;

calculating by the formula: shift quantization score = T_L × B1 + Sy × B2; the higher the score, the greater the probability of migrating to the static data table.
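The weighted score can be written as a short function. This is a sketch: the name `shift_score` and its parameter names are hypothetical, the equal 50% defaults follow the initial preset suggested in the description, and clamping T_L to 1 is an added safeguard rather than something the text specifies.

```python
def shift_score(storage_time, longest_time, usage_rate, b1=0.5, b2=0.5):
    """Return T_L * B1 + Sy * B2 for one stored data point."""
    if abs(b1 + b2 - 1.0) > 1e-9:
        raise ValueError("B1 and B2 must sum to 1")
    t_l = min(storage_time / longest_time, 1.0)   # storage time as a share of the endpoint
    sy = 1.0 - usage_rate                         # usage inverse value
    return t_l * b1 + sy * b2


# A point stored for half the longest time and referenced 20% of the time.
score = shift_score(storage_time=30, longest_time=60, usage_rate=0.2)
```

With these hypothetical inputs T_L is 0.5 and Sy is 0.8, so the score is about 0.65; an older or less frequently used point would score higher.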

A management system for low-storage-occupation query of historical data comprises a data structure table creation module, a data structure table splitting module, a data field query module, a data field writing module, a dynamic data table storage overrun time calculation module, a data field retention time overrun judging module and a historical data field shifting module;

the data structure table creation module creates a data structure table according to different time sequences of the same field;

the data structure table splitting module splits the data structure table into a dynamic data table and a static data table, copies the data structure table and stores the data structure table in two storage spaces respectively;

the data field query module is used for querying the data fields to be written or queried in the dynamic data table and the static data table one by one respectively and judging whether the corresponding data fields are stored or not;

the data field writing module is used for writing the data field to the corresponding data field table after inquiring and confirming the written data field;

the dynamic data table storage overrun time calculation module is used for calculating the overrun time of the dynamic data table according to the upper limit of the space storage of the data field of the dynamic data table storage and the average speed of the written data field;

the data field retention time overrun judging module traverses whether retention time of a stored data field in the dynamic data table is overrun or not in real time;

and the historical data field shifting module is used for writing the historical data field which exceeds the limit into the static data table.

Compared with the prior art, the invention has the following beneficial effects:

By dividing the data structure table into a dynamic data table and a static data table for storing data fields, the write and query functions of the dynamic data table are not degraded by excessive accumulation of historical data fields. The speed and efficiency of querying historical data are therefore maintained without affecting the storage of historical data, and the write efficiency of newly written data is likewise unaffected.

Drawings

FIG. 1 is a general flow chart of a method for managing low memory footprint queries for historical data in an embodiment of the invention;

FIG. 2 is a block diagram of a management system for low memory footprint queries for historical data in accordance with an embodiment of the present invention.

The figure indicates: the system comprises a 10-data structure table creation module, a 20-data structure table splitting module, a 30-data field query module, a 40-data field writing module, a 50-dynamic data table storage overrun time calculation module, a 60-data field retention time overrun judging module and a 70-historical data field shifting module.

Detailed Description

The invention is further described below with reference to the embodiments and the accompanying drawings, which are not intended to limit the scope of the invention.

As shown in FIG. 1, the present embodiment provides a method for managing low-storage-occupation queries of historical data, including the steps of:

S5, judging in real time whether the retention time of a stored data field in the dynamic data table exceeds the limit; if so, proceeding to step S6, and if not, returning to step S3;

S6, writing the overrun historical data field into the static data table.

The time sequence of a field includes at least a field ID, a write timestamp, a data type, and a data amount. The times at which the same field is written at different moments, that is, the write timestamps, are sorted and written into the correspondingly created data structure table. For example, for a salesperson's receipts ledger: the write timestamp is recorded as year-month-day-hour; the field ID is the unique ID string of the salesperson's income account; the data type is Decimal, stored as a character string; and the data amount is the quantity of income data. A specific example is given in Table 1:

table 1: time sequence table of initial field

The detailed steps of step S2 include:

S203, setting one data structure table as the dynamic data table, into which received data is written, and setting the other as the static data table, in which historical data is stored. The dynamic data table serves as the storage space for rapid insertion, deletion and modification, while the static data table typically stores data of the same field that has not been inserted, deleted or modified for a long time. This makes it easier to free the running and cache space of the data table and achieves a fast response for data operations, without impairing the use of historical data stored for a long time. See Tables 2 and 3 below:

table 2: dynamic data table

Table 3: static data table

In the static data table, the other fields following the corresponding field ID are not written until a subsequent step determines that writing is complete or that the storage of the dynamic data table has overrun. The detailed calculation steps of step S4 include:

S401, calculating the remaining storage space value of the dynamic data table as the difference between the pre-allocated storage upper limit of the dynamic data table's storage space and the storage already occupied. The pre-allocated storage upper limit may also be replaced by the optimal upper limit storage space value for data field query and writing used in the subsequent step S501 (typically preset to 80% of the total storage space value). This secures the time-to-overrun calculated in the subsequent step S405, so that the dynamic data table is more likely to operate normally in an optimal state.

S402, rounding the difference between the current write time and the first write time of the data field to obtain the total duration over which the data field has been continuously written up to the current time. Rounding discards finer units such as milliseconds or seconds, which would otherwise increase the subsequent computation, and thus reduces the amount of calculation.

S405, dividing the remaining storage space value of the dynamic data table calculated in step S401 by the average write rate to obtain the time required for the remaining storage space of the current dynamic data table to be filled to overrun. This time-to-overrun is a dynamic value. Data may be inserted at different frequencies at different moments, but in order to estimate the time until the remaining storage space overruns, the overrun time of the dynamic data table can be predicted from the average rate obtained. This allows the data fields written to the dynamic data table to be assessed ahead of step S6, and guards against storage faults or loss of stored data field values for overrun data fields. Meanwhile, if the storage upper limit is set to 80% of the total storage space value, the risk of poor storage efficiency for overrun data fields is also avoided.

In step S3, when a data field is written, a query is first performed to traverse the storage locations of the data field. The detailed traversal query steps include:

S303, multiplying the data amount of the data field to be written by the storage space value required by a single unit of data to obtain the total storage space value required by the data field currently to be written. The storage space value for a single unit of data is fixed in advance in the data structure table of the same field, so multiplying it by the data amount of the data field to be written gives the maximum total storage space value that the data field could require. Alternatively, the storage space values actually required by each unit of data may be identified and summed one by one as the data field is written, which also yields the total storage space value required by the data field.

S306, copying the data field of the first-row or first-column header of the static data table and mapping it into the dynamic data table. Normally, once all data of a data field in the dynamic data table has been moved into the static data table and it is determined that the data field has had no data written for a long time, the storage space of the dynamic data table is reduced further by deleting the first-row or first-column header of that data field from the dynamic data table. This typically applies to system version updates, where a new database table overwrites the old one: when the next version queries or writes a field from the previous version, the corresponding data field must be looked up in the static data table and mapped into the dynamic data table, which makes it possible to write the corresponding time sequence of the data field, namely the specific values of the field ID, write timestamp, data type, data amount and so on. See Tables 4 and 5:

table 4 stores the following for the static data table:

table 4: static data table stores data

Table 5 shows the mapping result in the dynamic data table for the newly written or queried field ID "1aa121", after all information for that field ID, and the field ID itself, had been deleted from the dynamic data table:

table 5: mapping results for dynamic data tables

S307, a dynamic data table is newly built according to the time sequence content contained in the data field to be written, and the same static data table data is mapped in the corresponding dynamic data table. Table 6 below:

table 6: static data table data with same mapping of newly built dynamic data table

And the field ID result "1aa121"

In step S304, when the remaining storage space value of the data field to be written in the dynamic data table is insufficient, the detailed steps of expanding the dynamic data table include:

The blocking queue is used in step S304 to temporarily store the data field to be written. This facilitates first querying and judging whether the same data field already exists in the data structure table, and thereby serves as a cache; once the corresponding data field has been found, the cached data in the blocking queue is written into the corresponding data field table.

The step S5 of judging whether the retention time of the storage data field in the dynamic data table exceeds the limit in real time comprises the following steps:

In step S502, a multi-index decision method is applied to the two decision factors, storage time and usage rate, to calculate quantitatively whether a stored data point of the data field should be moved to the static data table. The detailed steps are as follows:

assigning the storage time a storage time decision value percentage B1;

assigning the usage rate a usage rate decision value percentage B2, where B1 + B2 = 1. Each may initially be preset to 50%, and the proportions then tuned according to the storage characteristics of the data table. If the usage rate of the data stored in the data table is zero or nearly zero, only the storage time need be compared, and the data with the longest storage time is selected for transfer to the static data table.

quantizing the usage rate (the share of insert, delete and query operations on the data in question relative to those on the data fields of the whole data table) into a usage inverse value Sy = 1 - usage rate, so that the smaller the usage rate, the larger the usage inverse value. Because the usage rate and the storage time of historical data act in opposite directions, the usage rate is converted into its inverse value so that both terms consistently represent the need to shift the historical data. This ensures that a rise or fall in either the storage time term or the usage term moves the shift quantization score in the same direction. Otherwise, if either term varied inversely with the score, the score would be unstable and it would be impossible to tell whether storage time and usage rate pushed it up or down.

Calculating by the formula: shift quantization score = T_L × B1 + Sy × B2. The higher the score, the greater the likelihood of being moved to the static data table. As the dynamic data table requires, the stored data with the highest shift quantization score is moved out to the static data table, and the subsequent judgment and migration then continue.

As shown in FIG. 2, a management system for low-storage-occupation query of historical data includes a data structure table creation module 10, a data structure table splitting module 20, a data field query module 30, a data field writing module 40, a dynamic data table storage overrun time calculation module 50, a data field retention time overrun judging module 60 and a historical data field shifting module 70;

the data structure table creation module 10 creates a data structure table according to different time sequences of the same field;

the data structure table splitting module 20 splits the data structure table into a dynamic data table and a static data table, copies the data structure table and stores the data structure table in two storage spaces respectively;

the data field query module 30 is configured to query the data fields to be written or queried one by one in the dynamic data table and the static data table respectively, and determine whether corresponding data fields are stored;

the data field writing module 40 is configured to write the data field into the corresponding data field table after the data field to be written is queried and confirmed;

the dynamic data table storage overrun time calculating module 50 is configured to calculate the overrun time of the dynamic data table according to the upper limit of the spatial storage of the data field of the dynamic data table storage and the average rate of writing into the data field;

the data field retention time overrun judging module 60 traverses in real time whether the retention time of the stored data field in the dynamic data table is overrun;

the historical data field shifting module 70 writes overrun historical data fields into the static data table.

The method and the system for managing the historical data low-memory occupation query provided by the application are described in detail. The description of the specific embodiments is only intended to facilitate an understanding of the method of the present application and its core ideas. It should be noted that it would be obvious to those skilled in the art that various improvements and modifications can be made to the present application without departing from the principles of the present application, and such improvements and modifications fall within the scope of the claims of the present application.

Claims

1. A management method for low-memory occupation query of historical data is characterized by comprising the following steps:

S6, writing the overrun historical data field into the static data table;

the time sequence of fields includes at least a field ID, a write timestamp, a data type, and a data amount;

the detailed steps of step S2 include:

S203, setting one data structure table as the dynamic data table, into which received data is written, and setting the other data structure table as the static data table, in which historical data is stored;

the detailed calculation step of step S4 includes:

S401, calculating the remaining storage space value of the dynamic data table as the difference between the pre-allocated storage upper limit of the dynamic data table's storage space and the storage already occupied;

S405, dividing the remaining storage space value of the dynamic data table calculated in step S401 by the average rate at which the data field is written to obtain the time required for the remaining storage space of the current dynamic data table to be filled to overrun.

2. The method for managing low-memory-footprint queries for historical data according to claim 1, wherein in step S3, a query method is first used for writing data fields, and the data field memory locations are traversed, and the detailed traversing query step includes:

3. The method for managing low memory footprint queries of historical data according to claim 2, wherein when the remaining memory space value of the data field to be written in the dynamic data table in step S304 is insufficient, the detailed steps of expanding the dynamic data table comprise:

S3041, automatically traversing the occupied space value and the utilization rate of each stored data point in the data field to be written in the dynamic data table;

S3042, starting in reverse order from the stored data points with the lowest utilization rate, removing the stored data points from the dynamic data table one by one and copying them into the static data table;
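Steps S3041 and S3042 can be illustrated with a minimal sketch. It assumes both tables are dictionaries keyed by a data point identifier and that each stored data point carries a size and a utilization rate; `DataPoint` and `free_space` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class DataPoint:
    key: str
    size: int           # occupied space value
    utilization: float  # utilization rate, higher means used more often


def free_space(dynamic, static, needed):
    """Move the least-used points from dynamic to static until `needed`
    units of space have been freed; returns the space actually freed."""
    freed = 0
    # S3041: traverse all points; sorting puts the lowest utilization first.
    for point in sorted(dynamic.values(), key=lambda p: p.utilization):
        if freed >= needed:
            break
        # S3042: copy into the static table, then remove from the dynamic one.
        static[point.key] = point
        del dynamic[point.key]
        freed += point.size
    return freed
```

Iterating over the sorted copy rather than over the dictionary itself keeps the removal safe while the dynamic table is being modified.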

4. The method for managing low memory footprint queries for historical data according to claim 3, wherein the step of determining in real time whether the retention time of the stored data field in the dynamic data table exceeds the limit in step S5 comprises:

S501, according to the optimal upper-limit storage space value preset for data field query and writing in the dynamic data table, traversing and monitoring in real time the stored data points with longer storage time in the dynamic data table;

S502, using a reverse-order query, deleting from the dynamic data table the data field stored data points found to have the longest storage time and a lower utilization rate, and copying the same data field stored data points into the static data table for storage.

5. The method for managing low-memory occupancy query of historical data according to claim 4, wherein in step S502, the two decision points of storage time and utilization rate are quantitatively calculated by a multi-index decision method to determine whether the data field stored data points are moved to the static data table for storage, and the detailed steps are as follows:

assigning a storage time decision value percentage B1 to the storage time;

the utilization rate is given a utilization rate decision value percentage B2;

the storage time is quantized into a time data value T_L: the longest storage time in the dynamic data table, either preset by the user or obtained by traversing the historical dynamic data table, is taken as the endpoint value and set to 1, and the proportion of the storage time of the current data field to the whole endpoint-value line segment is the quantized time data value;

the utilization rate is quantized into a usage inverse value Sy;

the shift quantization score is calculated by the following formula: T_L × B1 + Sy × B2 = shift quantization score.
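The weighted score of this claim can be sketched as follows. The claim does not define how the usage inverse value Sy is obtained, so the sketch assumes a utilization rate normalized to the range 0 to 1 and takes Sy as one minus that rate; the default weights B1 = B2 = 0.5 are likewise illustrative assumptions.

```python
def shift_score(storage_time, longest_time, utilization, b1=0.5, b2=0.5):
    """Compute T_L * B1 + Sy * B2 for one stored data point.

    Assumptions: utilization lies in [0, 1] and Sy = 1 - utilization.
    """
    # T_L: share of the longest storage time (endpoint value set to 1).
    t_l = min(storage_time / longest_time, 1.0) if longest_time > 0 else 0.0
    # Sy: rarely used points get a value close to 1.
    s_y = 1.0 - min(max(utilization, 0.0), 1.0)
    return t_l * b1 + s_y * b2
```

Under these assumptions a data point that has been stored for a long time and is rarely used scores close to 1, so it would be among the first candidates for shifting to the static data table.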

6. A management system for low-storage-occupation query of historical data, characterized by comprising a data structure table creation module (10), a data structure table splitting module (20), a data field query module (30), a data field writing module (40), a dynamic data table storage overrun time calculation module (50), a data field retention time overrun judging module (60) and a historical data field shifting module (70);

a data structure table creation module (10) that creates a data structure table from different time sequences of the same field;

the data structure table splitting module (20) splits the data structure table into a dynamic data table and a static data table by copying the data structure table and storing the resulting tables in two storage spaces respectively;

the data field query module (30) is used for querying the data fields to be written or queried in the dynamic data table and the static data table one by one respectively and judging whether the corresponding data fields are stored or not;

the data field writing module (40) is used for writing the data field into the corresponding data field table after the data field to be written is inquired and confirmed;

the dynamic data table storage overrun time calculation module (50) is used for calculating the overrun time of the dynamic data table according to the upper limit of the space storage of the data field of the dynamic data table storage and the average speed of the written data field;

a data field retention time overrun judging module (60) for traversing the dynamic data table in real time to judge whether the retention time of the stored data fields exceeds the limit;

the historical data field shift module (70) writes the overrun historical data field into the static data table;

the detailed steps by which the data structure table splitting module (20) splits the data structure table into a dynamic data table and a static data table comprise:

the detailed calculation steps by which the dynamic data table storage overrun time calculation module (50) calculates the overrun time of the dynamic data table comprise: