CN115630065B

CN115630065B - Storage and query method based on multi-compression mode sub-partition table

Info

Publication number: CN115630065B
Application number: CN202211272183.3A
Authority: CN
Inventors: 周勇亮; 贾宗秀; 赵冬伟; 李晓鹏; 关旭; 蒋旭; 姬涛涛; 刘勇生; 张昕尧
Original assignee: TIANJIN SHENZHOU GENERAL DATA TECHNOLOGY CO LTD
Current assignee: TIANJIN SHENZHOU GENERAL DATA TECHNOLOGY CO LTD
Priority date: 2022-10-18
Filing date: 2022-10-18
Publication date: 2023-08-22
Anticipated expiration: 2042-10-18
Also published as: CN115630065A

Abstract

The invention provides a storage and query method based on a multi-compression mode sub-partition table, comprising the following steps: step S1, receiving a series of block data streamed in a preset format; step S2, parsing the block data based on the preset format to obtain its data composition; step S3, analyzing the different parts of the data composition and compressing the data blocks with correspondingly different compression modes; step S4, based on the compression mode adopted, matching the corresponding partition number segment in a first lookup table, setting a separate index partition type field in the compressed data, and using the matched index partition number segment as additional data; step S5, based on the partition number mark, storing the data in the corresponding sub-partition table and recording the index and compression mode field of the corresponding data; step S6, during data storage, allocating contiguous spaces of different sizes to the different sub-partition tables; step S7, searching the corresponding index storage table based on the data compression mode or data format type supplied by the user.

Description

Storage and query method based on multi-compression mode sub-partition table

Technical Field

The invention relates to the technical field of computer databases, in particular to a storage and query method based on a multi-compression mode sub-partition table.

Background

With the development of internet big data, ever more mass data must be stored. The data come from many sources and their formats differ widely. A database typically stores all data through one fixed storage procedure; although storing is fast, retrieval is very slow, and with extremely large data volumes the disk is searched and read so frequently that its service life is easily shortened. In addition, during the debugging and actual measurement of engineering instruments, test instruments are accessed very frequently and a large amount of test data is generated every day. The data accumulate on the hard disk day after day and month after month, the data volume becomes very large, and because the recorded information is irregular and log-like it cannot be organized and managed effectively, which makes later search and query inconvenient.

Disclosure of Invention

To solve the above technical problems, the invention provides a storage and query method based on a multi-compression mode sub-partition table. The method applies different compression modes to different types of data, sets up different index structures for storage, and stores data compressed in different modes at different disk partition positions; during a search, the data type is used to locate the data quickly, which improves search and storage efficiency.

The technical scheme of the invention is as follows: a storage and query method based on a multi-compression mode sub-partition table comprises the following steps:

step S1, receiving a series of block data streamed in a preset format;

step S2, parsing the block data based on the preset format to obtain the data composition of the block data;

step S3, analyzing the different parts of the data composition and compressing the data blocks with correspondingly different compression modes according to preset rules;

step S4, based on the compression mode adopted, matching the corresponding partition number segment in a first lookup table, setting a separate index partition type field in the compressed data, and filling the matched index partition number segment into the compressed data as additional data to obtain compressed data carrying an index partition number mark;

step S5, based on the partition number mark, storing the data in the corresponding sub-partition table, and recording the index and compression mode field of the corresponding data;

step S6, during data storage, allocating contiguous spaces of different sizes to the different sub-partition tables for storage;

step S7, the user inputs the data to be queried together with the data compression mode or data format type judged in advance, and the corresponding index storage table is searched based on that data compression mode or data format type.

Further, in step S1, a series of block data streamed in a predetermined format is received, where the predetermined format is one of the following:

the simple short control string format, in which the characters form a control string containing no data and the string length is smaller than a first threshold;

the simple complex control string format, in which the characters form a control string containing no data and the string length is greater than the first threshold;

the simple string with data content format, which comprises a control string and data content, the control string preceding the data content;

the short data content format, which comprises only data content and has a length smaller than a third threshold;

the long data content format, which comprises only data content and has a length greater than the third threshold.
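The five formats above can be told apart with a small classifier. The following is a minimal sketch, not the patented implementation: the type names A1, A2, B, C1 and C2 are the ones used later in the detailed description, the threshold values of 10 follow the "within 10 characters" examples given there, and the assumption that data content is a comma-separated list of decimal numbers is introduced here for illustration only.

```python
import re

# Hypothetical thresholds; the text only gives "e.g., within 10 characters".
FIRST_THRESHOLD = 10   # control string length boundary (A1 vs A2)
THIRD_THRESHOLD = 10   # data content length boundary (C1 vs C2)

# Assumption: data content is one or more comma-separated decimal numbers.
_NUMBER = r"[+-]?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?"
_DATA_ONLY = re.compile(rf"{_NUMBER}(?:\s*,\s*{_NUMBER})*")


def classify_block(block):
    """Return 'A1', 'A2', 'B', 'C1' or 'C2' for one received block."""
    text = block.strip()
    if _DATA_ONLY.fullmatch(text):
        # Data only. A length equal to the threshold is not covered by the
        # text; this sketch treats it as the long form.
        return "C1" if len(text) < THIRD_THRESHOLD else "C2"
    # Control string followed by data: split at the last colon.
    head, sep, tail = text.rpartition(":")
    if sep and _DATA_ONLY.fullmatch(tail.strip()):
        return "B"
    return "A1" if len(text) < FIRST_THRESHOLD else "A2"
```

For instance, `classify_block("Sense: frequency: start:100000000")` yields `"B"` because the text after the last colon is purely numeric.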

Further, in step S2, the block data is parsed based on the predetermined format to obtain its data composition:

for the simple short control string format, the control string is extracted directly;

for the simple complex control string format, the string is extracted, its length is calculated, and some of the keywords in the string are extracted;

for the simple string with data content format, the position and length of the data content are determined from the control string, and the data content is extracted based on that position and length;

for the short data content format, the data content is extracted directly;

for the long data content format, the data content is extracted directly and its character length is counted.

Further, in step S3, the different parts of the data composition are analyzed and the data blocks are compressed with correspondingly different compression modes according to preset rules, specifically:

for the simple short control string format, the extracted control string is stored directly in a first format, namely as the original characters, with the date and the command format type added in front of the original characters;

for the short data content format, the extracted data content is stored directly in a second format, namely as the original value, with the date and the command format type added in front of the original value;

the simple complex control string format is stored in a third format, with the date, the command format type, the keywords and the string length added in front of the original characters, the keywords being those extracted in step S2;

the long data content format is compressed and stored in a fourth format, with the date and the command format type added in front;

for the simple string with data content format, the first half keeps the original data, the second half is either compressed and stored in a fifth format or stored as the original data, and the date, the command format type and the keyword are added in front.
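The five storage formats can be sketched as one record builder. This is an illustration under stated assumptions: the function name `build_record`, the dictionary layout, and the NUL byte separating the two halves of a B record are all hypothetical, and `zlib` is used only as a stand-in because the text does not fix a concrete compressor for the fourth and fifth formats.

```python
import zlib


def build_record(fmt, date, control="", data="", keywords=()):
    """Assemble the stored form of one block (hypothetical layout)."""
    header = {"date": date, "type": fmt}      # added in front of every record
    if fmt == "A1":                           # first format: original characters
        payload = control.encode()
    elif fmt == "C1":                         # second format: original value
        payload = data.encode()
    elif fmt == "A2":                         # third format: keywords and length in front
        header["keywords"] = list(keywords)
        header["length"] = len(control)
        payload = control.encode()
    elif fmt == "C2":                         # fourth format: compressed data content
        payload = zlib.compress(data.encode())
    elif fmt == "B":                          # fifth format: raw first half, compressed second half
        header["keywords"] = list(keywords)
        # Assumption: control strings never contain a NUL byte.
        payload = control.encode() + b"\x00" + zlib.compress(data.encode())
    else:
        raise ValueError("unknown command format type: %r" % fmt)
    return {"header": header, "payload": payload}
```

Reading a B record back means splitting the payload at the first NUL byte and decompressing only the second part, so the control string stays searchable without decompression.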

Further, in step S4, based on the compression mode adopted, the corresponding partition number segment is matched in the first lookup table, a separate index partition type field is set in the compressed data, and the matched index partition number segment is filled into the compressed data as additional data to obtain compressed data carrying an index partition number mark;

wherein different compression modes correspond to different partition number segments, the first to fifth compression modes corresponding to the first to fifth partition number segments, the partition number segments decrease in turn, and reserved gap number segments are left between the partition number segments;

each partition number segment is acquired, and its value is added at a preset position of the compressed data of the first to fifth formats as a partition field.

Further, in step S5, based on the partition number mark, the data is stored in the corresponding sub-partition table, and the index and compression mode field of the corresponding data are recorded;

the disk is partitioned according to the number segments, the width of each number segment being proportional to the disk space allocated to it; the data volume of the current number segment and its disk space usage are counted and the allocation is adjusted dynamically; each partition corresponds to one compression mode.
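The proportional relation between number segment width and disk space can be sketched as follows. The function names, the integer rounding, and the 0.8 usage ratio that triggers an adjustment are assumptions made for illustration; the text states only that space is proportional to segment width and is adjusted dynamically from usage statistics.

```python
def allocate_space(segments, disk_bytes):
    """Map each segment name to a contiguous (offset, size) extent.

    `segments` maps a name to an inclusive (low, high) number segment;
    each extent is sized in proportion to the segment width.
    """
    widths = {name: high - low + 1 for name, (low, high) in segments.items()}
    total = sum(widths.values())
    extents = {}
    offset = 0
    for name in segments:
        size = disk_bytes * widths[name] // total
        extents[name] = (offset, size)
        offset += size            # the next extent starts where this one ends
    return extents


def needs_adjustment(used_bytes, size, high_water=0.8):
    """Hypothetical trigger: resize once usage passes the given ratio."""
    return used_bytes > size * high_water
```

With segments of width 100 and 300 on a 4000-byte area, the sketch yields extents of 1000 and 3000 bytes placed back to back.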

Further, in the step S6, in the data storage process, different continuous spaces with different sizes are allocated to different sub-partition tables for storage.

Further, in step S7, the user inputs the data to be queried and the data compression mode or the data format type determined in advance, and searches in the corresponding index storage table based on the data compression mode or the data format type.

Further, at query time, the user enters the query conditions at the input end, namely a date, a keyword and the command format type, for example B; the number segment partition corresponding to that format is located in the database, and the storage table is searched for the entries matching the date and the keyword.

Advantageous effects

The invention can analyze and process large volumes of data in the different command formats of different devices. It applies different feature extraction and compression modes according to the characteristics of the data, assigns data of different format types to different storage areas, and adds the index of the storage area as a field to the processed compressed data before storage. For data that carries command information, keyword content is extracted to facilitate fast queries, and segment range information is added to every item of data, so that massive data in different formats can be stored and queried quickly.

Drawings

Fig. 1: schematic diagram of a host computer connected to a plurality of test devices for testing and storing data;

Fig. 2: flow chart of the method of the invention;

Fig. 3: schematic diagram of the multiple data formats of the test devices;

Fig. 4: schematic diagram of the different compression modes adopted for the various data formats and the corresponding storage partitions.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention; all other embodiments obtained by those skilled in the art from these embodiments without inventive effort fall within the scope of protection of the present invention.

As shown in fig. 1, a host 1 is connected to various types of instruments and equipment 2 through data cables. The host 1 is a desktop computer, a laptop computer, a pad, a mobile terminal or the like, and the data cable is a GPIB industrial control bus, a LAN network cable or the like. The instruments and equipment 2 connected to the host 1 at the same time include, for example, a spectrum analyzer, a voltmeter, a frequency counter and a vector network analyzer. The host 1 controls the instruments and equipment 2 to perform tests, and the test data is stored in a memory 3 that is local or connected to the host. With a large number of test samples, a large amount of varied test data is generated; in addition, command read-write data, log record data and the like are generated during joint debugging of the instruments. The way this data is stored strongly affects the read speed of later retrieval and search queries. Because a large amount of varied test data is generated every day, subsequent search and query become very inefficient and slow if the data is not organized and processed efficiently.

According to an embodiment of the present invention, a storage and query method based on a multi-compression mode sub-partition table is provided, as shown in fig. 2, including the following steps:

Further, as shown in fig. 3, in step S1, a series of block data streamed according to a predetermined format is received, where the predetermined format refers to:

the simple short control string format, namely the A1 format, in which the characters form a control string containing no data and the string length is smaller than a first threshold; for example, "CETC3572" in the first row of fig. 3 belongs to this format, as do commands such as "IDN?"; the string is typically short, e.g., within 10 characters;

the simple complex control string format, namely the A2 format, in which the characters form a control string or response string containing no data and the string length is greater than the first threshold; for example, the second row of fig. 3, "continue; sense: window disc; calicut: form device?", belongs to this class, but the string is somewhat longer, typically more than 10 characters;

the simple string with data content format, namely the B format, which comprises a control string and data content, the control string preceding the data content; for example, the third row of fig. 3, "Sense: frequency: start:100000000", comprises the control string "Sense: frequency: start:" and the data content "100000000";

the short data content format, namely the C1 format, which comprises only data content and has a length smaller than a third threshold; for example, "300" in the fourth row of fig. 3; the data is typically short, e.g., fewer than 10 characters;

the long data content format, namely the C2 format, which comprises only data content and has a length greater than the third threshold. For example, the fifth row of fig. 3:

"1.1283433E-8,1.12823E-8,1.2283433E-8,1.34533E-8,1.5289433E-8,1.3383433E-8,1.4283433E-9", which represents the magnitudes of the points on a curve; because a curve has many points, such data can be very long, e.g., several thousand bytes.

In step S2, for the simple short control string format, the control string is extracted directly; for example, for CETC3572 the string CETC3572 is extracted directly, and optionally the string is also extracted as a keyword and appended to the original characters in subsequent data processing.

For the simple complex control string format, the string is extracted, its length is calculated, and some of the keywords in the string are extracted; for example, for the A2 string in the second row of fig. 3, the calculated string length is 51 and the extracted keywords are the words delimited by the segment marks, for example continuous, window display and format device; generally the last word is extracted;

for the simple string with data content format, the position and length of the data content are determined from the control string, and the data content is extracted based on that position and length. For example, for "Sense: frequency: start:100000000", the extracted string part is "Sense: frequency: start", the data part is "100000000", and the keyword "start" is further extracted;

for the short data content format, the data content is extracted directly; for example, for the fourth row of data, 300 is extracted directly;

for the long data content format, the data content is extracted directly, the data character length is counted, and the content is grouped according to whether or not it forms an array.
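The per-format parsing rules above can be sketched as a single function. The function name `parse_block`, the returned field names, and the choice of ":" and ";" as segment delimiters are assumptions for illustration; the text itself only names the quantities to extract.

```python
import re


def parse_block(fmt, text):
    """Extract the data composition of one block of the given format type."""
    text = text.strip()
    if fmt == "A1":
        return {"control": text}
    if fmt == "A2":
        # Keywords are the delimited words; the last one is the usual pick.
        words = [w.strip(" ?") for w in re.split(r"[;:]", text)]
        words = [w for w in words if w]
        return {"control": text, "length": len(text),
                "keywords": words, "last_keyword": words[-1]}
    if fmt == "B":
        head, _, tail = text.rpartition(":")   # data follows the last colon
        data = tail.strip()
        return {"control": head.strip(), "data": data,
                "position": len(text) - len(data), "data_length": len(data),
                "keyword": head.rpartition(":")[2].strip()}
    if fmt == "C1":
        return {"data": text}
    if fmt == "C2":
        # Group as an array when the content is comma-separated.
        return {"data": text, "length": len(text), "values": text.split(",")}
    raise ValueError("unknown format type: %r" % fmt)
```

For the B example in the text, the sketch returns the control part "Sense: frequency: start", the data part "100000000" and the keyword "start", matching the extraction described above.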

Further, step S3, analyzing different parts of the data composition, and correspondingly compressing the data block by adopting different compression modes according to a preset rule; the method specifically comprises the following steps:

for the simple short control string format A1, the extracted control string is stored directly in a first format, namely as the original characters, with the date and the command format type added in front of the original characters, where the command format type is one of the A1, A2, B, C1 and C2 types defined above;

the long data content format is compressed and stored in a fourth format, with the date and the command format type added in front; the compression mode may be predictive coding, transform coding or the like.
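Predictive coding, one of the options named above, can be illustrated with the simplest predictor, in which each sample is predicted by the previous one and only the difference is stored. This is a sketch, not the scheme used by the invention: it assumes the curve samples have already been quantized to integers, and the function names are hypothetical.

```python
def delta_encode(samples):
    """Keep the first sample, then store each difference from its predecessor."""
    if not samples:
        return []
    residuals = [samples[0]]
    for prev, cur in zip(samples, samples[1:]):
        residuals.append(cur - prev)
    return residuals


def delta_decode(residuals):
    """Rebuild the samples by accumulating the residuals."""
    samples = []
    total = 0
    for r in residuals:
        total += r
        samples.append(total)
    return samples
```

Because neighbouring points on a measured curve tend to be close in value, the residuals are usually small numbers that a later entropy coder can store in fewer bits than the raw samples.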

Further, in step S4, based on the compression mode adopted (which is equivalent to the data format described above), the corresponding partition number segment is matched in the first lookup table, a separate index partition type field is set in the compressed data, and the matched index partition number segment is filled into the compressed data as additional data to obtain compressed data carrying an index partition number segment mark; for example:

partition number segment corresponding to the B format: 0xHH 01108 011000 … … 0xHH014010

Date 20220910; type B; scale 11000to14010; key frequency, set; length 201; data … … XXXXX XXXX; where the field "scale 11000to14010" is the added partition number segment;

wherein different compression modes correspond to different partition number segments, the first to fifth compression modes corresponding to the first to fifth partition number segments, the size of each partition number segment can be adjusted according to the data volume, and reserved gap number segments are left between the partition number segments; for example, a predetermined gap is reserved between the A1 segment and the A2 segment to prevent range overflow caused by rapid data growth.
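The lookup and tagging of step S4 can be sketched with a small table. Only the B range 11000 to 14010 is taken from the example in the text; every other range, the table name `FIRST_LOOKUP_TABLE` and the helper names are hypothetical values chosen so that gaps remain between neighbouring segments.

```python
# Hypothetical first lookup table: format type -> inclusive number segment.
# Only the B entry comes from the example in the text.
FIRST_LOOKUP_TABLE = {
    "A1": (1000, 4999),
    "A2": (6000, 9999),
    "B": (11000, 14010),
    "C1": (16000, 17999),
    "C2": (20000, 29999),
}


def tag_record(header, fmt):
    """Return a copy of the header with the matched segment added as 'scale'."""
    low, high = FIRST_LOOKUP_TABLE[fmt]
    tagged = dict(header)
    tagged["scale"] = "%dto%d" % (low, high)
    return tagged


def gaps_are_reserved(table):
    """Check that every pair of neighbouring segments leaves unused numbers between them."""
    ranges = sorted(table.values())
    return all(prev_high + 1 < next_low
               for (_, prev_high), (next_low, _) in zip(ranges, ranges[1:]))
```

Tagging a B header this way produces the same "scale 11000to14010" field that appears in the example record.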

Further, in step S5, based on the partition number mark, the data is stored in the corresponding sub-partition table, and the index and compression mode field of the corresponding data are recorded.

Further, in step S6, during data storage, contiguous spaces of different sizes are allocated to the different sub-partition tables for storage; the sub-partition table in the invention builds on the traditional disk partition table: it is a sub-partition table arranged on top of a conventional partition table, and different sub-partitions are divided correspondingly in order to realize storage based on the compression mode.

For example, if the user needs to find a command related to the frequency range that was issued on September 5, 2021, the user enters the following conditions at the input end: the date 2021.09.05, the keyword frequency, and the command format type B. The number segment partition corresponding to the B format can then be located quickly in the database, the dates in the storage table are searched, and the keyword frequency is looked up among all entries dated 2021.09.05. A fast query is thus achieved without traversing all the data for matches, and the amount of data accessed can be reduced by at least 80%.
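The narrowing order of this query (format type first, then date, then keyword) can be sketched with an in-memory store. The class name `SubPartitionStore`, its methods, and the plain string date key are hypothetical; a real system would map each format type to an on-disk sub-partition rather than to a dictionary.

```python
from collections import defaultdict


class SubPartitionStore:
    """One index table per command format type, keyed by date."""

    def __init__(self):
        # format type -> date -> list of entries
        self._tables = defaultdict(lambda: defaultdict(list))

    def insert(self, fmt, date, keywords, payload):
        self._tables[fmt][date].append(
            {"keywords": set(keywords), "payload": payload})

    def query(self, fmt, date, keyword):
        # Only the table of the requested format is touched; entries of
        # other formats and other dates are never scanned.
        table = self._tables.get(fmt, {})
        return [entry["payload"] for entry in table.get(date, [])
                if keyword in entry["keywords"]]
```

A query for format B, date 20210905 and keyword frequency inspects only the B entries stored under that date, which is the source of the reduced data access described above.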

The illustrative embodiments above are described to help those skilled in the art understand the present invention. It should be understood that the present invention is not limited to the scope of these embodiments; any changes apparent to those skilled in the art that fall within the spirit and scope of the invention as defined by the appended claims are protected.

Claims

1. The storage and query method based on the multi-compression mode sub-partition table is characterized by comprising the following steps:

step S1, receiving a series of block data streamed in a preset format, where the preset format refers to:

a long data content format including only data content and having a length greater than a third threshold;

step S2, parsing based on the preset format to obtain the data composition of the block data;

for short data content format, directly extracting data content;

for the long data content format, directly extracting the data content, and counting the data character length;

step S3, analyzing the different parts of the data composition and compressing the data blocks with correspondingly different compression modes according to preset rules, specifically:

for the simple word string data content format, the first half part reserves the original data, the second half part is stored in a compressed mode based on the fifth format or according to the original data, and the date and command format types and keywords are added in front;

step S5, based on the partition number mark, storing the partition number mark into a corresponding partition table, and recording an index and a compression mode field of corresponding data;

step S6, in the data storage process, allocating continuous spaces with different sizes to different partition tables for storage;

2. The method for storing and inquiring the sub-partition table based on the multiple compression modes according to claim 1, wherein the step S4 is characterized in that, based on the adopted compression mode, the corresponding partition number segment is matched in the first lookup table, the index partition type field is independently set in the compressed data, the matched index partition number segment is used as additional data, and the additional data is filled into the compressed data to obtain the compressed data with the index partition number mark;

and acquiring each partition number segment, and attaching the partition number segment to a preset position of the compressed data of the first format to the fifth format of the result to be used as a partition field.

3. The method for storing and querying sub-partition tables based on multiple compression modes according to claim 1, wherein step S5 is to store the sub-partition table in the corresponding partition table based on the partition number mark and record the index and compression mode field of the corresponding data;

4. A storage and query method based on a multi-compression mode sub-partition table according to claim 3,

at query time, the user enters the query conditions at the input end, namely a date, a keyword and the command format type, for example B; the number segment partition corresponding to that format is located in the database, and the storage table is searched for the entries matching the date and the keyword.