CN116894041A - Data storage method, device, computer equipment and medium - Google Patents

Publication number
CN116894041A
Authority
CN
China
Prior art keywords
data
layer
storage
subset
subsets
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311140141.9A
Other languages
Chinese (zh)
Other versions
CN116894041B (en)
Inventor
王勇
姚延栋
杨谕黔
杜佳伦
杨飞
翁岩青
高小明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Siweizongheng Data Technology Co ltd
Original Assignee
Beijing Siweizongheng Data Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Siweizongheng Data Technology Co ltd filed Critical Beijing Siweizongheng Data Technology Co ltd
Priority to CN202311140141.9A priority Critical patent/CN116894041B/en
Publication of CN116894041A publication Critical patent/CN116894041A/en
Application granted granted Critical
Publication of CN116894041B publication Critical patent/CN116894041B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2272Management thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/221Column-oriented storage; Management thereof
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention provides a data storage method, a device, computer equipment and a medium, and relates to the technical field of databases. The method comprises: receiving data that is stored for the first time and storing it in a zeroth layer, merging the data to form a plurality of first data subsets when a preset condition is reached, treating this data as hot layer data, and storing the hot layer data locally in a row storage mode; when the data amount of the first data subsets reaches a preset condition, merging the earliest-formed first data subsets into a second data subset and transferring it to the first layer, and so on until the X-th layer is reached, forming warm layer data that is stored locally in a column storage mode; when the data amount of the second data subsets reaches a preset condition, merging the earliest-formed second data subsets into a third data subset, transferring the third data subset to the (X+1)-th layer, and storing it as cold layer data in a row-column mixed mode. From the hot layer to the warm layer to the cold layer, the time since the data entered the system increases or the access frequency decreases. The scheme improves data storage efficiency.

Description

Data storage method, device, computer equipment and medium
Technical Field
The present invention relates to the field of database technologies, and in particular, to a data storage method, apparatus, computer device, and medium.
Background
Databases are generally classified into transactional (operational) databases and analytical databases, a classification known as OLTP and OLAP. There are two reasons for this split design. The first is that the difference in workload patterns leads to a difference in processing requirements: transaction processing needs high concurrency and random access and has strict isolation requirements, whereas analytical processing has much lower concurrency but involves complex queries and mostly sequential access. The second is the difference in data lifecycle: transaction processing deals with current data, whereas analysis deals mainly with historical data.
HTAP databases, and the broader class of hyper-converged databases that are currently popular, address the first problem through the technical evolution of the computing and storage engines, so that both kinds of workload can be processed in the same database. Data lifecycle management, however, still leaves considerable room for optimization.
In addition, object storage offers high availability, pay-per-use pricing, access anytime and anywhere, and seamless scaling of performance and capacity, which makes it an ideal medium for cold storage in place of magnetic disk and tape.
Because databases have a long history, many of them support dumping data to object storage by means of plug-ins. Taking the most powerful open source database, PostgreSQL, as an example, it can use S3 storage in several ways:
Using pg_dump and AWS CLI: exporting the database as an SQL file using the pg_dump command and then uploading the SQL file into the S3 bucket using the AWS CLI is a simple and common method.
Using AWS Data Pipeline: AWS Data Pipeline is a hosted service that may be used to automate data transfer and dumping. A data pipeline may be configured to connect the PostgreSQL data source to the S3 target and set appropriate data transfer and dump activities.
Using third party tools and libraries: there are some third party tools and libraries that can help dump PostgreSQL data to S3. For example, the pg_s3 plug-in may export query results or table data directly into the S3 bucket.
All of these approaches export and dump the data, so the data cannot be accessed natively. The timeliness, consistency, error handling and read-back efficiency of the data are difficult to guarantee.
Some databases, such as Lindorm, evolved from HBase. They use HDFS to manage the space on object storage, so all operations must go through the file system. The final performance is limited not only by the capability of the object storage but also by the metadata management overhead of HDFS, so performance suffers to some extent.
Disclosure of Invention
In view of the above, the embodiment of the invention provides a data storage method to solve the technical problem of low data storage efficiency in the prior art. The method comprises the following steps:
receiving data that is stored for the first time and storing it in a zeroth layer of a storage space in the form of row records, merging the received data into a first data subset when the received data amount reaches a first preset condition, continuing to receive data and forming a plurality of first data subsets, and regarding the data in the zeroth layer as hot layer data;
storing the hot layer data locally in a row storage mode;
when the data amount of the first data subsets in the hot layer data reaches a second preset condition, merging a plurality of the earliest-formed first data subsets in the hot layer data to form a second data subset, transferring the second data subset to a first layer of the storage space, continuing to receive data and forming second data subsets until an X-th layer is formed, wherein each of the first layer to the X-th layer contains a plurality of second data subsets, and regarding the data in the first layer to the X-th layer as warm layer data, wherein X is a natural number;
storing the warm layer data locally in a column storage mode;
when the data amount of the second data subsets in the warm layer data reaches a third preset condition, merging a plurality of first formed second data subsets in the warm layer data to form a third data subset, transferring the third data subset to an X+1th layer of a storage space, and taking the data in the X+1th layer as cold layer data;
storing the cold layer data in a row-column mixed mode;
wherein the time of entering the system of the hot layer data, the warm layer data and the cold layer data is gradually increased or the access frequency is gradually reduced.
The embodiment of the invention also provides a data storage device to solve the technical problem of low data storage efficiency in the prior art. The device comprises:
the hot layer data forming module is used for receiving data that is stored for the first time and storing it in a zeroth layer of a storage space in the form of row records, merging the received data into a first data subset when the received data amount reaches a first preset condition, continuing to receive data and forming a plurality of first data subsets, and regarding the data in the zeroth layer as hot layer data;
the hot layer data storage module is used for storing the hot layer data locally in a row storage mode;
the warm layer data forming module is used for merging a plurality of the earliest-formed first data subsets in the hot layer data to form a second data subset when the data amount of the first data subsets in the hot layer data reaches a second preset condition, transferring the second data subset to a first layer of the storage space, continuing to receive data and forming second data subsets until an X-th layer is formed, wherein each of the first layer to the X-th layer contains a plurality of second data subsets, and the data in the first layer to the X-th layer is regarded as warm layer data, wherein X is a natural number;
the warm layer data storage module is used for storing the warm layer data locally in a column storage mode;
the cold layer data forming module is used for merging a plurality of first formed second data subsets in the warm layer data to form a third data subset when the data amount of the second data subsets in the warm layer data reaches a third preset condition, transferring the third data subset to an X+1th layer of a storage space, and regarding the data in the X+1th layer as cold layer data;
the cold layer data storage module is used for storing the cold layer data in object storage in a row-column mixed mode;
wherein the time of entering the system of the hot layer data, the warm layer data and the cold layer data is gradually increased or the access frequency is gradually reduced.
The embodiment of the application also provides computer equipment, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes any data storage method when executing the computer program so as to solve the technical problem of low data storage efficiency in the prior art.
The embodiment of the application also provides a computer readable storage medium which stores a computer program for executing any of the data storage methods, so as to solve the technical problem of low data storage efficiency in the prior art.
Compared with the prior art, the beneficial effects achievable by at least one of the technical schemes adopted in the embodiments of this description include at least the following: data that is stored for the first time is received and stored in a zeroth layer of a storage space in the form of row records; when the received data amount reaches a first preset condition, the received data is merged into a first data subset; data continues to be received and a plurality of first data subsets are formed, and the data in the zeroth layer is regarded as hot layer data; the hot layer data is stored locally in a row storage mode; when the data amount of the first data subsets in the hot layer data reaches a second preset condition, a plurality of the earliest-formed first data subsets in the hot layer data are merged to form a second data subset, the second data subset is transferred to a first layer of the storage space, and data continues to be received and second data subsets continue to be formed until an X-th layer is formed, wherein each of the first layer to the X-th layer contains a plurality of second data subsets, the data in the first layer to the X-th layer is regarded as warm layer data, and X is a natural number; the warm layer data is stored locally in a column storage mode; when the data amount of the second data subsets in the warm layer data reaches a third preset condition, a plurality of the earliest-formed second data subsets in the warm layer data are merged to form a third data subset, the third data subset is transferred to an (X+1)-th layer of the storage space, and the data in the (X+1)-th layer is regarded as cold layer data; the cold layer data is stored in a row-column mixed mode; wherein, from the hot layer data to the warm layer data to the cold layer data, the time since entering the system gradually increases or the access frequency gradually decreases.
According to the application, the data is divided into multiple layers such as a hot layer, a warm layer and a cold layer according to the time and the access frequency of entering the system, each layer adopts different storage modes, format conversion is realized between the layers through merging or dumping of local storage to object storage, the data storage efficiency is improved, and efficient space management and optimized access are realized.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a data storage method provided by an embodiment of the present application;
FIG. 2 is a schematic diagram illustrating storage of data of each layer in the data storage method according to the embodiment of the present application;
FIG. 3 is a schematic diagram of a data layout of an object store according to an embodiment of the present application;
FIG. 4 is a block diagram of a computer device according to an embodiment of the present application;
FIG. 5 is a block diagram of a data storage device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application will be described in detail below with reference to the accompanying drawings.
Other advantages and effects of the present application will become apparent to those skilled in the art from the following disclosure, which describes the embodiments of the present application with reference to specific examples. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. The application may also be practiced or carried out in other embodiments, and the details of the present description may be modified or varied without departing from the spirit and scope of the present application. It should be noted that the following embodiments and the features in the embodiments may be combined with each other where there is no conflict. All other embodiments that can be obtained by those skilled in the art based on the embodiments of the application without inventive effort are intended to be within the scope of the application.
In an embodiment of the present invention, a data storage method is provided, as shown in fig. 1, where the method includes:
step S101, receiving data that is stored for the first time and storing it in a zeroth layer of a storage space in the form of row records, merging the received data into a first data subset when the received data amount reaches a first preset condition, continuing to receive data and forming a plurality of first data subsets, and regarding the data in the zeroth layer as hot layer data;
step S102, storing the hot layer data locally in a row storage mode;
step S103, when the data amount of the first data subsets in the hot layer data reaches a second preset condition, merging a plurality of the earliest-formed first data subsets in the hot layer data to form a second data subset, transferring the second data subset to a first layer of the storage space, continuing to receive data and forming second data subsets until an X-th layer is formed, wherein each of the first layer to the X-th layer contains a plurality of second data subsets, and regarding the data in the first layer to the X-th layer as warm layer data, wherein X is a natural number;
step S104, storing the warm layer data locally in a column storage mode;
step S105, when the data amount of the second data subsets in the warm layer data reaches a third preset condition, merging a plurality of the earliest-formed second data subsets in the warm layer data to form a third data subset, transferring the third data subset to an (X+1)-th layer of the storage space, and regarding the data in the (X+1)-th layer as cold layer data;
step S106, storing the cold layer data in a row-column mixed mode;
wherein the time of entering the system of the hot layer data, the warm layer data and the cold layer data is gradually increased or the access frequency is gradually reduced.
It should be noted that the gradual increase of the time of entering the system across the hot layer data, the warm layer data and the cold layer data refers to chronological order: the cold layer data has been in the system the longest and the hot layer data the shortest, that is, the hot layer data entered the system most recently and the cold layer data entered the system earliest. The gradual decrease of the access frequency across the hot layer data, the warm layer data and the cold layer data means that the cold layer data has the lowest access frequency and the hot layer data has the highest access frequency, that is, the hot layer data is accessed most actively.
In this embodiment, the hot layer data is stored in rows, similar to the heap table in PostgreSQL; the warm layer data may be stored in columns (e.g., in time-series and analysis oriented scenarios); and after demotion to object storage, a row-column mixed mode may be used. By dividing the data into multiple layers such as a hot layer, a warm layer and a cold layer according to the time of entering the system and the access frequency, each layer adopts a different storage mode, and format conversion between the layers is realized through merging or through dumping from local storage to object storage, so that data storage efficiency is improved and efficient space management and optimized access are realized.
In specific implementation, referring to fig. 2, the specific partitioning process of the hot layer data, the warm layer data and the cold layer data is as follows:
The data is divided into locally ordered data subsets (each called a Run), and each layer contains a number of Runs.
The zeroth layer is hot layer data. It sorts each batch of data entering the system for the first time and stores the data in the form of row records. When the zeroth layer data reaches a certain count or total data amount (the first preset condition), a merge is triggered and the data is merged as a whole into a Run of layer 1.
Layers 1 to X are warm layer data, and X can be changed as required. The warm layer data is stored in a column storage mode, that is, the data of each column is placed contiguously. Additional metadata may be used to distinguish smaller access blocks. When the Runs of one layer accumulate a certain amount of data (reaching the second preset condition), a part of the data is selected and merged in order into the next layer, until layer X is reached. As shown in FIG. 2, warm layer data is usually placed locally but may also be placed on object storage.
Layer X+1 and below are cold layer data. The cold layer data is stored in object storage using a row-column mixed storage mode: the data of one Run is split into a plurality of objects, the data of a batch of rows is stored completely within one object, and different batches of rows are stored in different objects. With this arrangement, the whole content of a batch of data can be obtained by reading one object, and the object storage achieves efficient and flexible data access.
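The layering described above can be illustrated with a minimal sketch. The class name `TieredStore`, the `fanout` threshold (standing in for the preset conditions) and the in-memory lists are all assumptions made for illustration; the sketch models only the movement of Runs between layers and the layout label of each layer, not the actual on-disk formats.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Row = Tuple[int, str]  # (sort key, payload); a hypothetical record shape


def layout_for(level: int, x: int) -> str:
    """Layer 0 is row storage, layers 1..X are column storage, layer X+1 is row-column mixed."""
    if level == 0:
        return "row"
    return "column" if level <= x else "hybrid"


@dataclass
class Run:
    level: int
    layout: str
    rows: List[Row]  # locally ordered data subset


@dataclass
class TieredStore:
    x: int = 2        # number of warm layers (assumption)
    fanout: int = 2   # hypothetical stand-in for the preset conditions
    layers: List[List[Run]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # layers[0] is hot, layers[1..x] are warm, layers[x+1] is cold
        self.layers = [[] for _ in range(self.x + 2)]

    def ingest(self, batch: List[Row]) -> None:
        """Sort a newly arrived batch and keep it in layer 0 as row records."""
        self.layers[0].append(Run(0, "row", sorted(batch)))
        self._compact(0)

    def _compact(self, level: int) -> None:
        """Merge the earliest-formed Runs of a layer into one Run of the next layer."""
        if level > self.x or len(self.layers[level]) < self.fanout:
            return
        oldest = self.layers[level][: self.fanout]
        del self.layers[level][: self.fanout]
        merged = sorted(row for run in oldest for row in run.rows)
        target = level + 1
        self.layers[target].append(Run(target, layout_for(target, self.x), merged))
        self._compact(target)
```

With `x=2` and `fanout=2`, eight ingested batches cascade through layers 0, 1 and 2 and end up as a single Run in layer 3, the cold layer.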
In specific implementation, all storage forms use the same set of metadata management mechanism and are managed by adopting similar metadata information, namely, the data storage method further comprises the following steps:
And establishing a metadata management table, and recording metadata of each data subset in each layer and description information for restoring data blocks of each data subset in each layer in the metadata management table, wherein each data subset comprises a plurality of subunits, and one column of data in each subunit is a data block.
Under the unified hierarchical storage structure, object storage is just one of the layers. By reorganizing the metadata, unified metadata management is achieved: the metadata for object storage differs little from that for local storage, and both the row-column mixed format and the column storage format can be incorporated into the unified metadata management system. Dumping to and accessing object storage therefore become natural and smooth, and the capability of the object storage is fully used. Local storage and object storage are accessed in a consistent manner, which makes the software easier to maintain.
In implementation, because the metadata is unified, the warm layer data can also be placed directly on object storage; that is, the data storage method of the application further comprises:
storing the warm layer data in object storage in a column storage mode.
Specifically, the metadata related to the present application is mainly directed to a table, and specifically includes the following contents:
(1) Run metadata. This records the top-level information of the table, including whether each Run is in use, which Runs are on each layer, which files each Run contains, and so on.
(2) Description information of the data blocks in a Run. Each Run is divided into several subunits, each called a range, which is a fixed-size collection of records (the last range may not be full). The data of one column within one range is called a stripe, that is, a data block, and the data block is the unit of independent compression and IO. The description information of a data block describes the position, length, number of records, null-value information and so on of that data block on the storage medium. The data of the column can be read and restored according to the description information of the data block. The only difference between the description information for object storage and for local file storage is the storage location of the data block: for local storage the location is recorded as a file, while for object storage it is recorded as the specific object information.
(3) Index information. For each range, the min/max values of the index columns are calculated. If a query contains comparison conditions on those columns, the conditions can be converted into a check against the min/max interval; a range that is not hit can be skipped, which improves query efficiency.
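The three kinds of metadata listed above can be pictured as a small set of records. The following is a sketch only: the class and field names (`StripeDesc`, `RangeMeta`, `RunMeta`, `location`, and so on) are hypothetical and are not taken from the patent. It also shows how a comparison condition can be turned into a min/max interval check so that ranges which cannot match are skipped.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class StripeDesc:
    """Description of one data block: one column of one range."""
    location: str      # a local file, or an object identifier (assumed encoding)
    offset: int        # position of the block on the storage medium
    length: int        # length of the block in bytes
    num_records: int   # number of records in the block
    null_count: int    # null-value information, simplified to a count


@dataclass
class RangeMeta:
    stripes: Dict[str, StripeDesc]        # column name -> data block description
    min_max: Dict[str, Tuple[int, int]]   # index column -> (min, max)


@dataclass
class RunMeta:
    run_id: int
    level: int
    in_use: bool
    ranges: List[RangeMeta]


def ranges_to_read(run: RunMeta, column: str, low: int, high: int) -> List[int]:
    """Return the positions of ranges whose [min, max] may overlap [low, high].

    Ranges without min/max information for the column cannot be skipped.
    """
    hits = []
    for i, rng in enumerate(run.ranges):
        bounds = rng.min_max.get(column)
        if bounds is None or (bounds[0] <= high and bounds[1] >= low):
            hits.append(i)
    return hits
```

Because the same `StripeDesc` shape serves both local files and objects, the reader code does not need to branch on where a block lives, which is the point the text makes about unified metadata.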
The unified metadata management in this scheme supports placing the metadata either on local high-speed devices or on an independent metadata service. Thus both private deployment (local database cluster + object storage cluster) and cloud deployment (metadata cluster + object storage service) are supported.
In a specific implementation, the storing the cold layer data in a row-column mixing mode includes:
when the data quantity of the second data subsets in the warm layer data reaches a third preset condition, determining a plurality of first formed second data subsets in the warm layer data as data subsets needing to be dumped to an object storage;
the following dumping steps are sequentially executed on the plurality of second data subsets:
(1) Establishing a third data subset on the object storage, and setting the initial object ID of the third data subset to be 0;
(2) Reading description information of each data block of the second data subset in the metadata management table;
(3) Reading the data of each data block of the second data subset according to the description information of each data block (the description information of the stripe) and writing it into a buffer pool (buffer) of the object storage;
(4) When the data amount in the buffer pool of the object storage has not reached the third preset condition, directly returning the current object ID;
(5) The buffer pool continues to receive and merge the data of the second data subsets until the data amount in the buffer pool reaches the third preset condition; a driver of the object storage is then called to write the data in the buffer pool into the third data subset, and once the buffer has been filled and written, the object ID incremented by 1 is returned;
(6) Storing the description information of each data block of the second data subset and the returned object ID in a metadata buffer pool;
(7) And after all the second data subsets required to be dumped to the object storage are written to the object storage, writing the description information of each data block of the second data subsets in the metadata buffer pool and the returned object ID into the metadata management table.
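The dump steps above can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the object storage driver is replaced by a plain dictionary, the buffer threshold is a byte capacity, and each data block is recorded with the ID of the object it ends up in plus its offset inside that object. The names `ObjectWriter`, `dump_run` and `Stripe` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Stripe:
    column: str
    data: bytes


class ObjectWriter:
    """Buffers data blocks and writes one object each time the buffer fills up."""

    def __init__(self, store: Dict[Tuple[int, int], bytes], run_id: int, capacity: int):
        self.store = store        # stand-in for the object storage driver (assumption)
        self.run_id = run_id
        self.capacity = capacity  # hypothetical buffer threshold in bytes
        self.buffer = bytearray()
        self.object_id = 0        # step (1): the initial object ID is 0

    def write(self, data: bytes) -> Tuple[int, int]:
        """Buffer one block and return (object ID, offset) where it will be stored."""
        placed = (self.object_id, len(self.buffer))
        self.buffer += data
        if len(self.buffer) >= self.capacity:
            self.flush()          # step (5): write the object, then move to the next ID
        return placed

    def flush(self) -> None:
        if self.buffer:
            self.store[(self.run_id, self.object_id)] = bytes(self.buffer)
            self.buffer.clear()
            self.object_id += 1


def dump_run(stripes: List[Stripe], store: Dict[Tuple[int, int], bytes],
             run_id: int, capacity: int, metadata_table: Dict[int, List[dict]]) -> None:
    writer = ObjectWriter(store, run_id, capacity)
    pending: List[dict] = []      # step (6): metadata buffer
    for stripe in stripes:
        obj_id, offset = writer.write(stripe.data)
        pending.append({"column": stripe.column, "object_id": obj_id,
                        "offset": offset, "length": len(stripe.data)})
    writer.flush()                # write out whatever is left in the buffer
    # step (7): publish the metadata only after all data has reached object storage
    metadata_table.setdefault(run_id, []).extend(pending)
```

Publishing the block descriptions last means a reader never sees metadata that points at an object which has not been written yet.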
In a specific embodiment, before step (1), the system starts a periodic process that checks, according to the partition conditions and the table-level policy, whether to initiate a demotion operation, and generates a demotion task if the demotion conditions are met. The demotion task first finds the Runs that need to be dumped and performs the dump steps for each Run. After step (7), the method further comprises: because the data is dumped one-to-one, the index information of the data does not change, and the index information used to look up the data can be updated directly to point to the new Run.
In one embodiment, after step (5), the method further comprises: writing the data in the buffer pool to a write-ahead log; this applies to systems that support master-slave replication.
For systems that support master-slave replication, the data can instead be written directly to object storage in two copies. Master-slave replication systems (such as PostgreSQL) require two independent copies of the data so that each server remains fully independent of the availability of the other server and its data. Writing directly rather than through the log has three advantages: 1) log writing, log checking, log transmission and log replay all consume considerable memory, network and CPU resources and greatly prolong execution time, and direct writing avoids this cost; 2) log writing is exclusive, so dump traffic in the log would occupy valuable log resources and affect other log writes; 3) writes to object storage may fail or occasionally be slow, which is hard to handle in a log writing path that assumes a relatively stable local disk, and which would block log replay.
In another embodiment, when the data has not yet reached the specified layer, the merge task needs to be triggered first, and the dump task is generated afterwards.
In particular, when the data satisfies the condition of degradation (third preset condition), the data may be selected to be dumped into the object store. The method is characterized by combining partition technology and degradation management, thereby changing data degradation into a cold data dump problem.
For example, time-based partitioning confines the data of a time range to a single partition table. When the latest time in the partition exceeds a threshold (depending on the particular policy), the data in that partition is "cold enough", and that portion of data can be dumped to object storage more or less in its entirety. One typical policy is to specify the following option when creating the table: (ttl='ts INTERVAL +1m TABLESPACE s3'). Here TTL is the key of the option, ts is the time column, and INTERVAL +1m means that the timestamp is more than 1 month before the current time. For partition tables, the smallest timestamp is calculated. The demoted data is placed in a separate tablespace; the tablespace settings for S3 are described separately below.
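A demotion check of this kind might look like the sketch below. It is a simplification under assumptions: the function name `partitions_to_demote` and the partition names are hypothetical, the comparison uses the latest timestamp of each partition, and one month is approximated as 30 days.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def partitions_to_demote(latest_ts: Dict[str, datetime],
                         now: datetime,
                         ttl: timedelta) -> List[str]:
    """Return the partitions whose newest record is older than the TTL."""
    return sorted(name for name, latest in latest_ts.items() if now - latest > ttl)


# Hypothetical partitions, keyed by name, with the latest timestamp each one holds.
partitions = {
    "p_2023_12": datetime(2023, 12, 31, 23, 59),
    "p_2024_02": datetime(2024, 2, 29, 23, 59),
}
cold = partitions_to_demote(partitions, now=datetime(2024, 3, 1), ttl=timedelta(days=30))
```

Checking the newest record means a partition is demoted only when every row in it is older than the threshold, so the whole partition can be dumped in one pass.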
Referring to FIG. 3, the data layout of the object storage is as follows: bucket is the bucket; segno is the number of the segment server process of the MPP database, which occurs in pairs (master-slave mode); dbid is the member number in the MPP database, and segno and dbid together uniquely identify a child node; database_id is the ID of the database, and it is followed by the ID of the table (because a Run is a subset of a table, both the database ID and the table ID are included in the name); run is the ID of the Run; and obj is the ID of the object. In addition, the designated bucket is exclusive to the database cluster.
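As an illustration of such a naming scheme, the sketch below builds and parses an object name from the fields listed above. The "/" separator, the field order and the field name `table_id` are assumptions, since FIG. 3 is not reproduced in the text.

```python
from typing import NamedTuple


class ObjectName(NamedTuple):
    segno: int        # segment server number
    dbid: int         # member number in the MPP database
    database_id: int  # database ID
    table_id: int     # table ID (hypothetical field name)
    run: int          # Run ID
    obj: int          # object ID within the Run

    def key(self) -> str:
        """Join the fields into one object key inside the bucket."""
        return "/".join(str(part) for part in self)


def parse_key(key: str) -> ObjectName:
    """Recover the fields from an object key produced by ObjectName.key()."""
    return ObjectName(*(int(part) for part in key.split("/")))
```

Placing segno and dbid first groups all objects of one child node under a common prefix, which is convenient for listing or cleaning up a single node's data.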
Data stored with this data storage method can be queried efficiently, mainly in the following respects:
In terms of index scanning, the query operations of the storage engine can be reduced to sequential scans and index-based scans over a batch of data. The application supports index scanning: once a data block has been located through the index, the corresponding range of data within the stored object is read directly. For a sequential scan, the data of the whole object can be read directly;
In terms of cache management, the latency of object storage is much higher than that of local storage, although the gap in bandwidth is much smaller. The latency problem can be mitigated by caching frequently accessed data locally. The cache uses an LRU mechanism and evicts data in units of blocks. The cache is globally shared, that is, in the embodiment of the application it can be used by all tables that use S3 as degraded storage, and dbid+relid+obj+block constitutes the key for obtaining cached data. A single range may span more than one block, or multiple ranges may be accessed consecutively.
The cache therefore provides contiguous caching, that is, caching several blocks at a time. The cache space does not guarantee persistence, and the data is read again after the database restarts. Once storage and computation are separated in the future, the validity of the local cache would in any case have to be re-checked after startup, and the compute nodes would not necessarily be started on the same physical machine, so the need for persistence is not strong.
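The caching behaviour described above can be sketched in Python as a block-granular LRU cache keyed by (dbid, relid, obj, block). The class name, the fetch callback, and the capacity are assumptions for illustration and do not reflect the actual implementation.

```python
from collections import OrderedDict
from typing import Callable, List, Tuple

# Cache key as described in the text: dbid + relid + obj + block.
BlockKey = Tuple[int, int, int, int]


class BlockCache:
    """In-memory LRU cache that evicts whole blocks; nothing is persisted."""

    def __init__(self, capacity: int, fetch: Callable[[BlockKey], bytes]):
        self.capacity = capacity
        self.fetch = fetch  # reads one block from the object store
        self.blocks: "OrderedDict[BlockKey, bytes]" = OrderedDict()

    def get(self, key: BlockKey) -> bytes:
        if key in self.blocks:
            self.blocks.move_to_end(key)      # mark as most recently used
            return self.blocks[key]
        data = self.fetch(key)
        self.blocks[key] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)   # evict the least recently used block
        return data

    def get_range(self, dbid: int, relid: int, obj: int,
                  first_block: int, count: int) -> List[bytes]:
        """Cache several consecutive blocks in one call."""
        return [self.get((dbid, relid, obj, b))
                for b in range(first_block, first_block + count)]


fetched: List[BlockKey] = []


def fake_fetch(key: BlockKey) -> bytes:
    fetched.append(key)                        # record every object-store read
    return b"block-%d" % key[3]


cache = BlockCache(capacity=2, fetch=fake_fetch)
cache.get_range(1, 1, 1, 0, 2)   # misses on blocks 0 and 1
cache.get((1, 1, 1, 0))          # hit, no new read
cache.get((1, 1, 1, 2))          # miss, evicts block 1
print(fetched, list(cache.blocks))
```

Because the cache lives only in memory in this sketch, a restart simply starts with an empty cache, which matches the weak persistence requirement stated above.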
In terms of object storage and data prefetching, both sequential scans and index scans access only part of the data in an object. The range of data to read can be inferred from the access pattern (sequential or index) and the number of columns accessed, which reduces the number of accesses to the object store.
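One way to reduce the number of object-store accesses is to merge nearby byte ranges into a single request. The Python sketch below shows such range coalescing; it is an illustrative technique chosen here, not the prefetching strategy of the application, and the max_gap parameter and sample offsets are assumptions.

```python
from typing import List, Tuple


def coalesce_ranges(ranges: List[Tuple[int, int]], max_gap: int) -> List[Tuple[int, int]]:
    """Merge half-open byte ranges [start, end) whose gap is at most max_gap."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(ranges):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous range: extend it instead of
            # issuing a separate request.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


requests = coalesce_ranges([(0, 100), (120, 200), (5000, 5100)], max_gap=64)
print(requests)
```

Here three column ranges become two requests: the first two are read together, and the distant third range is fetched separately.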
Therefore, the data storage method of the application can achieve the following effects: 1) object storage is supported within a unified hierarchical storage system; 2) dumping is efficient and makes full use of the capabilities of object storage; 3) access is convenient: local storage and object storage are accessed in a consistent manner, which makes the software easy to maintain; 4) queries are efficient: row-column mixed storage supports analytical queries, so that even when a query touches a large amount of data, only the required part needs to be read; 5) object storage is accessed efficiently: a refined prefetching strategy minimizes IO against the object store; 6) the system is highly reliable: data is written directly to object storage rather than through a log, which reduces resource usage and greatly improves the reliability of the system; 7) both private deployment (a local database cluster plus an object storage cluster) and cloud deployment (a metadata cluster plus an object storage service) are supported.
It should be noted that the method is not limited to degradation to object storage and can be used with any other form of shared storage. Traditional backup storage, such as a disk-based backup system, can also be used when data is synchronized by means of logs. The method also applies when the data is subdivided into more layers than the usual hot, warm and cold; for example, an extremely cold layer can be added within object storage, and the data can be further replicated on the basis of object storage.
In this embodiment, a computer device is provided, as shown in fig. 4, including a memory 401, a processor 402, and a computer program stored on the memory and executable on the processor, where the processor implements any of the data storage methods described above when executing the computer program.
In particular, the computer device may be a computer terminal, a server or similar computing means.
In the present embodiment, there is provided a computer-readable storage medium storing a computer program that executes any of the above-described data storage methods.
In particular, computer-readable storage media, including both permanent and non-permanent, removable and non-removable media, may be used to implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer-readable storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium which can be used to store information that can be accessed by a computing device. Computer-readable storage media, as defined herein, do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
Based on the same inventive concept, a data storage device is also provided in the embodiments of the present invention, as described in the following embodiments. Since the principle of the data storage device for solving the problem is similar to that of the data storage method, the implementation of the data storage device can refer to the implementation of the data storage method, and the repetition is omitted. As used below, the term "unit" or "module" may be a combination of software and/or hardware that implements the intended function. While the means described in the following embodiments are preferably implemented in software, implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
FIG. 5 is a block diagram of a data storage device according to an embodiment of the present invention. As shown in FIG. 5, the device comprises a hot layer data forming module 501, a hot layer data storage module 502, a warm layer data forming module 503, a warm layer data storage module 504, a cold layer data forming module 505, and a cold layer data storage module 506. The structure is described below.
The hot layer data forming module 501 is configured to receive data stored for the first time and store it in a zeroth layer of a storage space in the form of row records, merge the received data into a first data subset when the amount of received data meets a first preset condition, continue to receive data and form a plurality of first data subsets, and regard the data in the zeroth layer as hot layer data;
The hot layer data storage module 502 is configured to store the hot layer data locally in a row storage manner;
The warm layer data forming module 503 is configured to, when the data amount of the first data subsets in the hot layer data meets a second preset condition, merge a plurality of the earliest-formed first data subsets in the hot layer data to form a second data subset, transfer the second data subset to a first layer of the storage space, and continue to receive data and form second data subsets until an X-th layer is formed, wherein each of the first layer to the X-th layer contains a plurality of second data subsets, the data in the first layer to the X-th layer is regarded as warm layer data, and X is a natural number;
The warm layer data storage module 504 is configured to store the warm layer data locally in a column storage manner;
The cold layer data forming module 505 is configured to, when the data amount of the second data subsets in the warm layer data meets a third preset condition, merge a plurality of the earliest-formed second data subsets in the warm layer data to form a third data subset, transfer the third data subset to an (X+1)-th layer of the storage space, and regard the data in the (X+1)-th layer as cold layer data;
The cold layer data storage module 506 is configured to store the cold layer data in a row-column mixed manner;
wherein, from the hot layer data to the warm layer data to the cold layer data, the time since the data entered the system progressively increases or the access frequency progressively decreases.
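The flow through the six modules can be summarized in a simplified Python sketch. It assumes a single warm layer (X = 1), models every preset condition as a simple count, and keeps all layers in memory; the class name and the thresholds are hypothetical and chosen only to make the tier transitions visible.

```python
from typing import Dict, List

Row = Dict[str, int]
Columns = Dict[str, List[int]]


def to_columns(rows: List[Row]) -> Columns:
    """Convert row records into a column-oriented layout."""
    cols: Columns = {}
    for row in rows:
        for name, value in row.items():
            cols.setdefault(name, []).append(value)
    return cols


class TieredStore:
    def __init__(self, rows_per_subset: int = 2, subsets_per_merge: int = 2):
        self.rows_per_subset = rows_per_subset
        self.subsets_per_merge = subsets_per_merge
        self.pending: List[Row] = []          # layer 0 rows not yet in a subset
        self.hot: List[List[Row]] = []        # first data subsets, row storage
        self.warm: List[Columns] = []         # second data subsets, column storage
        self.cold: List[List[Columns]] = []   # third data subsets, row-column mixed

    def insert(self, row: Row) -> None:
        self.pending.append(row)
        if len(self.pending) >= self.rows_per_subset:      # first preset condition
            self.hot.append(self.pending)
            self.pending = []
        if len(self.hot) >= self.subsets_per_merge:        # second preset condition
            oldest = self.hot[:self.subsets_per_merge]     # earliest-formed subsets
            del self.hot[:self.subsets_per_merge]
            merged = [r for subset in oldest for r in subset]
            self.warm.append(to_columns(merged))
        if len(self.warm) >= self.subsets_per_merge:       # third preset condition
            oldest_warm = self.warm[:self.subsets_per_merge]
            del self.warm[:self.subsets_per_merge]
            # Each element stays column-oriented; the list of elements acts
            # as a sequence of row groups.
            self.cold.append(oldest_warm)


store = TieredStore()
for i in range(8):
    store.insert({"ts": i, "v": i * 10})
print(len(store.hot), len(store.warm), len(store.cold))
```

After eight inserts the hot and warm layers are empty and one third data subset holds two column-oriented groups of four rows each.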
In a specific implementation, the data storage device further includes:
a metadata management table establishing module, configured to establish a metadata management table in which the metadata of each data subset in each layer and the description information used to restore the data blocks of each data subset in each layer are recorded, wherein each data subset comprises a plurality of subunits, and one column of data in each subunit is a data block.
In particular implementations, the cold layer data storage module 506 is further configured to:
when the data amount of the second data subsets in the warm layer data meets the third preset condition, determining a plurality of the earliest-formed second data subsets in the warm layer data as the data subsets that need to be dumped to object storage;
performing the following dumping steps on the plurality of second data subsets in sequence:
establishing a third data subset on the object storage, and setting the initial object ID of the third data subset to 0;
reading the description information of each data block of the second data subset from the metadata management table;
reading the data of each data block of the second data subset according to the description information of that data block, and writing it into a buffer pool of the object storage;
when the data amount in the buffer pool of the object storage has not reached the third preset condition, directly returning the current object ID;
the buffer pool continuing to receive and merge the data of the second data subset until the data amount in the buffer pool reaches the third preset condition, at which point the data in the buffer pool is written into the third data subset and the object ID incremented by 1 is returned;
storing the description information of each data block of the second data subset together with the returned object ID in a metadata buffer pool;
and after all the second data subsets that need to be dumped to object storage have been written to object storage, writing the description information of each data block of the second data subsets in the metadata buffer pool, together with the returned object IDs, into the metadata management table.
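The dumping steps above can be sketched in Python as follows. The sketch assumes that the buffer is checked before each block is appended, so that the returned object ID always names the object that will hold the block, that a single third data subset receives all dumped data, and that the threshold is a byte count; these choices, like the class and function names, are illustrative assumptions rather than details given in the source.

```python
from typing import Dict, List, Tuple

Block = Tuple[str, bytes]   # (description information, block data)


class ObjectWriter:
    """Buffers block data and emits one object whenever the threshold is reached."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.buffer = bytearray()             # buffer pool of the object storage
        self.object_id = 0                    # initial object ID is 0
        self.objects: Dict[int, bytes] = {}   # stands in for the object store

    def write_block(self, data: bytes) -> int:
        if len(self.buffer) >= self.threshold:
            # Threshold reached: write the buffered data as one object and
            # move on to the next object ID.
            self.objects[self.object_id] = bytes(self.buffer)
            self.buffer.clear()
            self.object_id += 1
        self.buffer += data
        return self.object_id

    def close(self) -> None:
        if self.buffer:                       # flush whatever is left at the end
            self.objects[self.object_id] = bytes(self.buffer)
            self.buffer.clear()


def dump(subsets: List[List[Block]], threshold: int):
    writer = ObjectWriter(threshold)
    metadata_buffer: List[Tuple[str, int]] = []   # metadata buffer pool
    for subset in subsets:
        for description, data in subset:
            metadata_buffer.append((description, writer.write_block(data)))
    writer.close()
    # Metadata reaches the management table only after all data is written.
    metadata_table = list(metadata_buffer)
    return writer.objects, metadata_table


objects, metadata_table = dump(
    [[("a", b"1234"), ("b", b"5678")], [("c", b"9")]], threshold=8)
print(objects, metadata_table)
```

Deferring the metadata write until every object has been stored means that readers never see metadata pointing at an object that does not yet exist.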
In a specific implementation, writing the data in the buffer pool to the third data subset in the cold layer data storage module 506 includes:
writing the data in the buffer pool into the third data subset in a column storage manner, and storing the objects among different third data subsets in a row storage manner.
In a specific implementation, after writing the data in the buffer pool into the third data subset and returning the object ID incremented by 1, the cold layer data storage module 506 is further configured to perform:
writing the data in the buffer pool into a write-ahead log.
In particular, the data storage device further comprises:
a computing module, configured to calculate the maximum value and the minimum value of each subunit and use them as index information for searching data;
and a recording module, configured to record the maximum value and the minimum value corresponding to each subunit in the metadata management table.
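The use of per-subunit minimum and maximum values as index information can be illustrated with a short Python sketch. The function names and the in-memory dictionary standing in for the metadata management table are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def build_minmax(subunits: Dict[int, List[int]]) -> Dict[int, Tuple[int, int]]:
    """Record (min, max) for every subunit, as the metadata table would."""
    return {uid: (min(values), max(values)) for uid, values in subunits.items()}


def candidate_subunits(index: Dict[int, Tuple[int, int]],
                       low: int, high: int) -> List[int]:
    """Return subunits whose value range overlaps the query range [low, high]."""
    return [uid for uid, (lo, hi) in index.items() if hi >= low and lo <= high]


subunits = {0: [1, 5, 9], 1: [10, 14, 19], 2: [20, 25, 30]}
index = build_minmax(subunits)
hits = candidate_subunits(index, 12, 21)
print(index, hits)
```

Only subunits 1 and 2 can contain values between 12 and 21, so subunit 0 is skipped without being read.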
In particular, the data storage device further comprises:
a second warm layer data storage module, configured to store the warm layer data in an object in-line storage mode.
The embodiment of the application achieves the following technical effects: data stored for the first time is received and stored in a zeroth layer of a storage space in the form of row records; when the amount of received data meets a first preset condition, the received data is merged into a first data subset; data continues to be received and a plurality of first data subsets are formed, and the data in the zeroth layer is regarded as hot layer data; the hot layer data is stored locally in a row storage manner; when the data amount of the first data subsets in the hot layer data meets a second preset condition, a plurality of the earliest-formed first data subsets in the hot layer data are merged to form a second data subset, the second data subset is transferred to a first layer of the storage space, and data continues to be received and second data subsets continue to be formed until an X-th layer is formed, wherein each of the first layer to the X-th layer contains a plurality of second data subsets, the data in the first layer to the X-th layer is regarded as warm layer data, and X is a natural number; the warm layer data is stored locally in a column storage manner; when the data amount of the second data subsets in the warm layer data meets a third preset condition, a plurality of the earliest-formed second data subsets in the warm layer data are merged to form a third data subset, the third data subset is transferred to an (X+1)-th layer of the storage space, and the data in the (X+1)-th layer is regarded as cold layer data; the cold layer data is stored in a row-column mixed manner; wherein, from the hot layer data to the warm layer data to the cold layer data, the time since the data entered the system progressively increases or the access frequency progressively decreases.
According to the application, data is divided into multiple layers, such as a hot layer, a warm layer and a cold layer, according to the time at which it entered the system and its access frequency. Each layer uses a different storage format, and format conversion between layers is achieved by merging or by dumping from local storage to object storage, which improves data storage efficiency and achieves efficient space management and optimized access.
It will be apparent to those skilled in the art that the modules or steps of the embodiments of the invention described above may be implemented in a general purpose computing device, they may be concentrated on a single computing device, or distributed across a network of computing devices, they may alternatively be implemented in program code executable by computing devices, so that they may be stored in a storage device for execution by computing devices, and in some cases, the steps shown or described may be performed in a different order than what is shown or described, or they may be separately fabricated into individual integrated circuit modules, or a plurality of modules or steps in them may be fabricated into a single integrated circuit module. Thus, embodiments of the invention are not limited to any specific combination of hardware and software.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, and various modifications and variations can be made to the embodiments of the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method of data storage, comprising:
receiving data stored for the first time and storing it in a zeroth layer of a storage space in the form of row records, merging the received data into a first data subset when the amount of received data meets a first preset condition, continuing to receive data and forming a plurality of first data subsets, and regarding the data in the zeroth layer as hot layer data;
storing the hot layer data locally in a row storage manner;
when the data amount of the first data subsets in the hot layer data meets a second preset condition, merging a plurality of the earliest-formed first data subsets in the hot layer data to form a second data subset, transferring the second data subset to a first layer of the storage space, and continuing to receive data and form second data subsets until an X-th layer is formed, wherein each of the first layer to the X-th layer contains a plurality of second data subsets, the data in the first layer to the X-th layer is regarded as warm layer data, and X is a natural number;
storing the warm layer data locally in a column storage manner;
when the data amount of the second data subsets in the warm layer data meets a third preset condition, merging a plurality of the earliest-formed second data subsets in the warm layer data to form a third data subset, transferring the third data subset to an (X+1)-th layer of the storage space, and regarding the data in the (X+1)-th layer as cold layer data;
storing the cold layer data in a row-column mixed manner;
wherein, from the hot layer data to the warm layer data to the cold layer data, the time since the data entered the system progressively increases or the access frequency progressively decreases.
2. The data storage method of claim 1, wherein the method further comprises:
and establishing a metadata management table, and recording metadata of each data subset in each layer and description information for restoring data blocks of each data subset in each layer in the metadata management table, wherein each data subset comprises a plurality of subunits, and one column of data in each subunit is a data block.
3. The data storage method as claimed in claim 2, wherein storing the cold layer data in a row-column mixed manner comprises:
when the data amount of the second data subsets in the warm layer data meets the third preset condition, determining a plurality of the earliest-formed second data subsets in the warm layer data as the data subsets that need to be dumped to object storage;
performing the following dumping steps on the plurality of second data subsets in sequence:
establishing a third data subset on the object storage, and setting the initial object ID of the third data subset to 0;
reading the description information of each data block of the second data subset from the metadata management table;
reading the data of each data block of the second data subset according to the description information of that data block, and writing it into a buffer pool of the object storage;
when the data amount in the buffer pool of the object storage has not reached the third preset condition, directly returning the current object ID;
the buffer pool continuing to receive and merge the data of the second data subset until the data amount in the buffer pool reaches the third preset condition, at which point the data in the buffer pool is written into the third data subset and the object ID incremented by 1 is returned;
storing the description information of each data block of the second data subset together with the returned object ID in a metadata buffer pool;
and after all the second data subsets that need to be dumped to object storage have been written to object storage, writing the description information of each data block of the second data subsets in the metadata buffer pool, together with the returned object IDs, into the metadata management table.
4. A data storage method as claimed in claim 3, wherein writing data in the buffer pool to the third subset of data comprises:
writing the data in the buffer pool into the third data subset in a column storage manner, and storing the objects among different third data subsets in a row storage manner.
5. The data storage method of claim 3, further comprising, after said step of writing the data in said buffer pool to said third data subset and returning the object ID incremented by 1:
writing the data in the buffer pool into a write-ahead log.
6. The data storage method of claim 2, wherein the method further comprises:
calculating the maximum value and the minimum value of each subunit, and taking the maximum value and the minimum value as index information of search data;
the maximum value and the minimum value corresponding to each of the sub-units are recorded in the metadata management table.
7. The data storage method of claim 1, wherein the method further comprises:
and storing the warm layer data in a column storage mode.
8. A data storage device, comprising:
a hot layer data forming module, configured to receive data stored for the first time and store it in a zeroth layer of a storage space in the form of row records, merge the received data into a first data subset when the amount of received data meets a first preset condition, continue to receive data and form a plurality of first data subsets, and regard the data in the zeroth layer as hot layer data;
a hot layer data storage module, configured to store the hot layer data locally in a row storage manner;
a warm layer data forming module, configured to, when the data amount of the first data subsets in the hot layer data meets a second preset condition, merge a plurality of the earliest-formed first data subsets in the hot layer data to form a second data subset, transfer the second data subset to a first layer of the storage space, and continue to receive data and form second data subsets until an X-th layer is formed, wherein each of the first layer to the X-th layer contains a plurality of second data subsets, the data in the first layer to the X-th layer is regarded as warm layer data, and X is a natural number;
a warm layer data storage module, configured to store the warm layer data locally in a column storage manner;
a cold layer data forming module, configured to, when the data amount of the second data subsets in the warm layer data meets a third preset condition, merge a plurality of the earliest-formed second data subsets in the warm layer data to form a third data subset, transfer the third data subset to an (X+1)-th layer of the storage space, and regard the data in the (X+1)-th layer as cold layer data;
a cold layer data storage module, configured to store the cold layer data in object storage in a row-column mixed manner;
wherein, from the hot layer data to the warm layer data to the cold layer data, the time since the data entered the system progressively increases or the access frequency progressively decreases.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the data storage method of any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program that executes the data storage method of any one of claims 1 to 7.
CN202311140141.9A 2023-09-06 2023-09-06 Data storage method, device, computer equipment and medium Active CN116894041B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311140141.9A CN116894041B (en) 2023-09-06 2023-09-06 Data storage method, device, computer equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311140141.9A CN116894041B (en) 2023-09-06 2023-09-06 Data storage method, device, computer equipment and medium

Publications (2)

Publication Number Publication Date
CN116894041A true CN116894041A (en) 2023-10-17
CN116894041B CN116894041B (en) 2023-11-17

Family

ID=88311064

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311140141.9A Active CN116894041B (en) 2023-09-06 2023-09-06 Data storage method, device, computer equipment and medium

Country Status (1)

Country Link
CN (1) CN116894041B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117725095A (en) * 2024-02-07 2024-03-19 北京四维纵横数据技术有限公司 Data storage and query method, device, equipment and medium for data set

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130013561A1 (en) * 2011-07-08 2013-01-10 Microsoft Corporation Efficient metadata storage
US20140006401A1 (en) * 2012-06-30 2014-01-02 Microsoft Corporation Classification of data in main memory database systems
CN110825748A (en) * 2019-11-05 2020-02-21 北京平凯星辰科技发展有限公司 High-performance and easily-expandable key value storage method utilizing differential index mechanism
CN111984696A (en) * 2020-07-23 2020-11-24 深圳市赢时胜信息技术股份有限公司 Novel database and method
CN115544014A (en) * 2022-10-20 2022-12-30 东北大学 Data merging method, device and equipment in database
CN116166691A (en) * 2023-04-21 2023-05-26 中国科学院合肥物质科学研究院 Data archiving system, method, device and equipment based on data division
CN116312980A (en) * 2023-01-18 2023-06-23 东软医疗***股份有限公司 Data transmission method and device, CT machine and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117725095A (en) * 2024-02-07 2024-03-19 北京四维纵横数据技术有限公司 Data storage and query method, device, equipment and medium for data set
CN117725095B (en) * 2024-02-07 2024-05-03 北京四维纵横数据技术有限公司 Data storage and query method, device, equipment and medium for data set

Also Published As

Publication number Publication date
CN116894041B (en) 2023-11-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant