CN107092624B - Data storage method, device and system - Google Patents

Publication number
CN107092624B
Authority
CN
China
Prior art keywords
data
storage
storage mode
supporting
data record
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611237821.2A
Other languages
Chinese (zh)
Other versions
CN107092624A (en
Inventor
曾春
罗哲
杜洪先
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xingxuan Technology Co Ltd
Original Assignee
Beijing Xingxuan Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xingxuan Technology Co Ltd filed Critical Beijing Xingxuan Technology Co Ltd
Priority to CN201611237821.2A priority Critical patent/CN107092624B/en
Publication of CN107092624A publication Critical patent/CN107092624A/en
Application granted granted Critical
Publication of CN107092624B publication Critical patent/CN107092624B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present application provide a data storage method, device and system. The data storage method comprises: acquiring a data table to be processed from a storage node supporting a row-type storage mode, wherein the data table to be processed comprises at least one data record stored in the row-type storage mode; selecting a data record to be transferred from the at least one data record; and transferring the data record to be transferred, in a column-type storage mode, to at least one storage node supporting the column-type storage mode. The embodiments thereby combine the row-type and column-type storage modes: on the one hand, the support of row-type storage for highly concurrent queries is fully utilized, ensuring the query efficiency of data that has not been transferred; on the other hand, column-type storage facilitates complex queries over the transferred data, ensuring the query efficiency of the transferred data.

Description

Data storage method, device and system
Technical Field
The present application relates to the field of internet technologies, and in particular, to a data storage method, apparatus, and system.
Background
Relational databases are stored in units of records and are typically used to store transactional data. For a business party using the database, it is desirable to maintain a data table, which does not affect business logic, but as time goes on and data volume increases, the data table becomes larger and larger, which results in slower and slower query speed and gradual decline of the overall performance of the database.
The prior art adopts a table partitioning or database-partitioning mode to solve the problems. A common database and table dividing method is as follows: the data table is divided into a plurality of data tables by taking time or a primary key as the basis of the sub-table, and each data table stores partial data. Thus, the data query efficiency of the hot data table can be ensured.
Disclosure of Invention
Analysis of the requirements of a large number of business parties shows that most business parties frequently use new data and use historical data less. The large volume of historical data can therefore be stored in a large cold data table and the new data in a small hot data table, which ensures the data query efficiency of the hot data table and relieves the query pressure on the business parties.
However, using historical data less does not mean never using it. When a business party does need historical data, the data table holding it is still large, so queries on historical data are inefficient and the business party's query requirements for historical data cannot be met.
In view of the above technical problem, the inventors of the present application readily thought of the following solution: continue splitting the table, storing the historical data in a plurality of data tables so that no single data table becomes too large, thereby ensuring the query efficiency of the historical data.
Regarding this approach of continued table splitting, the inventors found through further analysis that if a business party simply queries historical data in one particular data table, continued table splitting can ensure the query efficiency of the historical data. If, however, a business party needs to query across a plurality of sub-tables, cross-table queries are involved; continued table splitting then increases query complexity, the overall query efficiency is not necessarily ensured, and the maintenance of multiple sub-tables becomes a further problem.
Based on the above analysis, the inventors of the present application departed from the approach of the prior art and, after creative work, provide a new solution whose main principle is as follows: the row-type storage mode and the column-type storage mode are combined. New data is stored in row-type form, making full use of row-type storage's support for highly concurrent queries and ensuring the query efficiency of the new data. Historical data is stored in column-type form, which facilitates complex queries over a large amount of historical data, ensures the query efficiency of the historical data, and also alleviates, to a certain extent, the problem of maintaining multiple sub-tables.
In order to achieve the above object, an embodiment of the present application provides a data storage method, including:
acquiring a data table to be processed from a storage node supporting a line type storage mode, wherein the data table to be processed comprises at least one data record stored in the line type storage mode;
selecting a data record to be transferred from the at least one data record;
and transferring the data record to be transferred to at least one storage node supporting the column type storage mode in a column type storage mode.
In an optional embodiment, the step of selecting the data record to be dumped comprises at least one of the following modes:
selecting a data record with the storage time meeting a preset time condition from the at least one data record as the data record to be transferred;
selecting a data record with a main key meeting a preset main key condition from the at least one data record as the to-be-transferred data record;
and selecting the data record with the access frequency lower than the frequency threshold value from the at least one data record as the data record to be transferred.
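The three selection modes above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Record` fields, the function name `select_to_transfer`, the thresholds, and the choice to combine enabled modes with a logical OR are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    key: int            # primary key (hypothetical field name)
    stored_on: date     # storage time
    access_count: int   # accesses seen in some monitoring window (assumed metric)


def select_to_transfer(records, cutoff=None, key_below=None, min_access=None):
    """Return the records that match at least one enabled selection mode."""
    selected = []
    for r in records:
        by_time = cutoff is not None and r.stored_on < cutoff
        by_key = key_below is not None and r.key < key_below
        by_freq = min_access is not None and r.access_count < min_access
        if by_time or by_key or by_freq:
            selected.append(r)
    return selected


records = [
    Record(1, date(2016, 1, 10), 2),
    Record(2, date(2016, 6, 5), 40),
    Record(3, date(2016, 11, 20), 1),
]
old = select_to_transfer(records, cutoff=date(2016, 4, 1))   # time condition only
cold = select_to_transfer(records, min_access=5)             # access frequency only
```

Here `old` holds only the record stored before the cutoff, while `cold` holds the two rarely accessed records regardless of their age.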
In an optional embodiment, the unloading step of the data record to be unloaded includes:
dividing the data record to be transferred into at least one data record segment based on a horizontal partition strategy; the at least one data recording segment corresponds to the at least one storage node supporting the columnar storage mode one by one;
and respectively transferring the at least one data recording segment to a corresponding storage node supporting the column storage mode in the column storage mode.
In an optional embodiment, the unloading step of the at least one data recording segment includes:
exporting the at least one data recording segment from the to-be-processed data table to at least one file;
respectively importing the data recording segments in the at least one file into corresponding storage nodes supporting a column type storage mode;
and in the at least one storage node supporting the columnar storage mode, respectively storing the corresponding data record segments in a column manner.
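The horizontal partitioning and column-wise storing steps can be sketched as below. The example assumes a modulo-on-primary-key partition strategy and models each column-type storage node as a dictionary of column lists; both are illustrative choices, and the intermediate export to and import from files is omitted.

```python
from collections import defaultdict


def partition_rows(rows, key, node_count):
    """Horizontal partitioning: assign each row to one segment by key modulo."""
    segments = defaultdict(list)
    for row in rows:
        segments[row[key] % node_count].append(row)
    return dict(segments)


def to_columns(rows):
    """Pivot a list of row dicts into one value list per column."""
    columns = defaultdict(list)
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return dict(columns)


rows = [
    {"id": 1, "city": "Beijing", "amount": 30},
    {"id": 2, "city": "Shanghai", "amount": 45},
    {"id": 3, "city": "Beijing", "amount": 12},
    {"id": 4, "city": "Shenzhen", "amount": 80},
]
segments = partition_rows(rows, key="id", node_count=2)
# One column-oriented copy per node; segment i goes to node i.
column_nodes = {node: to_columns(seg) for node, seg in segments.items()}
```

Each segment maps to exactly one node, which mirrors the one-to-one correspondence between data record segments and column-type storage nodes described above.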
In an optional embodiment, during the process of transferring the data record to be transferred, the method further includes:
recording change operations on the data record to be transferred in the data table to be processed, and determining the storage node supporting the column-type storage mode that corresponds to each change operation;
after the data record to be transferred is successfully transferred, the method further comprises:
adding a read lock to the data table to be processed;
replaying the change operations in the corresponding storage nodes supporting the column-type storage mode;
and releasing the read lock on the data table to be processed.
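A rough sketch of recording changes during the transfer and replaying them afterwards is given below. The class name `TransferSession`, the modulo routing, and the use of `threading.Lock` as a stand-in for the table's read lock are assumptions; for brevity each column-type node is modelled as a dictionary keyed by primary key rather than as real column storage.

```python
import threading


class TransferSession:
    def __init__(self, column_nodes, node_count):
        self.column_nodes = column_nodes     # node id -> {primary key -> record}
        self.node_count = node_count
        self.change_log = []                 # entries: (node id, op, key, values)
        self.table_lock = threading.Lock()   # stand-in for the table's read lock

    def record_change(self, op, key, values=None):
        """Log a change made while the transfer runs, with its target node."""
        node = key % self.node_count         # assumed routing rule
        self.change_log.append((node, op, key, values))

    def replay(self):
        """Hold the lock, apply logged changes in order, then release."""
        with self.table_lock:
            for node, op, key, values in self.change_log:
                store = self.column_nodes[node]
                if op == "update":
                    store[key] = {**store.get(key, {}), **values}
                elif op == "delete":
                    store.pop(key, None)
            self.change_log.clear()


nodes = {0: {2: {"amount": 45}}, 1: {1: {"amount": 30}, 3: {"amount": 12}}}
session = TransferSession(nodes, node_count=2)
session.record_change("update", 1, {"amount": 35})   # arrives mid-transfer
session.record_change("delete", 3)
session.replay()                                     # after the transfer succeeds
```

Replaying in log order keeps the transferred copy consistent with changes that reached the source table while the transfer was still running.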
In an optional embodiment, after the data record to be transferred is successfully transferred, the method further includes:
deleting the data record to be transferred from the data table to be processed;
and setting the data ranges of the at least one storage node supporting the columnar storage mode and of the data table to be processed.
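The clean-up after a successful transfer might look like the following sketch. It assumes that data ranges are expressed as inclusive (low, high) primary-key bounds and that the key column is named `id`; the source does not specify how ranges are represented.

```python
def finalize_transfer(row_table, transferred_keys):
    """Delete transferred rows and compute (low, high) key ranges per store."""
    remaining = [r for r in row_table if r["id"] not in transferred_keys]

    def key_range(keys):
        keys = list(keys)
        return (min(keys), max(keys)) if keys else None

    ranges = {
        "column": key_range(transferred_keys),            # now on column nodes
        "row": key_range(r["id"] for r in remaining),     # still in the row table
    }
    return remaining, ranges


row_table = [{"id": i} for i in range(1, 7)]
remaining, ranges = finalize_transfer(row_table, transferred_keys={1, 2, 3})
```

The recorded ranges are what a later query can consult to decide which kind of storage node holds the data it needs.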
In an optional embodiment, the storage node supporting the line storage comprises: a row-wise storage node and/or a hybrid storage node; the at least one storage node supporting the columnar storage mode comprises: columnar storage nodes and/or hybrid storage nodes.
The embodiment of the present application further provides a data query method, including:
determining that the data to be queried are simultaneously distributed on storage nodes supporting a row type storage mode and storage nodes supporting a column type storage mode according to a data range to which the data to be queried belong;
distributing a query request to the storage nodes supporting the row-type storage mode and the storage nodes supporting the column-type storage mode for parallel query;
merging the query results returned by the storage nodes supporting the line storage mode and the storage nodes supporting the column storage mode;
and outputting the combined query result.
In an optional embodiment, the number of storage nodes supporting the columnar storage mode is multiple.
In an optional embodiment, the method further comprises:
determining that the data to be queried is distributed in storage nodes supporting a line type storage mode or storage nodes supporting a column type storage mode according to a data range to which the data to be queried belongs;
distributing a query request to the storage nodes supporting the row-type storage mode or the storage nodes supporting the column-type storage mode for query;
and outputting the query result returned by the storage nodes supporting the line storage mode or the storage nodes supporting the column storage mode.
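The query routing described above can be sketched as follows. The `Node` class, the inclusive key ranges, and the use of a thread pool for the parallel query are assumptions for illustration; merging is shown as simple concatenation followed by a sort on the key.

```python
from concurrent.futures import ThreadPoolExecutor


class Node:
    def __init__(self, name, lo, hi, rows):
        self.name, self.lo, self.hi, self.rows = name, lo, hi, rows

    def query(self, lo, hi):
        return [r for r in self.rows if lo <= r["id"] <= hi]


def route(nodes, lo, hi):
    """Pick the nodes whose data range overlaps the queried range."""
    return [n for n in nodes if n.lo <= hi and lo <= n.hi]


def run_query(nodes, lo, hi):
    targets = route(nodes, lo, hi)
    if not targets:
        return []
    # Query all target nodes in parallel, then merge their partial results.
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        parts = list(pool.map(lambda n: n.query(lo, hi), targets))
    merged = [row for part in parts for row in part]
    return sorted(merged, key=lambda r: r["id"])


row_node = Node("row", 101, 200, [{"id": 150}, {"id": 180}])   # new data
col_node = Node("col", 1, 100, [{"id": 40}, {"id": 90}])       # historical data
both = run_query([row_node, col_node], 80, 160)       # spans both node types
only_row = run_query([row_node, col_node], 120, 200)  # row-type node only
```

When the queried range falls entirely within one node's data range, only that node is queried and no merge across node types is needed.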
An embodiment of the present application further provides a data storage device, including:
the acquiring unit is used for acquiring a data table to be processed from a storage node supporting a row-type storage mode, wherein the data table to be processed comprises at least one data record stored in the row-type storage mode;
the selection unit is used for selecting the data record to be transferred from the at least one data record;
and the unloading unit is used for unloading the data record to be unloaded to at least one storage node supporting the column type storage mode in a column type storage mode.
In an optional embodiment, the selecting unit is specifically configured to perform at least one of the following operations:
selecting a data record with the storage time meeting a preset time condition from the at least one data record as the data record to be transferred;
selecting a data record with a main key meeting a preset main key condition from the at least one data record as the data record to be transferred;
and selecting the data record with the access frequency lower than the frequency threshold value from the at least one data record as the data record to be transferred.
In an optional embodiment, the unloading unit is specifically configured to:
dividing the data record to be transferred into at least one data record segment based on a horizontal partition strategy; the at least one data recording segment corresponds to the at least one storage node supporting the columnar storage mode one by one;
and respectively transferring the at least one data recording segment to a corresponding storage node supporting the columnar storage mode in the columnar storage mode.
In an optional embodiment, the unloading unit is specifically configured to:
exporting the at least one data recording segment from the to-be-processed data table to at least one file;
respectively importing the data recording segments in the at least one file into corresponding storage nodes supporting a columnar storage mode;
and in the at least one storage node supporting the columnar storage mode, respectively storing the corresponding data record segments in a column manner.
In an alternative embodiment, the apparatus further comprises:
the recording unit is used for recording the change operation aiming at the data record to be transferred in the data table to be processed in the process of transferring the data record to be transferred, and determining a storage node which supports a column type storage mode and corresponds to the change operation;
and the playback unit is used for adding a read lock to the to-be-processed data table after the to-be-transferred data record is successfully transferred, playing back the change operation in a storage node corresponding to the change operation and supporting a column type storage mode, and releasing the read lock of the to-be-processed data table.
In an alternative embodiment, the apparatus further comprises:
the deleting unit is used for deleting the data record to be transferred from the data table to be processed;
and the setting unit is used for setting the at least one storage node supporting the columnar storage mode and the data range of the data table to be processed.
In an optional embodiment, the storage node supporting the line storage comprises: a row-wise storage node and/or a hybrid storage node; the at least one storage node supporting the columnar storage mode comprises: columnar storage nodes and/or hybrid storage nodes.
An embodiment of the present application further provides a data query device, including:
the determining unit is used for determining that the data to be inquired are simultaneously distributed on the storage nodes supporting the row type storage mode and the storage nodes supporting the column type storage mode according to the data range to which the data to be inquired belong;
a sending unit, configured to distribute a query request to the storage nodes supporting the row-type storage mode and the storage nodes supporting the column-type storage mode to perform parallel query;
the merging unit is used for merging the query results returned by the storage nodes supporting the line storage mode and the storage nodes supporting the column storage mode;
and the output unit is used for outputting the combined query result.
In an optional embodiment, the determining unit is further configured to: determining that the data to be queried is distributed in storage nodes supporting a line type storage mode or storage nodes supporting a column type storage mode according to a data range to which the data to be queried belongs;
the sending unit is further configured to: distributing a query request to the storage nodes supporting the line storage mode or the storage nodes supporting the column storage mode for query;
the output unit is further configured to: and outputting the query result returned by the storage nodes supporting the line storage mode or the storage nodes supporting the column storage mode.
An embodiment of the present application further provides a distributed storage system, including: at least one storage node supporting a line storage mode, at least one storage node supporting a column storage mode and an access control device;
the at least one storage node supporting a line type storage mode is used for storing data in the line type storage mode;
the at least one storage node supporting the columnar storage mode is used for storing data in the columnar storage mode;
the access control device is used for acquiring a data table to be processed from the at least one storage node supporting the line type storage mode, wherein the data table to be processed comprises at least one data record stored in the line type storage mode; selecting a data record to be transferred from the at least one data record; and transferring the data records to be transferred to at least one storage node in the at least one storage node supporting the column storage mode in a column storage mode.
In an optional embodiment, the access control device is further configured to: determining that the data to be queried are simultaneously distributed on storage nodes supporting a row type storage mode and storage nodes supporting a column type storage mode according to a data range to which the data to be queried belong; distributing a query request to the storage nodes supporting the line storage mode and the storage nodes supporting the column storage mode to perform parallel query; merging the query results returned by the storage nodes supporting the line storage mode and the storage nodes supporting the column storage mode; and outputting the combined query result.
In an optional embodiment, the at least one storage node supporting the line storage manner is a line storage node; the at least one storage node supporting the columnar storage mode is a columnar storage node; or
The at least one storage node supporting the row-type storage mode and the at least one storage node supporting the column-type storage mode are both hybrid storage nodes.
In one possible design, the data storage device may include a processor and a memory, the memory is used for storing a program for supporting the data storage device to execute the data storage method provided by the above embodiment, and the processor is configured to execute the program stored in the memory.
Optionally, the data storage device may further comprise a communication interface for the data storage device to communicate with other devices or a communication network.
The embodiment of the present application further provides a computer storage medium for storing computer software instructions used by the data storage device, including a program for causing the data storage device to execute the data storage method provided in the foregoing embodiment.
In a possible design, the data query apparatus may include a processor and a memory, the memory is used for storing a program that supports the data query apparatus to execute the data query method provided by the foregoing embodiment, and the processor is configured to execute the program stored in the memory.
Optionally, the data query apparatus may further include a communication interface, which is used for the data query apparatus to communicate with other devices or a communication network.
The embodiment of the present application further provides a computer storage medium for storing computer software instructions used by the data query apparatus, including a program for causing the data query apparatus to execute the data query method provided in the foregoing embodiment.
In the embodiment of the application, a distributed storage system is formed based on storage nodes supporting a line type storage mode and storage nodes supporting a column type storage mode, partial data records are selected from the storage nodes supporting the line type storage mode, and the partial data records are transferred to at least one storage node supporting the column type storage mode in the column type storage mode, so that the combination of the line type storage mode and the column type storage mode is realized.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
Fig. 1a is a schematic structural diagram of a distributed storage system according to an embodiment of the present application;
Fig. 1b is a schematic flowchart of a data storage method according to an embodiment of the present application;
Fig. 1c is a schematic flow chart illustrating a data storage method according to another embodiment of the present application;
Fig. 1d is a schematic flow chart of step 14 in Fig. 1c according to another embodiment of the present application;
Fig. 2a is a schematic structural diagram of a data storage system according to another embodiment of the present application;
Fig. 2b is a schematic flow chart illustrating a data storage method according to another embodiment of the present application;
Fig. 3a is a schematic structural diagram of a data storage system according to yet another embodiment of the present application;
Fig. 3b is a schematic flow chart illustrating a data storage method according to another embodiment of the present application;
Fig. 4a is a schematic flowchart of a data query method according to another embodiment of the present application;
Fig. 4b is a schematic structural diagram of a data storage device according to yet another embodiment of the present application;
Fig. 5 is a schematic structural diagram of a data storage device according to yet another embodiment of the present application;
Fig. 6 is a schematic structural diagram of a data query device according to yet another embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only a few embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Fig. 1a is a schematic structural diagram of a distributed storage system according to an embodiment of the present application. As shown in fig. 1a, the distributed storage system includes: the storage system comprises at least one storage node supporting a line type storage mode, at least one storage node supporting a column type storage mode and an access control device.
The at least one storage node supporting the row-type storage mode is used for storing data in the row-type storage mode. The at least one storage node supporting the column-type storage mode is used for storing data in the column-type storage mode. The access control device performs access control on the data in both kinds of storage node. Access control includes transferring, storing, reading, inserting, modifying, deleting and the like.
When transferring data records, the access control device acquires a data table to be processed from at least one storage node supporting the row-type storage mode, wherein the data table to be processed comprises at least one data record stored in the row-type storage mode; selects a data record to be transferred from the at least one data record; and transfers the data record to be transferred, in the column-type storage mode, to at least one of the storage nodes supporting the column-type storage mode.
Optionally, the data table to be processed may be stored on a single storage node supporting the row-type storage mode, in which case the access control device acquires the data table to be processed from that one storage node.
Alternatively, the data table to be processed may be stored in a distributed manner across a plurality of storage nodes supporting the row-type storage mode, in which case the access control device acquires the data table to be processed from those storage nodes.
When the access control device queries data, the access control device may determine that the data to be queried is simultaneously distributed in storage nodes supporting a line type storage mode and storage nodes supporting a column type storage mode according to a data range to which the data to be queried belongs; distributing the query request to a storage node supporting a line type storage mode and a storage node supporting a column type storage mode to perform parallel query; merging the query results returned by the storage nodes supporting the line type storage mode and the storage nodes supporting the column type storage mode; and outputting the combined query result.
Furthermore, when the access control device queries data, it may also determine, according to a data range to which the data to be queried belongs, that the data to be queried is distributed in storage nodes supporting a line-type storage manner or storage nodes supporting a column-type storage manner; distributing the query request to a storage node supporting a line type storage mode or a storage node supporting a column type storage mode for query; and outputting the query result returned by the storage nodes supporting the line storage mode or the storage nodes supporting the column storage mode.
The flow of unloading data records and querying data is described in detail below with reference to specific method embodiments.
Fig. 1b is a schematic flowchart of a data storage method according to an embodiment of the present application. The method is applicable to a distributed storage system comprising storage nodes supporting a row-wise storage mode and storage nodes supporting a column-wise storage mode. As shown in fig. 1b, the method comprises:
101. Acquiring a data table to be processed from a storage node supporting the row-type storage mode, wherein the data table to be processed comprises at least one data record stored in the row-type storage mode.
102. Selecting a data record to be transferred from the at least one data record.
103. Transferring the data record to be transferred, in the column-type storage mode, to at least one storage node supporting the column-type storage mode.
In this embodiment, the data table to be processed may be any data table in the storage nodes supporting the row-type storage mode, and it contains at least one data record stored in the row-type storage mode. In other words, each data record is one row of data, stored in the data table with the record as the unit.
Alternatively, the storage nodes supporting the line storage method may be storage nodes supporting only the line storage method, which are simply referred to as line storage nodes. Alternatively, the storage nodes supporting the line storage scheme may be storage nodes supporting both the line storage scheme and the column storage scheme, which are simply referred to as hybrid storage nodes. Accordingly, the data table to be processed may be a line type data table supporting only the line type storage manner, or a mixed type data table supporting both the line type storage manner and the column type storage manner.
It should be noted that the number of the storage nodes supporting the line storage manner may be one or more. For example, if the to-be-processed data table is stored on a certain storage node supporting a line storage manner, the to-be-processed data table may be acquired from the storage node supporting the line storage manner. For another example, if the to-be-processed data table is distributed and stored on a plurality of storage nodes supporting the line storage method, the to-be-processed data table may be acquired from the plurality of storage nodes supporting the line storage method.
As time passes and the data volume grows, the data table to be processed holds more and more data records and becomes larger and larger, so queries become slower and slower. To solve this problem, this embodiment acquires the data table to be processed from the storage node supporting the row-type storage mode (for example, a data table with a relatively large number of data records may be taken as the data table to be processed); selects all or some of the at least one data record contained in the data table to be processed, referred to as the data records to be transferred; and transfers the data records to be transferred, in the column-type storage mode, to at least one storage node supporting the column-type storage mode.
After the transfer, the number of data records in the data table to be processed is reduced, and the row-type storage mode's support for highly concurrent queries improves the query efficiency of the data records that remain in the data table to be processed. In addition, the transferred data records are stored anew in the column-type storage mode, which facilitates complex queries over the transferred data and ensures its query efficiency. This embodiment does not need to split the data table to be processed into a plurality of sub-tables, which alleviates the sub-table maintenance problem to a certain extent.
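The difference between the two layouts can be illustrated with a small sketch. The table contents and function names are hypothetical; the point is only that a row layout keeps each record together for keyed lookups, while a column layout keeps each attribute together so that an aggregate reads just the columns it needs.

```python
rows = [
    {"id": 1, "city": "Beijing", "amount": 30},
    {"id": 2, "city": "Shanghai", "amount": 45},
    {"id": 3, "city": "Beijing", "amount": 12},
]

# Row layout: one complete record per primary key.
row_store = {r["id"]: r for r in rows}

# Column layout: one value list per attribute, aligned by position.
column_store = {name: [r[name] for r in rows] for name in rows[0]}


def point_lookup(store, key):
    """Typical access on new data: fetch one whole record by key."""
    return store[key]


def total_amount_by_city(columns, city):
    """Typical access on historical data: aggregate over two columns only."""
    return sum(a for c, a in zip(columns["city"], columns["amount"]) if c == city)


record = point_lookup(row_store, 2)
beijing_total = total_amount_by_city(column_store, "Beijing")
```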
In the above-described embodiment or the following-described embodiment, it is necessary to select a data record to be transferred from at least one data record stored in a row-wise storage manner included in the data table to be processed. The data record to be dumped may be part or all of the at least one data record.
In an alternative embodiment, analysis of the business party's requirements shows that the business party mostly uses the recently stored data records in the to-be-processed data table and uses historical data less, so historical data records can be selected for transfer according to the storage time of the data records. On this basis, the step of selecting the data records to be transferred may be: selecting, from the at least one data record, the data records whose storage time meets a preset time condition as the data records to be transferred. The preset time condition can be set according to the specific application requirements. For example, the preset time condition may be a point in time, such as March 31, in which case the data records whose storage time is earlier than that point may be selected from the at least one data record as the data records to be transferred. As another example, the preset time condition may be a time range, such as March to October, in which case the data records whose storage time falls within that range may be selected as the data records to be transferred.
In the above optional implementation, the business party does not need to change its storage logic: new data is still stored in the row-wise storage mode. New data is used frequently and can generally cover about 80% of the business party's query requests; its use generally includes insert, update and query operations. Because the row-wise storage mode supports highly concurrent queries, it can meet the business party's needs for new data and keep queries efficient. Historical data is used relatively little and generally covers about 20% of the business party's query requests; its use generally includes update and query operations. Storing the historical data again in the columnar storage mode supports complex queries over it and keeps those queries efficient.
In another alternative embodiment, analysis of the business party's requirements shows that the business party's use of the data records can be distinguished by their primary keys: the data records identified by some primary keys are used often, while those identified by other primary keys are used less. The less frequently used data records can therefore be selected for transfer according to their primary keys. On this basis, the step of selecting the data records to be transferred may be: selecting, from the at least one data record, the data records whose primary key meets a preset primary key condition as the data records to be transferred. The preset primary key condition can be set according to the specific application requirements. For example, if the preset primary key condition is a designated primary key, the data record identified by that primary key may be selected from the at least one data record as the data record to be transferred. As another example, if the preset primary key condition is a primary key interval, the data records whose primary key lies within that interval may be selected as the data records to be transferred.
In yet another optional embodiment, if analysis of the business party's requirements shows no obvious pattern in how the business party uses the data records, the access frequency of each data record may be counted, and suitable data records may be selected for transfer according to their access frequency. On this basis, the step of selecting the data records to be transferred may be: selecting, from the at least one data record, the data records whose access frequency is lower than a frequency threshold as the data records to be transferred. The frequency threshold can be set according to the specific application requirements.
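The three selection strategies above (storage time, primary key, access frequency) can be sketched as simple filters. This is a minimal illustration only; the `Record` type, its field names and the example thresholds are assumptions made for the sketch and do not come from the embodiments.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    # Hypothetical record layout used only for this sketch.
    pk: int
    stored_on: date
    access_count: int


def select_by_time(records, cutoff):
    """Preset time condition as a time point: stored earlier than the cutoff."""
    return [r for r in records if r.stored_on < cutoff]


def select_by_key_range(records, low, high):
    """Preset primary key condition as an inclusive primary key interval."""
    return [r for r in records if low <= r.pk <= high]


def select_by_access(records, threshold):
    """Access frequency lower than the frequency threshold."""
    return [r for r in records if r.access_count < threshold]


records = [
    Record(1, date(2016, 1, 15), 2),
    Record(2, date(2016, 3, 30), 40),
    Record(3, date(2016, 6, 1), 1),
]
by_time = select_by_time(records, date(2016, 3, 31))
by_key = select_by_key_range(records, 2, 3)
by_access = select_by_access(records, 5)
```

Each function returns the candidate records to be transferred; which condition is used, and with which values, would depend on the application requirements.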
In the foregoing or following embodiments, the data records to be transferred need to be transferred, in the columnar storage mode, to at least one storage node supporting the columnar storage mode.
In an optional embodiment, the step of transferring the data records to be transferred may be: periodically transferring the data records stored in the row-wise storage mode in the to-be-processed data table to at least one storage node supporting the columnar storage mode. Preferably, but without limitation, the transfer can be carried out when the system is relatively idle.
In an alternative embodiment, as shown in fig. 1c, an implementation flow of the data storage method includes:
11. Acquire a to-be-processed data table from a storage node supporting the row-wise storage mode, where the to-be-processed data table includes at least one data record stored in the row-wise storage mode.
12. Select the data records to be transferred from the at least one data record.
13. Divide the data records to be transferred into at least one data record segment based on a horizontal partition strategy, where the at least one data record segment corresponds one-to-one to at least one storage node supporting the columnar storage mode.
14. Transfer each of the at least one data record segment, in the columnar storage mode, to the corresponding storage node supporting the columnar storage mode.
For steps 11 and 12, refer to the description of the foregoing embodiments; details are not repeated here.
Steps 13 and 14 are in effect an implementation of transferring the data records to be transferred to the storage nodes supporting the columnar storage mode.
Step 13 mainly exploits the distributed storage system's support for automatic horizontal partition placement: the data records to be transferred are divided horizontally so that they can be distributed across different storage nodes supporting the columnar storage mode, thereby achieving automatic horizontal data partition placement and providing high availability.
Step 14 transfers each data record segment divided in step 13 to its corresponding storage node supporting the columnar storage mode.
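Step 13 can be illustrated with a small sketch that splits records, record by record, into one segment per columnar node. The embodiments do not specify the horizontal partition strategy; hashing the primary key modulo the number of nodes is an assumption made here, as are the node names and the `id` field.

```python
def partition_horizontally(records, node_names, key=lambda r: r["id"]):
    """Divide whole records into one segment per target node.

    Assumption: the partition strategy is `key % number_of_nodes`.
    """
    segments = {name: [] for name in node_names}
    for record in records:
        target = node_names[key(record) % len(node_names)]
        segments[target].append(record)
    return segments


records = [{"id": i, "amount": i * 10} for i in range(1, 7)]
segments = partition_horizontally(
    records, ["col-node-0", "col-node-1", "col-node-2"]
)
```

Because each segment maps to exactly one node, the one-to-one correspondence between data record segments and columnar storage nodes described in step 13 is preserved.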
Further, as shown in fig. 1d, one implementation of step 14 includes:
141. Export the at least one data record segment from the to-be-processed data table into at least one file.
142. Import the data record segments in the at least one file into the corresponding storage nodes supporting the columnar storage mode.
143. In the at least one storage node supporting the columnar storage mode, store the corresponding data record segments in columnar form.
In steps 141 to 143, because the transfer moves data records across storage nodes, the data record segments to be transferred are first exported to an external file and then imported from the external file into the corresponding storage nodes supporting the columnar storage mode.
The data record segments divided in step 13 can be exported to at least one file. For example, all the segments may be exported into the same file, or each segment may be exported into its own file; alternatively, the segments may be grouped, with the segments of different groups exported to different files.
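Steps 141 to 143 can be sketched as an export to an external file followed by an import that pivots the rows into columns. CSV as the file format, the file name and the in-memory column dictionary are all assumptions of this sketch; the embodiments do not name a file format or a columnar engine.

```python
import csv
import os
import tempfile


def export_segment(segment, path):
    """Step 141: write one data record segment (a list of dict rows) to a file."""
    fieldnames = list(segment[0].keys())
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(segment)


def import_as_columns(path):
    """Steps 142-143: read the rows back and keep one list of values per column."""
    columns = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            for name, value in row.items():
                columns.setdefault(name, []).append(value)
    return columns


segment = [{"id": 1, "city": "Beijing"}, {"id": 4, "city": "Shanghai"}]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "segment_0.csv")
    export_segment(segment, path)
    # CSV carries no type information, so every value comes back as a string.
    columns = import_as_columns(path)
```

A real system would restore column types on import; the sketch only shows the row-to-column pivot around an intermediate file.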
In the above embodiment, at least one storage node supporting the columnar storage mode includes: columnar storage nodes and/or hybrid storage nodes. The columnar storage node refers to a storage node supporting only a columnar storage mode. The hybrid storage node is a storage node supporting both the columnar storage mode and the row storage mode. Accordingly, the data records to be dumped may be dumped into the columnar storage nodes and/or the hybrid storage nodes.
In an alternative embodiment, the distributed storage system includes row-wise storage nodes and columnar storage nodes. Accordingly, the data records to be transferred may be transferred to at least one columnar storage node. Fig. 2a is a system diagram for transferring the data records to be transferred to a plurality of columnar storage nodes. Based on the system shown in fig. 2a, an implementation flow of the data storage method is shown in fig. 2b and includes:
201. Acquire a to-be-processed data table from the row-wise storage node, where the to-be-processed data table includes at least one data record stored in the row-wise storage mode.
202. Select the data records to be transferred from the at least one data record.
203. Divide the data records to be transferred into a plurality of data record segments based on a horizontal partition strategy, where the plurality of data record segments correspond one-to-one to the plurality of columnar storage nodes.
204. Export the plurality of data record segments from the to-be-processed data table into at least one file.
205. Import the data record segments in the at least one file into the corresponding columnar storage nodes.
206. In the plurality of columnar storage nodes, store the corresponding data record segments in columnar form.
For steps 201 and 202, refer to the description of the foregoing embodiments; details are not repeated here.
Steps 203 to 206 are in effect an implementation of transferring the data records to be transferred to a plurality of columnar storage nodes.
Optionally, referring to fig. 2a, depending on the data volume of the data records to be transferred and the requirements of the business party, a plurality of columnar storage nodes may be configured. The data records to be transferred are distributed across the different columnar storage nodes, and automatic horizontal data partition placement is supported, which provides high availability. Horizontal partition placement means transferring the data records to be transferred to different columnar storage nodes with the data record as the unit of division.
Optionally, referring to fig. 2a, the to-be-processed data table is deployed on a row-wise storage node, that is, a storage node that stores data in the row-wise storage mode. The row-wise storage nodes may use a master-slave configuration to improve data security and query efficiency. One slave node is shown in fig. 2a, but more slave nodes may be deployed.
In another alternative embodiment, the distributed storage system includes a plurality of hybrid storage nodes, each of which includes a row-wise partition and a columnar partition. A columnar partition is the area of a hybrid storage node in which data is stored in the columnar storage mode, and a row-wise partition is the area in which data is stored in the row-wise storage mode. In this alternative embodiment, some hybrid storage nodes may serve as storage nodes supporting the row-wise storage mode and some as storage nodes supporting the columnar storage mode, so the data records to be transferred may be transferred from the row-wise partition of one hybrid storage node to the columnar partitions of a plurality of hybrid storage nodes. Preferably, but without limitation, the transfer can be carried out when the system is relatively idle.
On this basis, fig. 3a is a system diagram for transferring the data records to be transferred from the row-wise partition of a hybrid storage node to the columnar partitions of a plurality of hybrid storage nodes. Based on the system shown in fig. 3a, an implementation flow of the data storage method is shown in fig. 3b and includes:
301. Acquire a to-be-processed data table from the row-wise partition of a first hybrid storage node, where the to-be-processed data table includes at least one data record stored in the row-wise storage mode.
302. Select the data records to be transferred from the at least one data record.
303. Divide the data records to be transferred into a plurality of data record segments based on a horizontal partition strategy, where the plurality of data record segments correspond one-to-one to the columnar partitions of a plurality of second hybrid storage nodes.
304. Transfer each of the plurality of data record segments to the columnar partition of the corresponding second hybrid storage node.
Optionally, the plurality of second hybrid storage nodes do not include the first hybrid storage node. In that case, step 304 may export the plurality of data record segments from the to-be-processed data table into at least one file; import the data record segments in the at least one file into the columnar partitions of the corresponding second hybrid storage nodes; and, in the columnar partitions of the second hybrid storage nodes, store the corresponding data record segments in columnar form.
Optionally, the plurality of second hybrid storage nodes include the first hybrid storage node. In that case, step 304 may treat the first hybrid storage node differently from the other second hybrid storage nodes. For the first hybrid storage node, the data record segment that needs to be transferred to its columnar partition is determined and transferred from the row-wise partition to the columnar partition, which is a transfer within the node. For a second hybrid storage node other than the first hybrid storage node, the data record segments that need to be transferred are exported from the to-be-processed data table into at least one file; the data record segments in the at least one file are imported into the columnar partitions of the corresponding second hybrid storage nodes; and, in the columnar partitions of those second hybrid storage nodes, the corresponding data record segments are stored in columnar form.
Optionally, in the embodiment shown in fig. 3b, each second hybrid storage node supports the columnar storage mode, and its columnar partition may already exist or may not yet have been created. For a second hybrid storage node whose columnar partition does not exist, the columnar partition may be created first in step 304.
Optionally, if the columnar partition already exists in the second hybrid storage node, the corresponding data record segment may be transferred to it directly.
On this basis, after the data records to be transferred have been transferred to the columnar partitions of the second hybrid storage nodes, they need to be deleted from the row-wise partition of the first hybrid storage node so as to reduce the number of data records in the row-wise partition. In addition, after the transfer, the data in the columnar partitions of the second hybrid storage nodes and in the row-wise partition of the first hybrid storage node has changed, so the data ranges of those columnar partitions and of that row-wise partition may be set. A data range indicates which data is held in the corresponding columnar partition or row-wise partition.
Optionally, referring to fig. 3a, the data records to be transferred are distributed across a plurality of hybrid storage nodes. The hybrid storage nodes adopt a shared-nothing architecture, and the horizontal data partitions are deployed redundantly on different nodes to improve data security and query efficiency.
In the foregoing or following embodiments, the data records in the to-be-processed data table may change at any time. While the data records to be transferred are being transferred, operations that change them may therefore occur in the to-be-processed data table; these are referred to as change operations for short and include insert, delete and update operations. To keep the data records consistent before and after the transfer, the change operations (insert, delete and/or update) performed on the data records to be transferred in the to-be-processed data table are recorded during the transfer, and the storage node supporting the columnar storage mode that corresponds to each change operation is determined. For example, based on the divided data record segments, it may be determined which data record segment a change operation targets, and the storage node supporting the columnar storage mode that corresponds to that segment is taken as the storage node corresponding to the change operation.
After the data records to be transferred have been transferred to the storage nodes supporting the columnar storage mode, a read lock can be placed on the to-be-processed data table to prevent its data records from changing; the recorded change operations are then played back in the corresponding storage nodes supporting the columnar storage mode to update the transferred data records there; finally, the read lock on the to-be-processed data table is released. Recording the change operations performed on the data records to be transferred and playing them back in the storage nodes after the transfer ensures that the records are consistent before and after the transfer and maintains the quality of the services that rely on them.
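The record-then-replay procedure can be sketched as follows. Everything here is illustrative: `ChangeLog`, the dictionary-based node stores and the `node_of` routing function are hypothetical names, and a plain `threading.Lock` stands in for the read lock on the to-be-processed data table, since the Python standard library does not provide a dedicated read lock.

```python
import threading


class ChangeLog:
    """Collects change operations made while the transfer is in progress."""

    def __init__(self):
        self.entries = []

    def record(self, op, pk, values=None):
        self.entries.append((op, pk, values))


def replay(change_log, node_of, nodes, table_lock):
    """Hold the table lock, apply each recorded change on its node, then release."""
    with table_lock:
        for op, pk, values in change_log.entries:
            store = nodes[node_of(pk)]
            if op == "delete":
                store.pop(pk, None)
            else:  # insert or update
                store[pk] = {**store.get(pk, {}), **values}
        change_log.entries.clear()


# Columnar nodes after the bulk transfer, keyed by primary key.
nodes = {
    "col-0": {2: {"amount": 20}},
    "col-1": {1: {"amount": 10}, 3: {"amount": 30}},
}
node_of = lambda pk: f"col-{pk % 2}"  # assumed segment-to-node mapping

log = ChangeLog()
log.record("update", 1, {"amount": 15})
log.record("delete", 3)
log.record("insert", 4, {"amount": 40})
replay(log, node_of, nodes, threading.Lock())
```

The routing function plays the role of determining, from the divided data record segments, which columnar storage node a change operation corresponds to.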
Optionally, the change operations may be recorded as follows: a trigger is created on the to-be-processed data table, and the trigger records the change operations performed on the data records to be transferred while they are being transferred. A trigger is a mechanism that databases provide to programmers and data analysts for ensuring data integrity; it is a special stored procedure that is associated with an event on a data table and is executed when that event occurs. For example, when an insert, delete or update operation is performed on the to-be-processed data table, the trigger is activated and records the operation.
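As an illustration of trigger-based recording, the sketch below uses SQLite through Python's standard `sqlite3` module. The choice of SQLite and the table, column and trigger names (`orders`, `change_log`, `orders_ins` and so on) are assumptions of the sketch; the embodiments do not name a database product or a schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER);
CREATE TABLE change_log (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    op TEXT, id INTEGER, amount INTEGER
);

-- One trigger per change operation; each appends a row to change_log.
CREATE TRIGGER orders_ins AFTER INSERT ON orders BEGIN
    INSERT INTO change_log (op, id, amount)
    VALUES ('insert', NEW.id, NEW.amount);
END;
CREATE TRIGGER orders_upd AFTER UPDATE ON orders BEGIN
    INSERT INTO change_log (op, id, amount)
    VALUES ('update', NEW.id, NEW.amount);
END;
CREATE TRIGGER orders_del AFTER DELETE ON orders BEGIN
    INSERT INTO change_log (op, id, amount)
    VALUES ('delete', OLD.id, NULL);
END;
""")

# Changes made while the transfer is running are captured automatically.
conn.execute("INSERT INTO orders VALUES (1, 10)")
conn.execute("UPDATE orders SET amount = 15 WHERE id = 1")
conn.execute("DELETE FROM orders WHERE id = 1")

logged = conn.execute(
    "SELECT op, id, amount FROM change_log ORDER BY seq"
).fetchall()
```

The rows in `change_log`, read in `seq` order, are what would later be played back on the columnar storage nodes.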
Further, after the data records to be transferred have been transferred, they also need to be deleted from the to-be-processed data table so as to reduce the number of data records in that table. In addition, after the transfer, the data held in the to-be-processed data table and in the storage nodes supporting the columnar storage mode has changed, so their data ranges may be set accordingly; subsequent data queries can then locate the corresponding storage node accurately, which improves query efficiency. The data range indicates which data is held in a storage node supporting the columnar storage mode or in the to-be-processed data table.
Based on the above embodiments, data records stored in the row-wise storage mode in the to-be-processed data table can be transferred and stored in the columnar storage mode, which combines the row-wise and columnar storage modes. On the one hand, this makes full use of row-wise storage's support for highly concurrent queries and keeps queries over the data that was not transferred efficient; on the other hand, columnar storage makes complex queries over the transferred data easier and keeps those queries efficient. At the same time, the problem of maintaining multiple sub-tables is avoided to some extent.
The business party also faces the problem of querying the data stored in the distributed storage system. In a distributed storage system, a query may span storage nodes or may fall on a single storage node. On this basis, an embodiment of the present application further provides a data query method. As shown in fig. 4a, the method includes:
401. Determine, according to the data range to which the data to be queried belongs, that the data to be queried is distributed across both a storage node supporting the row-wise storage mode and a storage node supporting the columnar storage mode.
402. Distribute the query request to the storage node supporting the row-wise storage mode and the storage node supporting the columnar storage mode so that they query in parallel.
403. Merge the query results returned by the storage node supporting the row-wise storage mode and the storage node supporting the columnar storage mode.
404. Output the merged query result.
Optionally, if it is determined, according to the data range to which the data to be queried belongs, that the data is located only in the storage nodes supporting the row-wise storage mode or only in the storage nodes supporting the columnar storage mode, the query request can be distributed to the storage nodes supporting the row-wise storage mode or to the storage nodes supporting the columnar storage mode, as applicable, and the query result returned by those nodes is output.
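Steps 401 to 404, together with the single-node case, can be sketched as range-based routing followed by a parallel query and a merge. The `NODES` table, the node names, the primary-key data ranges and the use of a thread pool are assumptions of this sketch, not details from the embodiments.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical layout: node name -> (lowest id, highest id, rows held).
NODES = {
    "row-node": (101, 200, [{"id": 150, "amount": 5}, {"id": 180, "amount": 7}]),
    "col-node": (1, 100, [{"id": 40, "amount": 3}, {"id": 90, "amount": 9}]),
}


def nodes_for_range(low, high):
    """Step 401: find the nodes whose data range overlaps the queried range."""
    return [name for name, (lo, hi, _) in NODES.items() if lo <= high and low <= hi]


def query_node(name, low, high):
    """Stand-in for the query executed on one storage node."""
    return [row for row in NODES[name][2] if low <= row["id"] <= high]


def query(low, high):
    targets = nodes_for_range(low, high)
    # Step 402: send the request to every target node in parallel.
    with ThreadPoolExecutor(max_workers=len(targets) or 1) as pool:
        parts = pool.map(lambda n: query_node(n, low, high), targets)
        # Step 403: merge the partial results.
        merged = [row for part in parts for row in part]
    # Step 404: return the merged result, ordered by id for readability.
    return sorted(merged, key=lambda r: r["id"])


both = query(80, 160)       # spans the row-wise and the columnar node
only_columnar = query(1, 50)  # falls entirely within the columnar node
```

When the queried range overlaps only one node, the same code path degenerates to a single-node query, which corresponds to the optional case described above.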
For the distributed storage system shown in fig. 2a, if the data to be queried is located in the row-wise storage node, the business party may use the index of the row-wise storage node to go directly to the data block that stores the data record. In the row-wise storage node, data records are stored in data blocks, and an index is built over the data blocks.
For the distributed storage system shown in fig. 2a, if the data to be queried is located in the columnar storage nodes, the business party may determine, based on the horizontal partition strategy, the columnar storage node where the data is located and query that node. In the columnar storage nodes, different compression strategies can be applied to different field types; for example, integer fields can be kept in sorted order to facilitate encoding compression, and character fields can be compressed with dictionary encoding and queried directly.
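As one illustration of the dictionary encoding mentioned for character fields, the sketch below replaces each distinct string with a small integer code and answers an equality query by comparing codes instead of strings. The function names and sample values are hypothetical, and the embodiments do not specify how the encoding is actually implemented.

```python
def dictionary_encode(values):
    """Map each distinct value to an integer code and encode the column."""
    dictionary = {}
    codes = []
    for value in values:
        # A new value receives the next unused code; a known value reuses its code.
        code = dictionary.setdefault(value, len(dictionary))
        codes.append(code)
    return dictionary, codes


def rows_equal_to(dictionary, codes, wanted):
    """Equality query evaluated directly on the encoded column."""
    code = dictionary.get(wanted)
    if code is None:
        return []
    return [i for i, c in enumerate(codes) if c == code]


cities = ["Beijing", "Shanghai", "Beijing", "Shenzhen", "Beijing"]
dictionary, codes = dictionary_encode(cities)
matches = rows_equal_to(dictionary, codes, "Beijing")
```

The query looks up the wanted string once in the dictionary and then scans only integers, which is why a dictionary-encoded column can be queried without decompressing it first.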
For the distributed storage system shown in fig. 2a, if the data to be queried is spread across the row-wise storage nodes and the columnar storage nodes, query processing across storage nodes is involved. A view can be used to provide a uniform query entry point, so the business party does not need to distinguish row-wise storage nodes from columnar storage nodes: the query request is distributed through the view to the corresponding row-wise and columnar storage nodes, and the query results they return are merged and output to the user. A view is a virtual table whose content is defined by a query; like a real data table, it contains a set of named columns and rows of data. However, a view does not exist in the database as a stored data set; its rows are generated dynamically when the view is referenced.
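The idea of a view as a uniform query entry point can be sketched with SQLite via Python's `sqlite3` module. In this sketch two tables in one in-memory database stand in for the row-wise and columnar storage nodes, which is a simplification; the table and view names (`orders_recent`, `orders_history`, `orders_all`) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Stand-ins for the row-wise node (recent data) and columnar node (history).
CREATE TABLE orders_recent (id INTEGER PRIMARY KEY, amount INTEGER);
CREATE TABLE orders_history (id INTEGER PRIMARY KEY, amount INTEGER);
INSERT INTO orders_recent VALUES (150, 5);
INSERT INTO orders_history VALUES (40, 3), (90, 9);

-- The view stores no rows; it is evaluated whenever it is referenced.
CREATE VIEW orders_all AS
    SELECT id, amount FROM orders_history
    UNION ALL
    SELECT id, amount FROM orders_recent;
""")

rows = conn.execute(
    "SELECT id, amount FROM orders_all WHERE id >= 90 ORDER BY id"
).fetchall()
```

The caller queries `orders_all` only and never needs to know which underlying table holds a given record.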
For the distributed storage system shown in fig. 3a, if the data to be queried is located in a single partition (row-wise or columnar) of a single hybrid storage node, the business party logically still sees one data table, and all operations are performed on that table, with no change from existing query operations.
For the distributed storage system shown in fig. 3a, if the data to be queried is located in different partitions of the same hybrid storage node, the query covers both the row-wise partition and the columnar partition, and cross-partition query processing is required. For example, cross-partition query processing may be: filter the data records or column data according to the query conditions in the query request, then perform the subsequent hybrid query processing and output the query result. Taking a join query as an example of the subsequent hybrid query, the data records may first be filtered from the row-wise partition and the columns that do not need to take part in the query removed from them; the join is then performed against the columns in the columnar partition; finally, the other columns to be returned are merged in based on the join result.
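The join example above can be sketched as a three-step procedure over an in-memory row-wise partition (a list of records) and columnar partition (one list per column). The function signature, the sample data and the hash-index join are assumptions of this sketch; the embodiments describe only the order of the steps.

```python
def cross_partition_join(rows, columns, predicate, join_key, row_fields, column_fields):
    # Step 1: filter the row-wise records and drop the fields the query does not need.
    needed = set(row_fields) | {join_key}
    filtered = [{k: r[k] for k in needed} for r in rows if predicate(r)]

    # Step 2: index the join column of the columnar partition by value.
    positions = {}
    for pos, value in enumerate(columns[join_key]):
        positions.setdefault(value, []).append(pos)

    # Step 3: join on the key, then merge in the other return columns by position.
    result = []
    for r in filtered:
        for pos in positions.get(r[join_key], []):
            out = {f: r[f] for f in row_fields}
            out.update({f: columns[f][pos] for f in column_fields})
            result.append(out)
    return result


rows = [
    {"order_id": 1, "customer_id": 7, "status": "open", "note": "gift"},
    {"order_id": 2, "customer_id": 8, "status": "closed", "note": ""},
    {"order_id": 3, "customer_id": 8, "status": "open", "note": ""},
]
columns = {
    "customer_id": [7, 8],
    "name": ["Ann", "Bo"],
    "city": ["Beijing", "Shanghai"],
}
joined = cross_partition_join(
    rows, columns, lambda r: r["status"] == "open",
    "customer_id", ["order_id"], ["name"],
)
```

Only the join column of the columnar partition is scanned during the join; the remaining return columns are fetched afterwards for the matching positions, which is the benefit of deferring the final merge.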
For the distributed storage system shown in fig. 3a, if the data to be queried is located in different hybrid storage nodes, query processing across storage nodes is again involved: the partition information can be used to distribute the query request to the corresponding hybrid storage nodes, and the query results returned by the different hybrid storage nodes are merged and output to the user.
It should be noted that the steps of the methods provided in the above embodiments may all be executed by the same device, or different devices may execute different steps. For example, steps 101 to 103 may all be executed by device A; as another example, steps 101 and 102 may be executed by device A and step 103 by device B; and so on.
Fig. 4b is a schematic structural diagram of a data storage device according to yet another embodiment of the present application. As shown in fig. 4b, the device includes: an obtaining unit 41, a selecting unit 42 and an unloading unit 43.
The obtaining unit 41 is configured to obtain a to-be-processed data table from a storage node supporting the row-wise storage mode, where the to-be-processed data table includes at least one data record stored in the row-wise storage mode.
The selecting unit 42 is configured to select the data records to be transferred from the at least one data record.
The unloading unit 43 is configured to transfer the data records to be transferred, in the columnar storage mode, to at least one storage node supporting the columnar storage mode.
It should be noted that, in this embodiment, there may be one or more storage nodes supporting the row-wise storage mode. For example, if the to-be-processed data table is stored on a single storage node supporting the row-wise storage mode, the obtaining unit 41 may obtain the table from that node. If the table is distributed across a plurality of such storage nodes, the obtaining unit 41 may obtain it from those nodes.
In an alternative embodiment, the selecting unit 42 is specifically configured to perform at least one of the following operations:
selecting, from the at least one data record, the data records whose storage time meets a preset time condition as the data records to be transferred;
selecting, from the at least one data record, the data records whose primary key meets a preset primary key condition as the data records to be transferred;
selecting, from the at least one data record, the data records whose access frequency is lower than a frequency threshold as the data records to be transferred.
In an optional embodiment, the unloading unit 43 is specifically configured to:
dividing the data record to be transferred into at least one data record segment based on a horizontal partition strategy; the at least one data recording segment corresponds to the at least one storage node supporting the columnar storage mode one by one;
and respectively transferring the at least one data recording segment to a corresponding storage node supporting the columnar storage mode in the columnar storage mode.
Further, the unloading unit 43 is specifically configured to: exporting the at least one data recording segment from the to-be-processed data table to at least one file; respectively importing the data recording segments in the at least one file into corresponding storage nodes supporting a columnar storage mode; and in the at least one storage node supporting the columnar storage mode, respectively storing the corresponding data record segments in a column manner.
In an alternative embodiment, as shown in fig. 5, the apparatus further comprises: a recording unit 51 and a playback unit 52.
The recording unit 51 is configured to record, while the data records to be transferred are being transferred, the change operations performed on them in the to-be-processed data table, and to determine the storage node supporting the columnar storage mode that corresponds to each change operation.
The playback unit 52 is configured to, after the data records to be transferred have been transferred successfully, place a read lock on the to-be-processed data table, play back the change operations in the corresponding storage nodes supporting the columnar storage mode so as to update the transferred data records, and then release the read lock on the to-be-processed data table.
In an alternative embodiment, as shown in fig. 5, the apparatus further comprises: a deletion unit 53 and a setting unit 54.
The deleting unit 53 is configured to delete the data records to be transferred from the to-be-processed data table after they have been transferred successfully.
The setting unit 54 is configured to set the data ranges of the at least one storage node supporting the columnar storage mode and of the to-be-processed data table after the data records to be transferred have been transferred successfully.
In an optional embodiment, the storage node supporting the line storage comprises: a row-wise storage node and/or a hybrid storage node; the at least one storage node supporting the columnar storage mode comprises: columnar storage nodes and/or hybrid storage nodes.
The data storage device provided in this embodiment can be used to carry out the flows of the above method embodiments; details are not repeated here.
The data storage device provided in this embodiment selects some of the at least one data record stored in the row-wise storage mode and stores them again in the columnar storage mode, which combines the row-wise and columnar storage modes. On the one hand, this makes full use of row-wise storage's support for highly concurrent queries and keeps queries over the data that was not transferred efficient; on the other hand, columnar storage makes complex queries over the transferred data easier and keeps those queries efficient. At the same time, the problem of maintaining multiple sub-tables is avoided to some extent.
Fig. 6 is a schematic structural diagram of a data query device according to yet another embodiment of the present application. As shown in fig. 6, the device includes: a determining unit 61, a sending unit 62, a merging unit 63 and an output unit 64.
The determining unit 61 is configured to determine, according to a data range to which the data to be queried belongs, that the data to be queried is simultaneously distributed in storage nodes supporting a line-type storage manner and storage nodes supporting a column-type storage manner.
And a sending unit 62, configured to distribute the query request to the storage nodes supporting the line storage manner and the storage nodes supporting the column storage manner to perform parallel query.
And a merging unit 63, configured to merge query results returned by the storage nodes supporting the row-type storage system and the storage nodes supporting the column-type storage system.
And the output unit 64 is used for outputting the combined query result.
In an alternative embodiment, the determining unit 61 is further configured to determine, according to the data range to which the data to be queried belongs, that the data to be queried is distributed in storage nodes supporting the row storage mode or in storage nodes supporting the columnar storage mode. Accordingly, the sending unit 62 is further configured to distribute the query request to the storage nodes supporting the row storage mode or to the storage nodes supporting the columnar storage mode for querying, and the output unit 64 is further configured to output the query result returned by the storage nodes supporting the row storage mode or by the storage nodes supporting the columnar storage mode.
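As a non-limiting illustration of the determining, sending, merging and output units working together, the following minimal Python sketch sends one request to a row store and a column store in parallel and merges the results. The in-memory structures `ROW_NODE` and `COLUMN_NODE`, the `amount` filter and all function names are assumptions made for this example; they do not appear in the embodiment.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for the two kinds of storage nodes.
ROW_NODE = [{"id": 3, "amount": 30}, {"id": 4, "amount": 40}]  # records not yet transferred
COLUMN_NODE = {"id": [1, 2], "amount": [10, 20]}               # transferred records, one list per column


def query_row_node(min_amount):
    """Scan the row store one record at a time."""
    return [dict(rec) for rec in ROW_NODE if rec["amount"] >= min_amount]


def query_column_node(min_amount):
    """Scan only the 'amount' column, then rebuild the matching records."""
    hits = [i for i, value in enumerate(COLUMN_NODE["amount"]) if value >= min_amount]
    return [{name: values[i] for name, values in COLUMN_NODE.items()} for i in hits]


def parallel_query(min_amount):
    """Send the same request to both node types in parallel and merge the results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [
            pool.submit(query_row_node, min_amount),
            pool.submit(query_column_node, min_amount),
        ]
        merged = [rec for future in futures for rec in future.result()]
    # Sorting by id is an arbitrary choice that makes the merged output deterministic.
    return sorted(merged, key=lambda rec: rec["id"])
```

In this sketch the merge is a simple concatenation followed by a sort; a real access control device would merge according to the semantics of the query (for example aggregation or ordering clauses).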
In one possible design, the data storage apparatus may include a processor and a memory, the memory is used for storing a program that supports the data storage apparatus to execute the data storage method provided by the foregoing embodiment, and the processor is configured to execute the program stored in the memory, so as to: acquiring a data table to be processed from a storage node supporting a line type storage mode, wherein the data table to be processed comprises at least one data record stored in the line type storage mode; selecting a data record to be transferred from the at least one data record; and transferring the data record to be transferred to at least one storage node supporting the column type storage mode in a column type storage mode.
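The three steps executed by the processor can be sketched as follows. This is a minimal Python illustration under stated assumptions: the age-based selection rule (`cutoff_day`), the field names and the dictionary-of-lists column layout are hypothetical and are not prescribed by the embodiment.

```python
def select_records(table, cutoff_day):
    """Pick the records to transfer. The age-based rule is an illustrative assumption."""
    return [rec for rec in table if rec["day"] < cutoff_day]


def to_columns(records):
    """Turn a list of row records into one list of values per column."""
    if not records:
        return {}
    columns = {name: [] for name in records[0]}
    for rec in records:
        for name in columns:
            columns[name].append(rec[name])
    return columns


# Hypothetical to-be-processed data table held by a row storage node.
pending_table = [
    {"id": 1, "day": 1, "amount": 10},
    {"id": 2, "day": 2, "amount": 20},
    {"id": 3, "day": 9, "amount": 30},
]

# Records older than the cutoff are transferred and laid out by column.
column_node = to_columns(select_records(pending_table, cutoff_day=5))
```

After the call, `column_node` holds one list per field, so a query that touches only `amount` can read that list without loading the other fields.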
The memory may also be configured to store various other data to support operations on the data storage device. Examples of such data include instructions for any application or method operating on the data storage device, contact data, phonebook data, messages, pictures, videos, and the like.
The memory may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Optionally, the data storage device may further comprise a communication component for communicating the data storage device with other devices or a communication network.
The communication component is configured to facilitate wired or wireless communication between the data storage device and other devices. The data storage device may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
The embodiment of the present application further provides a computer storage medium configured to store the computer software instructions used by the data storage device, including a program designed for the data storage device to execute the data storage method provided in the foregoing embodiments.
In a possible design, the above data query apparatus may include a processor and a memory, where the memory is used to store a program that supports the data query apparatus to execute the data query method provided in the above embodiments, and the processor is configured to execute the program stored in the memory to: determining that the data to be queried are simultaneously distributed in storage nodes supporting a line type storage mode and storage nodes supporting a column type storage mode according to a data range to which the data to be queried belong; distributing a query request to the storage nodes supporting the line storage mode and the storage nodes supporting the column storage mode to perform parallel query; merging the query results returned by the storage nodes supporting the row-type storage mode and the storage nodes supporting the column-type storage mode; and outputting the combined query result.
The memory may also be configured to store other various data to support operations on the data querying device. Examples of such data include instructions for any application or method operating on the data querying device, contact data, phonebook data, messages, pictures, videos, and the like.
The memory may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Optionally, the data query apparatus may further include a communication component, which is used for the data query apparatus to communicate with other devices or a communication network.
The communication component is configured to facilitate wired or wireless communication between the data query device and other devices. The data querying device may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
The embodiment of the present application further provides a computer storage medium configured to store the computer software instructions used by the data query apparatus, including a program designed for the data query apparatus to execute the data query method provided in the foregoing embodiments.
The embodiment of the present application discloses A1, a data storage method, comprising:
acquiring a data table to be processed from a storage node supporting a line type storage mode, wherein the data table to be processed comprises at least one data record stored in the line type storage mode;
selecting a data record to be transferred from the at least one data record;
and transferring the data record to be transferred to at least one storage node supporting the column type storage mode in a column type storage mode.
A2, the method as recited in A1, wherein transferring the data record to be transferred includes:
dividing the data record to be transferred into at least one data record segment based on a horizontal partition strategy; the at least one data recording segment corresponds to the at least one storage node supporting the columnar storage mode one by one;
and respectively transferring the at least one data recording segment to a corresponding storage node supporting the columnar storage mode in the columnar storage mode.
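A horizontal partition of this kind might look like the following minimal Python sketch. The modulo-on-key rule and the node names are assumptions chosen for illustration; the embodiment only requires that the segments correspond one-to-one to the columnar storage nodes.

```python
def horizontal_partition(records, node_names, key="id"):
    """Split records row-wise into exactly one segment per columnar storage node.

    The modulo rule below is a hypothetical partition strategy; range-based or
    hash-based rules would fit the same interface.
    """
    segments = {name: [] for name in node_names}
    for rec in records:
        target = node_names[rec[key] % len(node_names)]
        segments[target].append(rec)
    return segments


records = [{"id": i, "amount": i * 10} for i in range(1, 5)]
segments = horizontal_partition(records, ["col-0", "col-1"])
```

Each whole record lands in a single segment, which is what distinguishes horizontal partitioning from splitting a table by column.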
A3, the method of A2, wherein transferring the at least one data record segment includes:
exporting the at least one data recording segment from the to-be-processed data table to at least one file;
respectively importing the data recording segments in the at least one file into corresponding storage nodes supporting a columnar storage mode;
and in the at least one storage node supporting the columnar storage mode, respectively storing the corresponding data record segments in a column manner.
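The export, import and store-by-column steps can be sketched as below. This is a minimal Python illustration that assumes CSV as the intermediate file format and integer-valued fields; neither is specified by the embodiment, and the file name `segment_0.csv` is hypothetical.

```python
import csv
import os
import tempfile


def export_segment(segment, path):
    """Export one row-oriented data record segment to a CSV file."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(segment[0]))
        writer.writeheader()
        writer.writerows(segment)


def import_as_columns(path):
    """Import the file and store its values column by column.

    Values are converted with int() because this example assumes integer fields.
    """
    columns = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            for name, value in row.items():
                columns.setdefault(name, []).append(int(value))
    return columns


segment = [{"id": 1, "amount": 10}, {"id": 3, "amount": 30}]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "segment_0.csv")
    export_segment(segment, path)
    stored = import_as_columns(path)
```

Going through a file decouples the row storage node from the columnar storage node, so the two sides do not need to share a wire protocol.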
A4, the method of any one of A1-A3, wherein, while the data record to be transferred is being transferred, the method further comprises:
recording change operation aiming at the data record to be transferred and stored in the data table to be processed, and determining a storage node supporting a column type storage mode corresponding to the change operation;
after the data record to be transferred is successfully transferred, the method further comprises:
adding a reading lock to the data table to be processed;
playing back the change operation in a storage node which supports the columnar storage mode and corresponds to the change operation;
and releasing the read lock of the data table to be processed.
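The record-then-replay sequence of A4 can be sketched as follows. This is a minimal Python illustration under stated assumptions: `threading.Lock` merely stands in for the read lock on the to-be-processed data table, the columnar nodes are simplified to dictionaries keyed by record id, and `node_for` is a hypothetical routing rule.

```python
import threading

table_lock = threading.Lock()  # stand-in for the read lock on the to-be-processed data table
change_log = []                # change operations recorded while the transfer is running

# Hypothetical columnar nodes after the transfer, simplified to {record_id: fields}.
column_nodes = {
    "col-0": {2: {"amount": 20}},
    "col-1": {1: {"amount": 10}},
}


def node_for(record_id):
    """Determine the columnar node that holds a record (same rule as the partition step)."""
    return f"col-{record_id % 2}"


def record_change(record_id, field, value):
    """Record a change made to a record while it is being transferred."""
    change_log.append((node_for(record_id), record_id, field, value))


def replay_changes():
    """Lock the table, replay the recorded changes on the target nodes, then unlock."""
    with table_lock:  # the lock is released when the block exits
        for node, record_id, field, value in change_log:
            column_nodes[node][record_id][field] = value
        change_log.clear()


record_change(1, "amount", 15)  # an update that arrived during the transfer
replay_changes()
```

Holding the lock during replay keeps new writes from reaching the table while the log is drained, so the columnar copy is consistent with the table when the lock is released.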
A5, the method of any one of A1-A3, wherein, after the data record to be transferred is successfully transferred, the method further comprises:
deleting the data record to be transferred from the data table to be processed;
and setting the at least one storage node supporting the columnar storage mode and the data range of the data table to be processed.
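One possible reading of these two clean-up steps is sketched below in Python. The representation of a data range as a half-open `(low, high)` pair on a `day` field, with `None` meaning unbounded, is an assumption made for this example; the embodiment does not fix how data ranges are encoded.

```python
def finish_transfer(pending_table, transferred_ids, cutoff_day):
    """Delete the transferred records and record which data range each side now holds."""
    remaining = [rec for rec in pending_table if rec["id"] not in transferred_ids]
    data_ranges = {
        "column_nodes": (None, cutoff_day),  # holds day < cutoff_day
        "row_table": (cutoff_day, None),     # holds day >= cutoff_day
    }
    return remaining, data_ranges


pending_table = [
    {"id": 1, "day": 1},
    {"id": 2, "day": 2},
    {"id": 3, "day": 9},
]
remaining, data_ranges = finish_transfer(pending_table, {1, 2}, cutoff_day=5)
```

The recorded ranges are what a later query would consult to decide which storage nodes to contact.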
A6, the method of any one of A1-A3, wherein the storage node supporting the row storage mode includes: a row storage node and/or a hybrid storage node;
the at least one storage node supporting the columnar storage mode comprises: columnar storage nodes and/or hybrid storage nodes.
The embodiment of the present application discloses B7, a data query method, including:
determining that the data to be queried are simultaneously distributed in storage nodes supporting a line type storage mode and storage nodes supporting a column type storage mode according to a data range to which the data to be queried belong;
distributing a query request to the storage nodes supporting the row-type storage mode and the storage nodes supporting the column-type storage mode for parallel query;
merging the query results returned by the storage nodes supporting the line storage mode and the storage nodes supporting the column storage mode;
and outputting the combined query result.
B8, the method according to B7, further comprising:
determining that the data to be queried is distributed in storage nodes supporting a line type storage mode or storage nodes supporting a column type storage mode according to a data range to which the data to be queried belongs;
distributing a query request to the storage nodes supporting the row-type storage mode or the storage nodes supporting the column-type storage mode for query;
and outputting the query result returned by the storage nodes supporting the line storage mode or the storage nodes supporting the column storage mode.
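The routing decision shared by B7 and B8 can be sketched as a range-overlap test. In this minimal Python illustration, the half-open `(low, high)` range encoding, the `None`-means-unbounded convention and the node group names are assumptions, not part of the embodiment.

```python
def overlaps(query, stored):
    """Return True if two half-open ranges [low, high) intersect; None means unbounded."""
    q_low, q_high = query
    s_low, s_high = stored
    below = q_high is not None and s_low is not None and q_high <= s_low
    above = q_low is not None and s_high is not None and q_low >= s_high
    return not (below or above)


def route(query_range, data_ranges):
    """List the node groups whose stored data range intersects the queried range."""
    return sorted(name for name, rng in data_ranges.items() if overlaps(query_range, rng))


# Hypothetical data ranges set after a transfer with a cutoff of 5.
data_ranges = {"column_nodes": (None, 5), "row_nodes": (5, None)}
```

With these ranges, `route((1, 3), data_ranges)` selects only the columnar nodes (the B8 case), while `route((3, 8), data_ranges)` selects both groups, which corresponds to the parallel query of B7.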
The embodiment of the present application further discloses C9, a data storage device, including:
an acquisition unit, configured to acquire a to-be-processed data table from a storage node supporting the row storage mode, wherein the to-be-processed data table comprises at least one data record stored in the row storage mode;
the selection unit is used for selecting the data record to be transferred from the at least one data record;
and the unloading unit is used for unloading the data record to be unloaded to at least one storage node supporting the column type storage mode in a column type storage mode.
C10, the apparatus of C9, wherein the unloading unit is specifically configured to:
dividing the data record to be transferred into at least one data record segment based on a horizontal partition strategy; the at least one data recording segment corresponds to the at least one storage node supporting the columnar storage mode one by one;
and respectively transferring the at least one data recording segment to a corresponding storage node supporting the columnar storage mode in the columnar storage mode.
C11, the apparatus of C10, wherein the unloading unit is specifically configured to:
exporting the at least one data recording segment from the to-be-processed data table to at least one file;
respectively importing the data recording segments in the at least one file into corresponding storage nodes supporting a columnar storage mode;
and in the at least one storage node supporting the columnar storage mode, respectively storing the corresponding data record segments in a column manner.
The embodiment of the present application further discloses D12, a data query device, including:
a determining unit, configured to determine, according to the data range to which the data to be queried belongs, that the data to be queried is distributed across both storage nodes supporting the row storage mode and storage nodes supporting the columnar storage mode;
a sending unit, configured to distribute a query request to the storage nodes supporting the row storage mode and the storage nodes supporting the columnar storage mode for parallel querying;
the merging unit is used for merging the query results returned by the storage nodes supporting the line storage mode and the storage nodes supporting the column storage mode;
and the output unit is used for outputting the combined query result.
D13, the apparatus as claimed in D12, wherein the determining unit is further configured to: determining that the data to be queried is distributed in storage nodes supporting a line type storage mode or storage nodes supporting a column type storage mode according to a data range to which the data to be queried belongs;
the sending unit is further configured to: distributing a query request to the storage nodes supporting the line storage mode or the storage nodes supporting the column storage mode for query;
the output unit is further configured to: and outputting the query result returned by the storage nodes supporting the row-type storage mode or the storage nodes supporting the column-type storage mode.
The embodiment of the present application further discloses E14, a distributed storage system, including: at least one storage node supporting the row storage mode, at least one storage node supporting the columnar storage mode, and an access control device;
the at least one storage node supporting a line type storage mode is used for storing data in the line type storage mode;
the at least one storage node supporting the columnar storage mode is used for storing data in the columnar storage mode;
the access control device is used for acquiring a to-be-processed data table from the at least one storage node supporting the line storage mode, wherein the to-be-processed data table comprises at least one data record stored in the line storage mode; selecting a data record to be transferred from the at least one data record; and transferring the data records to be transferred to at least one storage node in the at least one storage node supporting the column storage mode in a column storage mode.
E15, the system of E14, wherein the access control device is further configured to:
determining that the data to be queried are simultaneously distributed in storage nodes supporting a line type storage mode and storage nodes supporting a column type storage mode according to a data range to which the data to be queried belong;
distributing a query request to the storage nodes supporting the row-type storage mode and the storage nodes supporting the column-type storage mode for parallel query;
merging the query results returned by the storage nodes supporting the row-type storage mode and the storage nodes supporting the column-type storage mode;
and outputting the combined query result.
E16, the system as described in E14 or E15, wherein the at least one storage node supporting a line storage manner is a line storage node; the at least one storage node supporting the columnar storage mode is a columnar storage node; or
The at least one storage node supporting the row-type storage mode and the at least one storage node supporting the column-type storage mode are both hybrid storage nodes.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in the form of a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (7)

1. A method of storing data, comprising:
acquiring a data table to be processed from a storage node supporting a line type storage mode, wherein the data table to be processed comprises at least one data record stored in the line type storage mode;
selecting a data record to be transferred from the at least one data record;
transferring the data records to be transferred to at least one storage node supporting the column storage mode in a column storage mode;
recording a change operation directed at the data record to be transferred in the data table to be processed, and determining the storage node supporting the columnar storage mode that corresponds to the change operation;
after the data record to be transferred is successfully transferred, adding a read lock to the data table to be processed, and replaying the change operation in the storage node supporting the columnar storage mode that corresponds to the change operation;
releasing the read lock on the data table to be processed;
wherein transferring the data record to be transferred comprises:
dividing the data record to be transferred into at least one data record segment based on a horizontal partition strategy, the at least one data record segment corresponding one-to-one to the at least one storage node supporting the columnar storage mode; and transferring each data record segment, in the columnar storage mode, to its corresponding storage node supporting the columnar storage mode;
and wherein transferring the at least one data record segment comprises:
exporting the at least one data record segment from the data table to be processed to at least one file; importing the data record segments in the at least one file into the corresponding storage nodes supporting the columnar storage mode; and, in the at least one storage node supporting the columnar storage mode, storing the corresponding data record segments by column.
2. The method of claim 1, wherein after successfully unloading the to-be-unloaded data record, the method further comprises:
deleting the data record to be transferred from the data table to be processed;
and setting the at least one storage node supporting the column type storage mode and the data range of the to-be-processed data table.
3. The method of claim 1, wherein the storage node supporting the row storage mode comprises: a row storage node and/or a hybrid storage node;
the at least one storage node supporting the columnar storage mode comprises: a columnar storage node and/or a hybrid storage node.
4. A data storage device, comprising:
an acquisition unit, configured to acquire a to-be-processed data table from a storage node supporting the row storage mode, wherein the to-be-processed data table comprises at least one data record stored in the row storage mode;
the selection unit is used for selecting the data record to be transferred from the at least one data record;
the unloading unit is used for unloading the data records to be unloaded to at least one storage node supporting the columnar storage mode in a columnar storage mode;
the recording unit is used for recording the change operation of the data record to be transferred in the data table to be processed in the process of transferring the data record to be transferred and determining a storage node which supports a column type storage mode and corresponds to the change operation;
the playback unit is used for adding a read lock to the data table to be processed after the data record to be transferred is successfully transferred and stored, and playing back the change operation in a storage node corresponding to the change operation and supporting a column type storage mode so as to update the data record to be transferred and stored and remove the read lock of the data table to be processed;
wherein, the unloading unit is specifically configured to:
dividing the data record to be transferred into at least one data record segment based on a horizontal partition strategy; the at least one data recording segment corresponds to the at least one storage node supporting the columnar storage mode one by one; respectively transferring the at least one data recording segment to a corresponding storage node supporting the columnar storage mode in a columnar storage mode;
the unloading unit is further specifically configured to:
exporting the at least one data recording segment from the to-be-processed data table to at least one file; respectively importing the data recording segments in the at least one file into corresponding storage nodes supporting a columnar storage mode; and in the at least one storage node supporting the columnar storage mode, respectively storing the corresponding data record segments in a column manner.
5. A distributed storage system, comprising: at least one storage node supporting a line storage mode, at least one storage node supporting a column storage mode and an access control device;
the at least one storage node supporting a line type storage mode is used for storing data in the line type storage mode;
the at least one storage node supporting the columnar storage mode is used for storing data in the columnar storage mode;
the access control device is used for acquiring a to-be-processed data table from the at least one storage node supporting the line storage mode, wherein the to-be-processed data table comprises at least one data record stored in the line storage mode; selecting a data record to be transferred from the at least one data record; transferring the data records to be transferred to at least one storage node supporting the columnar storage mode in a columnar storage mode; recording change operation aiming at the data record to be transferred in the data table to be processed, and determining a storage node which supports a column type storage mode and corresponds to the change operation; after the data record to be transferred and stored is successfully transferred and stored, adding a reading lock to the data table to be processed, and replaying the change operation in a storage node which supports a column type storage mode and corresponds to the change operation; releasing the read lock of the data table to be processed;
the unloading step of the data record to be unloaded comprises the following steps: dividing the data record to be transferred and stored into a plurality of data record segments on the basis of a horizontal partition strategy, wherein the data record segments correspond to a plurality of columnar storage nodes one to one; exporting the plurality of data record segments from the to-be-processed data table to at least one file; respectively importing the data recording segments in the at least one file into corresponding column-type storage nodes; and in the plurality of column-type storage nodes, the corresponding data record segments are respectively stored in columns in a column-type storage mode.
6. The system of claim 5, wherein the access control device is further configured to:
determining that the data to be queried are simultaneously distributed in storage nodes supporting a line type storage mode and storage nodes supporting a column type storage mode according to a data range to which the data to be queried belong;
distributing a query request to the storage nodes supporting the line storage mode and the storage nodes supporting the column storage mode to perform parallel query;
merging the query results returned by the storage nodes supporting the line storage mode and the storage nodes supporting the column storage mode;
and outputting the combined query result.
7. The system according to claim 5 or 6, wherein the at least one storage node supporting the row storage mode is a row storage node and the at least one storage node supporting the columnar storage mode is a columnar storage node; or
the at least one storage node supporting the row storage mode and the at least one storage node supporting the columnar storage mode are both hybrid storage nodes.
CN201611237821.2A 2016-12-28 2016-12-28 Data storage method, device and system Active CN107092624B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611237821.2A CN107092624B (en) 2016-12-28 2016-12-28 Data storage method, device and system

Publications (2)

Publication Number Publication Date
CN107092624A CN107092624A (en) 2017-08-25
CN107092624B true CN107092624B (en) 2022-08-30

Family

ID=59646067

Country Status (1)

Country Link
CN (1) CN107092624B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062378B (en) * 2017-12-12 2018-12-11 清华大学 The Connection inquiring method and system of more time serieses under a kind of storage of column
CN108093047B (en) * 2017-12-15 2021-07-27 北京星选科技有限公司 Data sending method and device, electronic equipment and middleware system
CN110196847A (en) * 2018-08-16 2019-09-03 腾讯科技(深圳)有限公司 Data processing method and device, storage medium and electronic device
CN109582682B (en) * 2018-11-02 2024-04-09 中国平安人寿保险股份有限公司 Data processing method and device, storage medium and computer equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102495905A (en) * 2011-12-23 2012-06-13 天津神舟通用数据技术有限公司 Packing method based on line storage database engine
CN102609492A (en) * 2012-01-21 2012-07-25 东华大学 Metadata management method supporting variable table modes
CN105488231A (en) * 2016-01-22 2016-04-13 杭州电子科技大学 Self-adaption table dimension division based big data processing method

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE3312161C1 (en) * 1983-04-02 1984-09-13 Telefonbau Und Normalzeit Gmbh, 6000 Frankfurt Method for controlling the establishment and termination of a connection in a time-multiplexed telecommunications, in particular a telephone exchange
US8768927B2 (en) * 2011-12-22 2014-07-01 Sap Ag Hybrid database table stored as both row and column store
CN104424287B (en) * 2013-08-30 2019-06-07 深圳市腾讯计算机***有限公司 Data query method and apparatus
CN104750727B (en) * 2013-12-30 2019-03-26 沈阳亿阳计算机技术有限责任公司 Columnar in-memory storage query device and columnar in-memory storage query method
WO2015139193A1 (en) * 2014-03-18 2015-09-24 华为技术有限公司 Method and apparatus for conversion of data storage formats

Also Published As

Publication number Publication date
CN107092624A (en) 2017-08-25

Similar Documents

Publication Publication Date Title
CN107092624B (en) Data storage method, device and system
CN101315628B (en) In-memory database system, and method and device for implementing an in-memory database
KR102564170B1 (en) Method and device for storing data object, and computer readable storage medium having a computer program using the same
CN106484906B (en) Distributed object storage system flash-back method and device
CN102495894A (en) Method, device and system for searching repeated data
EP3944556B1 (en) Block data storage method and apparatus, and block data access method and apparatus
CN110297869B (en) AI data warehouse platform and operation method
CN104281535B (en) Method and apparatus for processing a mapping table in memory
US6745198B1 (en) Parallel spatial join index
CN104598652B (en) Database query method and device
US11829377B2 (en) Efficient storage method for time series data
CN116894041B (en) Data storage method, device, computer equipment and medium
CN116756253B (en) Data storage and query methods, devices, equipment and media of relational database
KR20120082176A (en) Data processing method of database management system and system thereof
CN112464049B (en) Method, device and equipment for downloading number detail list
CN117171209A (en) Cache data cleaning method and device, storage medium and electronic equipment
CN114461635A (en) MySQL database data storage method and device and electronic equipment
CN108415982B (en) Database processing method and device
CN108073596B (en) Data deletion method and device for OLAP database
US10169250B2 (en) Method and apparatus method and apparatus for controlling access to a hash-based disk
CN112131226A (en) Index obtaining method, data query method and related device
CN111274410A (en) Data storage method and device and data query method and device
CN110968587A (en) Data processing method and device
CN117216059A (en) Data table merging method, device, equipment and medium
CN113190563B (en) Index generation method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: No. 222, second floor, building 12, No. 27, Jiancai Chengzhong Road, Haidian District, Beijing 100089

Applicant after: Beijing Xingxuan Technology Co.,Ltd.

Address before: Room 202, 2 floors, 1-3 floors, No. 11 Shangdi Information Road, Haidian District, Beijing 100085

Applicant before: Beijing Xiaodu Information Technology Co.,Ltd.

GR01 Patent grant