CN112732711B

CN112732711B - Data storage method and device and electronic equipment

Info

Publication number: CN112732711B
Application number: CN202011582167.5A
Authority: CN
Inventors: 邱海港
Original assignee: Beijing Kingsoft Cloud Network Technology Co Ltd
Current assignee: Beijing Kingsoft Cloud Network Technology Co Ltd
Priority date: 2020-12-28
Filing date: 2020-12-28
Publication date: 2024-06-04
Anticipated expiration: 2040-12-28
Also published as: CN112732711A

Abstract

Embodiments of the invention provide a data storage method, a data storage device, and an electronic device, and relate to the technical field of data storage. The method comprises: determining each target field in a target data table that was used as a query field for data queries within a preset time period, and the query frequency of each target field when used as a query field for a designated query type; selecting, based on the determined query frequencies, a fragment field from the target fields according to a preset fragment field screening mode, where the screening mode includes screening fields on the basis of the query frequency of each target field when used as a query field for a designated query type; and storing each piece of data belonging to the target data table in fragments using the fragment field. Compared with the prior art, the scheme provided by the embodiments can reduce the probability of cross-node queries during data query and reduce the resources wasted in the query process.

Description

Data storage method and device and electronic equipment

Technical Field

The present invention relates to the field of data storage technologies, and in particular, to a data storage method, a data storage device, and an electronic device.

Background

With the continuous development of distributed storage technology, it is being used to store data tables in more and more technical fields.

When a distributed storage technology is used to store data, a data table can be divided into a plurality of fragments, with different fragments located in different storage nodes. To store the data table in fragments, a preset fragment field, that is, a shard key, may be used to distribute each piece of data of the data table to a storage node. As a result, a data query may need to access only one storage node, which avoids scanning all nodes and saves the resources consumed in the query process.

In the related art, the fragment field is generally determined by a technician, such as a DBA (Database Administrator) or an application developer, based on their own experience with data queries.

However, because such technicians may lack experience, once the data has been stored according to the chosen fragment field, subsequent queries have a high probability of becoming cross-node queries, which wastes resources in the query process.

Disclosure of Invention

The embodiment of the invention aims to provide a data storage method, a data storage device and electronic equipment, so that the occurrence probability of a cross-node query condition is reduced in the data query process, and the resource waste in the query process is reduced. The specific technical scheme is as follows:

In a first aspect, an embodiment of the present invention provides a data storage method, where the method includes:

determining each target field in the target data table that was used as a query field for data queries within a preset time period, and the query frequency of each target field when used as a query field for a designated query type;

selecting, based on the determined query frequencies, a fragment field from the target fields according to a preset fragment field screening mode, where the fragment field screening mode includes screening fields on the basis of the query frequency of each target field when used as a query field for a designated query type;

and storing each piece of data belonging to the target data table in fragments using the fragment field.

Optionally, in a specific implementation manner, the designated query type includes: equivalent queries and/or range queries.

Optionally, in a specific implementation manner, the designated query type includes: equivalent query;

the step of selecting the fragment field from the target fields according to the preset fragment field screening mode based on the determined query frequencies includes:

determining, as the fragment field, the field with the highest query frequency among the target fields used as query fields for equivalent queries.

Optionally, in a specific implementation manner, the fragment field screening mode further includes screening fields on the basis of whether a data balance condition is met; where the data balance condition includes: when the field is used as the fragment field, the number of data entries in each sub data table obtained by fragmentation meets a preset balance condition.

Optionally, in a specific implementation manner, the step of determining, as the fragment field, the field with the highest query frequency among the target fields used as query fields for equivalent queries includes:

determining, as the fragment field, the field with the highest query frequency among the target fields that are used as query fields for equivalent queries and that meet the data balance condition.
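The selection among balanced equivalent-query fields can be sketched as follows. This is a minimal illustration only: the function name `select_equivalent_query_field`, the shape of the inputs (a frequency mapping and a set of fields already known to meet the data balance condition), and the behaviour when no candidate remains are all assumptions, not part of the described method.

```python
def select_equivalent_query_field(equivalent_freq, balanced_fields):
    """Return the most frequently queried field among balanced candidates.

    equivalent_freq: dict mapping field name -> query frequency as a query
                     field for equivalent queries (hypothetical input shape).
    balanced_fields: set of field names assumed to meet the data balance
                     condition (how this set is computed is not shown here).
    """
    candidates = {
        field: count
        for field, count in equivalent_freq.items()
        if field in balanced_fields
    }
    if not candidates:
        # Assumption: the text does not say what happens when no field
        # qualifies, so this sketch simply reports "no choice".
        return None
    return max(candidates, key=candidates.get)


freq = {"student_number": 40, "name": 70, "total_score": 10}
print(select_equivalent_query_field(freq, {"student_number", "total_score"}))
```

Here "name" has the highest frequency but is excluded because it is not in the balanced set, so the next most frequent balanced field is chosen.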

Optionally, in a specific implementation manner, the designated query type includes: equivalent query and range query;

the step of selecting the fragment field from the target fields according to the preset fragment field screening mode based on the determined query frequencies includes:

determining a first field, which has the highest query frequency among the target fields used as query fields for equivalent queries, and a second field, which has the highest query frequency among the target fields used as query fields for range queries;

if the second frequency is greater than the first frequency and the difference between the second frequency and the first frequency exceeds a preset difference threshold, determining the second field as the fragment field, where the second frequency is the query frequency corresponding to the second field and the first frequency is the query frequency corresponding to the first field;

otherwise, determining the first field as the fragment field.
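The comparison between the first field and the second field can be sketched as below. This is a hedged illustration, assuming the query frequencies are available as a mapping from (field, query type) to a count; the function name `select_fragment_field`, the labels "equivalent" and "range", and the fallback used when one query type has no statistics are hypothetical.

```python
def select_fragment_field(freq, diff_threshold):
    """Choose a fragment field from per-(field, query type) frequencies.

    freq: dict mapping (field_name, query_type) -> query frequency, where
          query_type is "equivalent" or "range" (hypothetical labels).
    diff_threshold: the preset difference threshold (its value is an assumption).
    """
    eq = {f: n for (f, t), n in freq.items() if t == "equivalent"}
    rng = {f: n for (f, t), n in freq.items() if t == "range"}
    if not eq and not rng:
        raise ValueError("no query statistics available")
    # Assumption: if only one query type was observed, use its top field.
    if not rng:
        return max(eq, key=eq.get)
    if not eq:
        return max(rng, key=rng.get)

    first_field = max(eq, key=eq.get)      # top field for equivalent queries
    second_field = max(rng, key=rng.get)   # top field for range queries
    first_freq, second_freq = eq[first_field], rng[second_field]

    # The range-query field wins only if it leads by more than the threshold.
    if second_freq > first_freq and second_freq - first_freq > diff_threshold:
        return second_field
    return first_field


stats = {
    ("student_number", "equivalent"): 40,
    ("total_score", "equivalent"): 10,
    ("total_score", "range"): 100,
}
print(select_fragment_field(stats, diff_threshold=50))
```

With these sample numbers the range-query frequency leads by 60, which exceeds the threshold of 50, so the range-query field is selected; raising the threshold to 60 or more would select the equivalent-query field instead.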

Optionally, in a specific implementation manner, the fragment field screening mode further includes screening fields on the basis of whether a data balance condition is met;

where the data balance condition includes: when the field is used as the fragment field, the number of data entries in each sub data table obtained by fragmentation meets a preset balance condition.

Optionally, in a specific implementation manner, the step of determining the first field, which has the highest query frequency among the target fields used as query fields for equivalent queries, and the second field, which has the highest query frequency among the target fields used as query fields for range queries, includes:

determining a first field, which has the highest query frequency among the target fields that are used as query fields for equivalent queries and meet the data balance condition, and a second field, which has the highest query frequency among the target fields that are used as query fields for range queries and meet the data balance condition.

Optionally, in a specific implementation manner, the step of storing each piece of data belonging to the target data table in fragments using the fragment field includes:

backing up each piece of data in the target data table to obtain a backup data table, and backing up the table structure of the target data table;

deleting the target data table, and constructing the sub data tables of the target data table according to the backed-up table structure, where the sub data tables are stored in different storage nodes;

and, for each piece of data in the backup data table, determining the storage node to which the piece of data belongs according to the field value of the fragment field in the piece of data, and writing the piece of data into the sub data table in that storage node.

In a second aspect, an embodiment of the present invention provides a data storage device, the device including:

the information determining module is configured to determine each target field in the target data table that was used as a query field for data queries within a preset time period, and the query frequency of each target field when used as a query field for a designated query type;

the field screening module is configured to select, based on the determined query frequencies, a fragment field from the target fields according to a preset fragment field screening mode, where the fragment field screening mode includes screening fields on the basis of the query frequency of each target field when used as a query field for a designated query type;

and the data storage module is configured to store each piece of data belonging to the target data table in fragments using the fragment field.

Optionally, in a specific implementation manner, the designated query type includes: equivalent query; the field screening module comprises:

the field screening submodule is configured to determine, as the fragment field, the field with the highest query frequency among the target fields used as query fields for equivalent queries.

Optionally, in a specific implementation manner, the field filtering submodule is specifically configured to:

Optionally, in a specific implementation manner, the designated query type includes: equivalent query and range query; the field screening module comprises:

the field determination submodule is configured to determine a first field, which has the highest query frequency among the target fields used as query fields for equivalent queries, and a second field, which has the highest query frequency among the target fields used as query fields for range queries; if the second frequency is greater than the first frequency and the difference between the second frequency and the first frequency exceeds a preset difference threshold, the first screening submodule is triggered, and otherwise the second screening submodule is triggered; the second frequency is the query frequency corresponding to the second field, and the first frequency is the query frequency corresponding to the first field;

the first screening submodule is configured to determine the second field as the fragment field;

the second screening submodule is configured to determine the first field as the fragment field.

Optionally, in a specific implementation manner, the field determination submodule is specifically configured to:

Optionally, in a specific implementation manner, the data storage module is specifically configured to:

In a third aspect, an embodiment of the present invention provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;

A memory for storing a computer program;

And the processor is used for realizing the steps of any data storage method provided by the embodiment of the invention when executing the program stored in the memory.

In a fourth aspect, an embodiment of the present invention provides a computer readable storage medium, where a computer program is stored, where the computer program is executed by a processor to implement the steps of any of the data storage methods provided in the embodiments of the present invention.

In a fifth aspect, embodiments of the present invention provide a computer program product comprising instructions which, when run on a computer, cause the computer to perform the steps of any of the data storage methods provided by the embodiments of the present invention described above.

The embodiment of the invention has the beneficial effects that:

By applying the scheme provided by the embodiment of the invention, for a target data table, each target field that was used as a query field for data queries within a preset time period can be determined, together with the query frequency of each target field when used as a query field for a designated query type. Based on the determined query frequencies, a fragment field can then be selected from the target fields using a screening mode that takes those query frequencies as the screening basis, and the data belonging to the target data table can be stored in fragments using the selected fragment field.

As can be seen from the above, when the scheme provided by the embodiment of the present invention is applied to store the data of the target data table in fragments, the fragment field used is determined from the query frequency of each target field when used as a query field for the designated query type. The determined fragment field therefore matches the query fields actually used for that query type when the target data table is queried. As a result, the fragment field of the target data table is more likely to be used as a query field, which raises the probability that the storage node holding the queried content can be determined directly from the field value of the query field during a query.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic flow chart of a data storage method according to an embodiment of the present invention;

FIG. 2 is a flow chart of one implementation of S103 in FIG. 1;

FIG. 3 is a flow chart of one implementation of S101 in FIG. 1;

FIG. 4 is a schematic diagram of a data storage device according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

When a data table is stored using a distributed storage technology, the fragment field for dividing the data table is, in the related art, generally determined by a technician such as a DBA or an application developer based on their own experience with data queries. However, because such technicians may lack experience, once the data has been stored according to the chosen fragment field, subsequent queries have a high probability of becoming cross-node queries, which wastes resources in the query process.

For example, consider a score table for the students of a class. As shown in Table 1, the fields of the score table are: student number, name, math score, Chinese score, English score, and total score. The student number is used as the fragment field, and the data in the score table is stored in fragments across three storage nodes.

TABLE 1

Student number	Name	Math score	Chinese score	English score	Total score
001	Zhang x	99	85	79	263
002	Wang x	100	95	98	293
003	Liu x	55	45	22	122
……	……	……	……	……	……
111	Li x	95	89	93	277

Thus, when the student number is used as the query field, for example when the scores of the student with student number 001 are queried, the storage node holding that student's data entry can be determined directly from the field value of the query field, that is, from the specific student number being queried, and the entry can be looked up in that storage node alone.

In many cases, however, queries use the total score as the query field, for example to find students whose total score lies in the interval [290,300], or students whose total score lies in the interval [0,180]. Because the storage nodes holding the data entries of students within a given interval cannot be determined directly, all three storage nodes must be traversed to obtain the query result. In this process the probability of cross-node queries is clearly high, which wastes resources in the query process.
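The contrast between the two kinds of query can be made concrete with a small sketch. It assumes the score table is fragmented by student number over three storage nodes using a CRC32-modulo rule; that rule, the node numbering from 0, and the names `node_for_student_number` and `nodes_to_scan` are illustrative assumptions rather than details given in the text.

```python
import zlib

NUM_NODES = 3  # three storage nodes, as in the example above


def node_for_student_number(student_number: str) -> int:
    """Hypothetical routing rule: CRC32 of the student number modulo node count."""
    return zlib.crc32(student_number.encode("utf-8")) % NUM_NODES


def nodes_to_scan(query_field: str, is_equivalent_query: bool, value=None) -> list:
    """Return the nodes a query must visit when the table is fragmented
    by student number."""
    if query_field == "student_number" and is_equivalent_query:
        # The fragment field value pins the entry to exactly one node.
        return [node_for_student_number(value)]
    # Any other query field says nothing about placement: visit every node.
    return list(range(NUM_NODES))


print(nodes_to_scan("student_number", True, "001"))     # one node
print(nodes_to_scan("total_score", False, (290, 300)))  # all three nodes
```

A lookup by student number touches a single node, while the interval query on the total score has to visit every node, which is the cross-node case the method aims to make less frequent.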

In order to solve the technical problems, the embodiment of the invention provides a data storage method.

The data storage method is applicable to any scenario in which a data table is stored using a distributed storage technology. It can be applied to any electronic device that runs the distributed storage technology and stores the data of the data table into the storage nodes; for example, the electronic device may be a control node of the distributed cluster used by the distributed storage technology. The embodiment of the present invention does not limit the application scenario or the execution subject of the data storage method.

The data storage method may include the steps of:

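determining each target field in the target data table that was used as a query field for data queries within a preset time period, and the query frequency of each target field when used as a query field for a designated query type;

selecting, based on the determined query frequencies, a fragment field from the target fields according to a preset fragment field screening mode, where the fragment field screening mode includes screening fields on the basis of the query frequency of each target field when used as a query field for a designated query type;

and storing each piece of data belonging to the target data table in fragments using the fragment field.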

The following describes a data storage method according to an embodiment of the present invention in detail with reference to the accompanying drawings.

Fig. 1 is a flow chart of a data storage method according to an embodiment of the present invention. As shown in fig. 1, the data storage method may include the steps of:

S101: determining each target field in the target data table that was used as a query field for data queries within a preset time period, and the query frequency of each target field when used as a query field for a designated query type;

For the target data table, each field, among the fields included in the target data table, that was used as a query field for data queries within the predetermined time period can be determined as a target field.

The predetermined time length may be set according to an empirical value and a requirement of reducing occurrence probability of a cross-node query condition in practical application, for example, may be one hour, one day, one week, etc., and the specific numerical value of the predetermined time length is not limited in this embodiment of the present invention.

Moreover, since the same target field can be used as a query field for different designated query types in different queries, the query frequency of each target field can be determined separately for each designated query type.

For example, for the target data table shown in Table 1, when student information with "total score equal to 290" is queried, the query field is "total score" and the designated query type is the equivalent query, that is, a query that matches a single field value; here "total score" is a query field for an equivalent query. When student information with "total score within the interval [290,300]" is queried, the query field is still "total score", but the designated query type is the range query; here "total score" is a query field for a range query. Thus, although "total score" is the query field in both cases, the query type differs between the two queries.
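One way to tell the two query types apart is by the comparison operator applied to the query field. The sketch below is an assumption about how such a classification could be done; the operator sets, the labels "equivalent" and "range", and the function name `classify_query_type` are hypothetical and not specified in the text.

```python
# Hypothetical operator sets; which operators count as which type is an assumption.
EQUIVALENT_OPERATORS = {"="}
RANGE_OPERATORS = {"<", "<=", ">", ">=", "BETWEEN"}


def classify_query_type(operator: str):
    """Map a comparison operator to a designated query type, or None."""
    op = operator.strip().upper()
    if op in EQUIVALENT_OPERATORS:
        return "equivalent"
    if op in RANGE_OPERATORS:
        return "range"
    return None  # not one of the designated query types in this sketch


# "total score equal to 290" versus "total score within [290,300]"
print(classify_query_type("="))
print(classify_query_type("between"))
```

The same field, "total score", is thus counted under a different query type depending on the operator used in each query.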

Optionally, in a specific implementation manner, the query types may include: equivalent queries and/or range queries.

Step S101 may be performed in a variety of manners, and the embodiment of the present invention does not limit the specific manner. For clarity, implementations of step S101 are described later.

S102: selecting, based on the determined query frequencies, a fragment field from the target fields according to a preset fragment field screening mode;

where the fragment field screening mode includes screening fields on the basis of the query frequency of each target field when used as a query field for a designated query type.

After the target fields and their query frequencies are obtained, note that the same target field may be used in queries of different designated query types and may therefore have several query frequencies, one per query type. Accordingly, based on the determined query frequencies, the fragment field can be selected from the target fields by taking, as the screening basis, the query frequency of each target field when used as a query field for a designated query type.

In addition, other screening conditions may be added when selecting the fragment field. For example, so that the data in the target data table can be stored evenly across the storage nodes, balanced data storage may be used as an additional screening condition.

Based on this, optionally, in a specific implementation manner, the fragment field screening mode may further include screening fields on the basis of whether a data balance condition is met; where the data balance condition includes: when the field is used as the fragment field, the number of data entries in each sub data table obtained by fragmentation meets a preset balance condition.
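A candidate field can be checked against a balance condition by simulating the fragmentation and counting entries per node. The sketch below is only one possible reading: the text does not define the preset balance condition, so the rule used here (the largest sub data table holds at most `max_ratio` times the average number of entries) and the names `is_balanced`, `route`, and `max_ratio` are assumptions.

```python
def is_balanced(rows, field, num_nodes, route, max_ratio=1.5):
    """Check whether fragmenting `rows` by `field` spreads entries evenly.

    rows: list of dicts, one per data entry.
    route: hypothetical function (field_value, num_nodes) -> node index.
    max_ratio: assumed balance condition, largest node <= max_ratio * average.
    """
    counts = [0] * num_nodes
    for row in rows:
        counts[route(row[field], num_nodes)] += 1
    average = len(rows) / num_nodes
    return max(counts) <= max_ratio * average


# Nine entries: "id" takes distinct values, "grade" is the same for everyone.
rows = [{"id": i, "grade": 1} for i in range(9)]
modulo_route = lambda value, n: value % n

print(is_balanced(rows, "id", 3, modulo_route))     # evenly spread
print(is_balanced(rows, "grade", 3, modulo_route))  # everything on one node
```

A field whose values are all identical sends every entry to the same node and fails the check, whatever its query frequency.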

The step S102 may be performed in various manners, which is not specifically limited in the embodiment of the present invention.

For example, the target field corresponding to the highest query frequency among the determined query frequencies may be used as the fragment field.

For clarity, the implementation of step S102 will be described later.

S103: storing each piece of data belonging to the target data table in fragments using the fragment field.

Since each piece of data belonging to the target data table has been stored in each storage node in a fragmented manner using the existing fragment field, the fragment field screened in the step S102 is the fragment field for re-storing each piece of data belonging to the target data table in a fragmented manner.

When the data of the target data table is re-fragmented using the selected fragment field, the storage node in which each piece of data is to be stored can be determined from the field value of the fragment field in that piece of data and a set routing rule, and the piece of data is then stored in the determined storage node.

Optionally, the routing rule can be determined from the selected fragment field and the number of storage nodes.

For example, if the target data table is Table 1, the selected fragment field is the student number, and the number of storage nodes is 3, then the specific value of each student number may be hashed to obtain a hash value, and the remainder of dividing each hash value by 3 may be calculated. The following routing rule may then be predetermined: data entries whose student number gives a remainder of 0 are stored in storage node 1, those with a remainder of 1 in storage node 2, and those with a remainder of 2 in storage node 3.
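The hash-and-remainder routing rule can be sketched as follows. The text does not name a hash function, so the use of SHA-256 here is an assumption chosen only because it gives the same value on every run; the function name `route_by_student_number` is likewise hypothetical.

```python
import hashlib

NUM_NODES = 3  # number of storage nodes in the example


def route_by_student_number(student_number: str) -> int:
    """Return the storage node (1, 2, or 3) for a student number."""
    # Hash the student number (SHA-256 is an assumed choice of hash).
    digest = hashlib.sha256(student_number.encode("utf-8")).hexdigest()
    # Remainder of the hash value divided by the number of storage nodes.
    remainder = int(digest, 16) % NUM_NODES
    # Remainder 0 -> node 1, remainder 1 -> node 2, remainder 2 -> node 3.
    return remainder + 1


for number in ("001", "002", "003", "111"):
    print(number, "-> storage node", route_by_student_number(number))
```

Because the mapping depends only on the student number, an equivalent query on the student number can be sent straight to the single node the rule returns.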

As another example, if the target data table is Table 1, the selected fragment field is the total score, and the number of storage nodes is 10, then the interval in which the total score lies may be divided into 10 subintervals according to a specified rule, so that each storage node corresponds to one subinterval. Illustratively, if the total score lies in [100,300], the interval [100,300] can be divided into the following 10 subintervals:

[100,150), [150,180), [180,210), [210,240), [240,250), [250,260), [260,270), [270,280), [280,290), [290,300].

Further, the following routing rule may be predetermined: data entries whose total score lies in [100,150) are stored in storage node 0; those in [150,180) in storage node 1; those in [180,210) in storage node 2; those in [210,240) in storage node 3; those in [240,250) in storage node 4; those in [250,260) in storage node 5; those in [260,270) in storage node 6; those in [270,280) in storage node 7; those in [280,290) in storage node 8; and those in [290,300] in storage node 9.
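The interval-based routing rule can be sketched with a binary search over the lower bounds of the ten subintervals. The bounds are taken from the example above; the function name `route_by_total_score` and the decision to reject values outside [100,300] are assumptions of this sketch.

```python
from bisect import bisect_right

# Lower bounds of the ten subintervals; the last subinterval is closed at 300.
LOWER_BOUNDS = [100, 150, 180, 210, 240, 250, 260, 270, 280, 290]
UPPER_BOUND = 300


def route_by_total_score(total_score: int) -> int:
    """Return the storage node (0 to 9) whose subinterval contains the score."""
    if not LOWER_BOUNDS[0] <= total_score <= UPPER_BOUND:
        # Assumption: out-of-range scores are rejected rather than routed.
        raise ValueError("total score outside [100,300]")
    # Count how many lower bounds are <= the score; the last one is its subinterval.
    return bisect_right(LOWER_BOUNDS, total_score) - 1


# Total scores from Table 1.
for score in (263, 293, 122, 277):
    print(score, "-> storage node", route_by_total_score(score))
```

With this rule, an equivalent query on the total score maps to exactly one node, and a range query such as [290,300] falls entirely within the subinterval held by storage node 9.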

After executing the step S103, the fragmentation of each piece of data belonging to the target data table is performed again by using the fragment field obtained by filtering in the step S102, and the obtained fragmentation result may be different from the fragmentation result of each piece of data belonging to the target data table when the step S103 is not executed.

For example, each piece of data in the target data table shown in Table 1 has already been stored in fragments across the storage nodes according to the existing fragment field "student number". After step S102 is performed and the fragment field "total score" is selected, each piece of data in Table 1 may be stored in fragments again using "total score" as the fragment field. The fragmented storage result of Table 1 after re-fragmentation may differ from the result before re-fragmentation.

Optionally, in a specific implementation manner, as shown in FIG. 2, step S103 may include the following steps S1031 to S1033:

Step S1031: backing up each piece of data in the target data table to obtain a backup data table, and backing up the table structure of the target data table;

step S1032: deleting the target data table, and constructing each sub data table of the target data table according to the backed-up table structure;

wherein, each sub data table is stored in different storage nodes respectively;

Step S1033: and determining a storage node to which each piece of data belongs according to the field value of the target fragment field in the piece of data aiming at each piece of data in the backup data table, and writing the piece of data into a sub data table in the storage node.

In this embodiment, after the fragment field is obtained by screening, each piece of data in the target data table may be backed up, so as to obtain a backup data table, and a table structure of the target data table may be backed up.

Since the data in the target data table is stored across a plurality of storage nodes, backing up the data in the target data table means merging the sub data tables that hold data of the target data table in the respective storage nodes into a single data table, which is the backup data table.

The target data table is then deleted, that is, the sub data table holding data of the target data table in each storage node is deleted, and a new sub data table for storing data of the target data table is constructed in each storage node according to the backed-up table structure.

Further, after the fragment field is selected, a routing rule can be established, giving a correspondence between field values of the fragment field and the storage nodes. Thus, for each piece of data in the backup data table, the storage node to which the piece of data belongs can be determined from the field value of the fragment field in the piece of data, and the piece of data is then written into the sub data table in that storage node.
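Steps S1031 to S1033 can be sketched in memory as follows. This is an illustration under simplifying assumptions: each storage node's sub data table is modelled as a dict with "columns" and "rows", the routing rule is passed in as a function, and the names `reshard`, `columns`, `rows`, and `route` are hypothetical. A real system would perform these steps against actual storage nodes.

```python
def reshard(sub_tables, fragment_field, route):
    """Rebuild the sub data tables using a new fragment field.

    sub_tables: list with one {"columns": [...], "rows": [...]} per storage node.
    route: hypothetical function mapping a fragment field value to a node index.
    """
    # S1031: merge all sub data tables into one backup table and
    # back up the table structure.
    backup_rows = [dict(row) for table in sub_tables for row in table["rows"]]
    backup_columns = list(sub_tables[0]["columns"])

    # S1032: drop the old sub data tables and build empty ones
    # from the backed-up structure, one per storage node.
    new_tables = [{"columns": list(backup_columns), "rows": []} for _ in sub_tables]

    # S1033: route every backed-up entry by its fragment field value.
    for row in backup_rows:
        new_tables[route(row[fragment_field])]["rows"].append(row)
    return new_tables


columns = ["student_number", "total_score"]
old = [
    {"columns": columns, "rows": [{"student_number": "001", "total_score": 263}]},
    {"columns": columns, "rows": [{"student_number": "002", "total_score": 293},
                                  {"student_number": "111", "total_score": 277}]},
    {"columns": columns, "rows": [{"student_number": "003", "total_score": 122}]},
]
# Assumed three-way rule on the total score, for illustration only.
by_total = lambda score: 0 if score < 200 else (1 if score < 280 else 2)

for node, table in enumerate(reshard(old, "total_score", by_total)):
    print(node, [r["student_number"] for r in table["rows"]])
```

The total number of entries is unchanged by the procedure; only their placement across nodes differs.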

In addition, optionally, after the fragment field is screened, for a newly added data entry in the target data table, that is, a newly acquired data entry that is not yet stored in any storage node, the field value of the fragment field of the newly added data entry and the set routing rule may be used to determine the storage node to which the newly added data entry needs to be stored, so that the newly added data entry is stored in the determined storage node.

In addition, optionally, after the data belonging to the target data table is stored in a sliced manner by using the sliced field, the distribution condition of the data belonging to the target data table in each storage node may be output, so that according to the distribution condition, it may be determined whether the determined sliced field can realize uniform storage of the data belonging to the target data table in each storage node.

In this way, the data in the target data table is re-fragmented and stored in the storage nodes by using a fragment field that was determined according to the query frequency with which each target field served as a query field for the specified query type. When a query request is received again, the probability that the query field included in the request is the determined fragment field is therefore high. When the query field included in the query request is the fragment field, the storage node holding the data entry to be queried can be determined from the field value of the query field and the predetermined routing rule, and the data entry can then be looked up among the data belonging to the target data table that is stored in that storage node.

Optionally, in a specific implementation manner, as shown in fig. 3, step S101, determining the target fields in the target data table that are used as query fields for data queries within a predetermined time period, and the query frequency with which each target field is used as a query field for the designated query type, may include the following steps S1011-S1012:

Step S1011: acquiring the query requests for the target data table within the predetermined time period;

Step S1012: based on the acquired query requests, counting the target fields in the target data table that were used as query fields for data queries within the predetermined time period, and the query frequency with which each target field was used as a query field for the designated query type.

In this specific implementation, a statistical table about the correspondence of the target field, the query type, and the query frequency may be constructed in advance.

For example, as shown in table 2, the statistics table may include the following:

TABLE 2

Illustratively, for SQL (Structured Query Language), in Table 2 above:

The table name is: the table operated on by the SQL statement;

The field name is: a field used in the where condition of the SQL statement;

The query frequency is: the number of queries whose SQL statement uses the field in the where condition;

The query type is: the query operator applied to the field in the where condition.

Further, optionally, in some cases, considering that the data entries in the target data table should be stored evenly across the storage nodes, the constructed statistics table may further include the data amount, as shown in table 3.

TABLE 3

After the statistics table is constructed, the query requests for the target data table within the predetermined time period can be acquired, and the constructed statistics table can be used to count, from the acquired query requests, the target fields in the target data table that were used as query fields for data queries within the predetermined time period, and the query frequency with which each target field was used as a query field for the designated query type.

Wherein, for each obtained query request, the query field and the query type included in the query request can be determined, so that the constructed statistical table is updated according to the determined result.

For example, for SQL, a statistics module may be started in the SQL parsing layer to parse each acquired SQL statement, obtain a parsing result for the where condition of the statement, and record the parsing result in the constructed statistics table.

Thus, at the end of the predetermined period of time, the above-mentioned statistical table updated with each obtained query request can be obtained, and thus, each target field of the target data table, which is used as a query field for data query, can be determined from the statistical table, and each target field serves as a query frequency corresponding to the query field for the specified query type.
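The statistics collection described above can be sketched as follows. This is a deliberately simplified illustration: it assumes single-table statements with simple `field operator value` conditions and uses regular expressions, whereas a real SQL parsing layer would work on the parsed statement. Classifying `=` as an equivalent query and `<`, `>`, `<=`, `>=`, `BETWEEN` as range queries is also an assumption, as are all names.

```python
import re
from collections import defaultdict

# Statistics table: (table name, field name, query type) -> query frequency
stats = defaultdict(int)

_TABLE = re.compile(r"\bfrom\s+(\w+)", re.IGNORECASE)
_WHERE = re.compile(r"\bwhere\b(.*)", re.IGNORECASE | re.DOTALL)
_COND = re.compile(r"(\w+)\s*(<=|>=|<|>|=|\bbetween\b)", re.IGNORECASE)


def record(sql, stats):
    """Update the statistics table with the where-condition fields of one statement."""
    table = _TABLE.search(sql)
    where = _WHERE.search(sql)
    if not table or not where:
        return  # no table or no where condition: nothing to count
    for field, op in _COND.findall(where.group(1)):
        query_type = "equivalent" if op == "=" else "range"
        stats[(table.group(1), field, query_type)] += 1


if __name__ == "__main__":
    record("SELECT * FROM T1 WHERE F2 = 7", stats)
    record("SELECT * FROM T2 WHERE F2 BETWEEN 1 AND 10", stats)
    for key, freq in sorted(stats.items()):
        print(key, freq)
```

At the end of the predetermined time period, the contents of `stats` play the role of the statistics table from which the target fields and their query frequencies are read.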

In this way, the fragment field used when the target data table is re-fragmented is determined from statistics on the query requests actually issued against the target data table in practical use, so the determined fragment field matches the query fields used for the designated query type when the target data table is queried. The fragment field of the target data table is therefore more likely to be used as a query field, which raises the probability that the storage node holding the queried content can be determined from the field value of the query field. This reduces the probability of cross-node queries and the resource waste in the query process.

For example, at the end of the predetermined time period, the statistics obtained are shown in Table 4.

TABLE 4

Table name	Field name	Query frequency	Query type	Data quantity
T1	F1	100	Equivalent query	A
T1	F2	1000	Equivalent query	B
T1	F3	500	Equivalent query	C
T2	F1	100	Equivalent query	D
T2	F2	5000	Range query	E
T2	F3	1000	Equivalent query

Therein, according to table 4 above, it can be seen that:

For the target data table T1, it may be determined that each target field that is used as a query field for data query is F1, F2, and F3 within a predetermined period of time, where F1 corresponds to a query frequency of 100 when used as a query field for equivalent query, F2 corresponds to a query frequency of 1000 when used as a query field for equivalent query, and F3 corresponds to a query frequency of 500 when used as a query field for equivalent query.

For the target data table T2, it may be determined that each target field that is used as a query field for data query is F1, F2, and F3 within a predetermined period of time, where F1 corresponds to a query frequency of 100 when used as a query field for equivalent query, F2 corresponds to a query frequency of 5000 when used as a query field for range query, and F3 corresponds to a query frequency of 1000 when used as a query field for equivalent query.

The specific implementation of screening the fragment fields from each target field according to a predetermined fragment field screening manner based on the determined query frequency is described below in connection with different designated query types.

Optionally, in a first specific implementation manner, the specified query type may include an equivalent query. In this implementation manner, step S102, screening the fragment field from the target fields according to a predetermined fragment field screening manner based on the determined query frequency, may include the following step 1021:

Step 1021: among the target fields that are query fields for equivalent queries, determining the field with the highest corresponding query frequency as the fragment field.

In this embodiment, the target field that is the query field for the equivalent query may be determined from the determined target fields, and further, the field with the highest query frequency that corresponds to the determined target field that is the query field for the equivalent query may be used as the fragment field.

In this way, the field with the highest query frequency among the target fields that are query fields for equivalent queries is determined as the fragment field used when the target data table is re-fragmented, so the determined fragment field matches the query fields used for the specified query type when the target data table is queried. The fragment field of the target data table is therefore more likely to be used as a query field, which raises the probability that the storage node holding the queried content can be determined from the field value of the query field. This reduces the probability of cross-node queries and the resource waste in the query process.

For example, for the table 4, for the target data table T1, the target fields that are the query fields for the equivalent query are F1, F2, and F3, where F2 may be the fragment field if the query frequency corresponding to F2 is the highest; for the target data table T2, the target fields that are the query fields for the equivalent query are F1 and F3, where F3 may be used as the fragment field if the query frequency corresponding to F3 is the highest.

In this embodiment, regarding table 4, even if the query frequency corresponding to F2 is highest for the target data table T2, F2 is not considered as a fragment field because the query type for F2 is a range query.
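The selection in step 1021 can be illustrated with the data of table 4. In this minimal sketch the statistics are held as tuples and the data quantity column is omitted; `TABLE_4` and `pick_by_type` are hypothetical names.

```python
# Rows of table 4: (table name, field name, query frequency, query type)
TABLE_4 = [
    ("T1", "F1", 100, "equivalent"),
    ("T1", "F2", 1000, "equivalent"),
    ("T1", "F3", 500, "equivalent"),
    ("T2", "F1", 100, "equivalent"),
    ("T2", "F2", 5000, "range"),
    ("T2", "F3", 1000, "equivalent"),
]


def pick_by_type(stats, table, query_type):
    """Return the field of `table` with the highest query frequency among the
    fields queried with `query_type`, or None if there is no such field."""
    candidates = [(freq, field) for t, field, freq, qt in stats
                  if t == table and qt == query_type]
    if not candidates:
        return None
    return max(candidates)[1]


print(pick_by_type(TABLE_4, "T1", "equivalent"))  # F2
print(pick_by_type(TABLE_4, "T2", "equivalent"))  # F3 (F2 is a range-query field)
```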

Optionally, in a second specific implementation manner, the specified query type may include an equivalent query and a range query, and in this specific implementation manner, step S102, based on the determined query frequency, screens the fragment field from each target field according to a predetermined fragment field screening manner, may include the following steps 1022 to 1024:

Step 1022: determining a first field with highest corresponding query frequency in all target fields which are query fields aiming at equivalent queries, and a second field with highest corresponding query frequency in all target fields which are query fields aiming at range queries; if the second frequency is greater than the first frequency and the difference between the second frequency and the first frequency exceeds the preset difference threshold, step 1023 is executed; otherwise, go to step 1024;

the second frequency is the query frequency corresponding to the second field, and the first frequency is the query frequency corresponding to the first field;

step 1023: determining the second field as a fragment field;

Step 1024: the first field is determined to be a fragment field.

In this embodiment, the target fields that are query fields for equivalent queries and the target fields that are query fields for range queries may first be determined among the target fields used as query fields for data queries. The first field, with the highest query frequency, can then be determined among the target fields that are query fields for equivalent queries, and the second field, with the highest query frequency, can be determined among the target fields that are query fields for range queries.

Further, the query frequency corresponding to the first field and the magnitude relation of the query frequency corresponding to the second field can be determined. The query frequency corresponding to the first field may be used as the first frequency, and the query frequency corresponding to the second field may be used as the second frequency.

If the second frequency is greater than the first frequency and the difference between the second frequency and the first frequency exceeds a preset difference threshold, the second field can be determined as a fragment field; otherwise, the first field may be determined to be a fragment field.

In this way, when the second frequency is greater than the first frequency and the difference between the second frequency and the first frequency exceeds the preset difference threshold, the second field is determined to be a sliced field, so that the determined sliced field is matched with a query field for a specified query type, which is utilized when querying the target data table. Therefore, the probability that the fragment field of the target data table is used as the query field is higher, so that the probability that the storage node where the content to be queried is located is determined by the field value of the query field in the query process is improved, the occurrence probability of the cross-node query condition can be reduced, and the resource waste in the query process is reduced.

In other cases, determining the first field as a sliced field may enable the determined sliced field to be more matched with a query field for a specified query type utilized when querying the target data table, and may enable each storage node to store each piece of data in the target data table more evenly. Therefore, the probability of taking the fragment field of the target data table as the query field can be improved under the condition of balanced data storage, so that the probability of determining the storage node where the content to be queried is located in the query process through the field value of the query field is improved, the occurrence probability of the cross-node query condition can be reduced, and the resource waste in the query process is reduced.

For example, for the above table 4, for the target data table T2, the target fields that are the query fields for the equivalent query are F1 and F3, where F3 corresponds to the highest query frequency, F3 may be the first field, frequency 1000 corresponding to F3 may be the first frequency, and for the target data table T2, the target field that is the query field for the range query is F2, where F2 corresponds to the query frequency 5000, F2 may be the second field, and frequency 5000 corresponding to F2 may be the second frequency.

For example, assuming that the preset difference threshold is 5000: although 5000 is greater than 1000, 5000 - 1000 = 4000 < 5000, so F3 may be taken as the fragment field for the target data table T2.

For example, assuming that the preset difference threshold is 3000: 5000 is greater than 1000 and 5000 - 1000 = 4000 > 3000, so F2 may be taken as the fragment field for the target data table T2.
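Steps 1022 to 1024 and the two threshold examples for T2 can be reproduced with the following minimal sketch. The function name and the dictionary representation (field name to query frequency) are assumptions, and both dictionaries are assumed to be non-empty.

```python
def pick_with_threshold(eq_fields, range_fields, threshold):
    """Choose between the top equivalent-query field (first field) and the
    top range-query field (second field).

    eq_fields / range_fields: dict mapping field name -> query frequency.
    """
    first_field, first_freq = max(eq_fields.items(), key=lambda kv: kv[1])
    second_field, second_freq = max(range_fields.items(), key=lambda kv: kv[1])
    # Step 1023: the range-query field wins only if it leads by more than the threshold.
    if second_freq > first_freq and second_freq - first_freq > threshold:
        return second_field
    # Step 1024: otherwise keep the equivalent-query field.
    return first_field


t2_eq = {"F1": 100, "F3": 1000}
t2_range = {"F2": 5000}
print(pick_with_threshold(t2_eq, t2_range, 5000))  # F3, since 4000 < 5000
print(pick_with_threshold(t2_eq, t2_range, 3000))  # F2, since 4000 > 3000
```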

Optionally, in a third specific implementation manner, the specified query type may include a range query, and in this specific implementation manner, step S102, based on the determined query frequency, screens the fragment field from each target field according to a predetermined fragment field screening manner, may include the following step 1025:

Step 1025: among the target fields that are query fields for range queries, determining the field with the highest corresponding query frequency as the fragment field.

In this embodiment, among the determined target fields, a target field that is a query field for a range query may be determined, and further, a field with the highest query frequency, among the determined target fields that are query fields for a range query, may be used as a fragment field.

In this way, the field with the highest query frequency among the target fields that are query fields for range queries is determined as the fragment field used when the target data table is re-fragmented, so the determined fragment field matches the query fields used for the specified query type when the target data table is queried. The fragment field of the target data table is therefore more likely to be used as a query field, which raises the probability that the storage node holding the queried content can be determined from the field value of the query field. This reduces the probability of cross-node queries and the resource waste in the query process.

For example, for table 4, F2 may be used as the fragment field if the target field, which is the query field for the range query, is F2 for the target data table T2.

Optionally, in a fourth specific implementation manner, the specified query type may include any query type, and step S102 of screening the fragment field from each target field according to a predetermined fragment field screening manner based on the determined query frequency may include the following step 1026:

Step 1026: determining the target field with the highest corresponding query frequency as the fragment field.

In this embodiment, the highest query frequency may be determined in the query frequency corresponding to each target field as the query field for the specified query type, so that the target field with the highest determined query frequency may be used as the fragment field.

In this way, the field with the highest query frequency corresponding to each target field is determined as the fragment field utilized when the target data table is stored in a re-fragmentation manner, so that the determined fragment field can be matched with the query field for the designated query type utilized when the target data table is queried. Therefore, the probability that the fragment field of the target data table is used as the query field is higher, so that the probability that the storage node where the content to be queried is located is determined by the field value of the query field in the query process is improved, the occurrence probability of the cross-node query condition can be reduced, and the resource waste in the query process is reduced.

For example, for the table 4, for the target data table T1, if the target field corresponding to the highest query frequency is F2, then F2 may be used as the fragment field; for the target data table T2, if the target field corresponding to the highest query frequency is F2, then F2 may be used as the fragment field.
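Step 1026 ignores the query type and simply takes the most frequently queried target field. A minimal sketch using the rows of table 4 follows; it treats each (field, query type) row separately, which is an assumption for the case where one field is queried with several types, and all names are hypothetical.

```python
# Rows of table 4: (table name, field name, query frequency, query type)
TABLE_4 = [
    ("T1", "F1", 100, "equivalent"),
    ("T1", "F2", 1000, "equivalent"),
    ("T1", "F3", 500, "equivalent"),
    ("T2", "F1", 100, "equivalent"),
    ("T2", "F2", 5000, "range"),
    ("T2", "F3", 1000, "equivalent"),
]


def pick_overall(stats, table):
    """Return the field of `table` with the highest query frequency,
    regardless of query type, or None if the table has no statistics."""
    rows = [(freq, field) for t, field, freq, _ in stats if t == table]
    return max(rows)[1] if rows else None


print(pick_overall(TABLE_4, "T1"))  # F2
print(pick_overall(TABLE_4, "T2"))  # F2
```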

In addition, it can be understood that, in some cases, in order for the storage nodes to store the data in the target data table evenly, the fragment field may be screened not only on the basis of the query frequency corresponding to each field, but also on the basis of whether the field meets a data balance condition.

Based on this, in an optional specific implementation manner, the fragment field screening manner in step S102 may further include: screening fields on the basis of both the query frequency corresponding to each field and whether the field meets the data balance condition.

The data balance condition includes: when the field is used as the fragment field, the numbers of data entries in the sub data tables obtained by fragmentation satisfy a preset balance condition.

It can be understood that when the target data table is stored in fragments, its data is distributed across a plurality of storage nodes. The data belonging to the target data table that is stored in one storage node forms a sub data table of the target data table, and the data entries of that sub data table are the pieces of data belonging to the target data table that are stored in the storage node where the sub data table is located.

The preset balance condition in the data balance condition may be: among the sub data tables obtained by fragmentation, the difference between the numbers of data entries in any two sub data tables is not greater than a preset value. It may also be: the numbers of data entries in all the sub data tables obtained by fragmentation are the same. It may also be: among the sub data tables obtained by fragmentation, the difference between the maximum number of data entries and the minimum number of data entries is not greater than a preset value, where the maximum number of data entries is the number of data entries in the sub data table with the most data entries, and the minimum number of data entries is the number of data entries in the sub data table with the fewest data entries. Of course, the preset balance condition may also be another condition, which is not specifically limited in the embodiments of the present invention.
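The three example conditions can be written as simple predicates over the per-node entry counts. This is a minimal sketch with hypothetical function names; `counts` is assumed to be a non-empty list holding the number of data entries in each sub data table.

```python
def pairwise_gap_ok(counts, preset):
    """The difference between any two sub data tables is not greater than preset."""
    return all(abs(a - b) <= preset for a in counts for b in counts)


def all_equal(counts):
    """Every sub data table holds the same number of data entries."""
    return len(set(counts)) == 1


def max_min_gap_ok(counts, preset):
    """The difference between the largest and smallest sub data table
    is not greater than preset."""
    return max(counts) - min(counts) <= preset


counts = [100, 98, 103]
print(pairwise_gap_ok(counts, 5), max_min_gap_ok(counts, 5))  # True True
print(pairwise_gap_ok(counts, 4), max_min_gap_ok(counts, 4))  # False False
print(all_equal(counts))                                      # False
```

For a given preset value, the pairwise check and the max-min check always agree, because the largest pairwise difference is the difference between the maximum and the minimum; the second form only needs one pass over the counts.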

Alternatively, in order to enable the data equalization condition to be met as a screening basis when the fragment field screening is performed, in this embodiment, a statistics table as shown in table 3 may be constructed.

In an optional implementation manner, step 1021 of the first specific implementation manner, namely determining, as the fragment field, the field with the highest corresponding query frequency among the target fields that are query fields for equivalent queries, may include the following step 1:

Step 1: among the target fields that are query fields for equivalent queries and that meet the data balance condition, determining the field with the highest corresponding query frequency as the fragment field.

The target field which is the query field for the equivalent query may be determined first from the determined target fields, and then, the target field which meets the data balance condition may be determined again from the determined target fields which are the query fields for the equivalent query, so that the field with the highest query frequency corresponding to the target field obtained by the secondary determination may be determined as the fragment field.

Alternatively, the target fields that are query fields for equivalent queries may first be determined among the determined target fields, and the field with the highest corresponding query frequency among them may then be determined. Whether that field meets the data balance condition is then judged. If so, the field is used as the fragment field. If not, the field with the highest corresponding query frequency is determined among those target fields that are query fields for equivalent queries and have not yet been judged against the data balance condition, and the judgment is repeated for that field: if it meets the condition, it is used as the fragment field; if not, the process returns to the step of determining the field with the highest query frequency among the not-yet-judged target fields.

If all the target fields as query fields for the equivalent query are judged to be not in accordance with the data balance condition, the corresponding field with the highest query frequency in the target fields as the query fields for the equivalent query can be directly used as the fragment field.

Or if the judging results of the continuous N times of judging are all not in accordance with the data equilibrium condition, the field with the highest corresponding query frequency in the target fields which are the query fields aiming at the equivalent query can be directly used as the fragment field. Wherein N is an integer greater than 1.
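The iterative variant with its two fallbacks can be sketched as follows. The function name, the `is_balanced` callback (which stands in for whatever balance check is used) and the `max_checks` parameter (the N above) are assumptions for illustration.

```python
def pick_balanced(candidates, is_balanced, max_checks=None):
    """Pick a fragment field among equivalent-query fields.

    candidates:  dict mapping field name -> query frequency.
    is_balanced: callable(field) -> bool, the data balance check.
    max_checks:  optional N; stop after N consecutive failed checks.
    """
    # Visit fields from highest to lowest query frequency.
    ordered = sorted(candidates, key=candidates.get, reverse=True)
    if not ordered:
        return None
    for checks, field in enumerate(ordered, start=1):
        if is_balanced(field):
            return field
        if max_checks is not None and checks >= max_checks:
            break
    # Fallback: no checked field met the condition, so use the field with
    # the highest query frequency directly.
    return ordered[0]


t1 = {"F1": 100, "F2": 1000, "F3": 500}
print(pick_balanced(t1, lambda f: f != "F2"))               # F3
print(pick_balanced(t1, lambda f: False))                   # F2
print(pick_balanced(t1, lambda f: f == "F1", max_checks=2))  # F2
```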

In another implementation manner, the part of step 1022 of the second specific implementation manner that determines the first field with the highest query frequency among the target fields that are query fields for equivalent queries, and the second field with the highest query frequency among the target fields that are query fields for range queries, may include the following step 2:

Step 2: determining the first field with the highest query frequency among the target fields that are query fields for equivalent queries and meet the data balance condition, and the second field with the highest query frequency among the target fields that are query fields for range queries and meet the data balance condition.

The target field which is the query field for the equivalent query may be determined first from the determined target fields, and then, the target field which meets the data balance condition may be determined again from the determined target fields which are the query fields for the equivalent query, so that the field with the highest query frequency corresponding to the target field obtained by the second determination may be determined as the first field. Accordingly, the target field serving as the query field for the range query is determined from the determined target fields, and further, the target field meeting the data balance condition is determined again from the determined target fields serving as the query fields for the range query, so that the field with the highest query frequency corresponding to the target field obtained through the secondary determination can be determined as the second field.

Alternatively, the target fields that are query fields for equivalent queries may first be determined among the determined target fields, and the field with the highest corresponding query frequency among them may then be determined. Whether that field meets the data balance condition is then judged. If so, the field is used as the first field. If not, the field with the highest corresponding query frequency is determined among those target fields that are query fields for equivalent queries and have not yet been judged against the data balance condition, and the judgment is repeated for that field: if it meets the condition, it is used as the first field; if not, the process returns to the step of determining the field with the highest query frequency among the not-yet-judged target fields.

If all the target fields serving as the query fields aiming at the equivalent query do not meet the data balance condition, the corresponding field with the highest query frequency in the target fields serving as the query fields aiming at the equivalent query can be directly used as the first field.

Or if the judging results of the continuous N times of judging are all not in accordance with the data equilibrium condition, the field with the highest corresponding query frequency in the target fields which are the query fields aiming at the equivalent query can be directly used as the first field. Wherein N is an integer greater than 1.

Accordingly, the target fields that are query fields for range queries may first be determined among the determined target fields, and the field with the highest corresponding query frequency among them may then be determined. Whether that field meets the data balance condition is then judged. If so, the field is used as the second field. If not, the field with the highest corresponding query frequency is determined among those target fields that are query fields for range queries and have not yet been judged against the data balance condition, and the judgment is repeated for that field: if it meets the condition, it is used as the second field; if not, the process returns to the step of determining the field with the highest query frequency among the not-yet-judged target fields.

If it is determined that none of the target fields that are query fields for range queries meets the data balance condition, the field with the highest corresponding query frequency among the target fields that are query fields for range queries can be directly used as the second field.

Or, if the results of N consecutive judgments are all that the data balance condition is not met, the field with the highest corresponding query frequency among the target fields that are query fields for range queries can be directly used as the second field, where N is an integer greater than 1.

Corresponding to the data storage method provided by the embodiment of the invention, the embodiment of the invention also provides a data storage device.

Fig. 4 is a schematic structural diagram of a data storage device according to an embodiment of the present invention, and as shown in fig. 4, the device may include the following modules:

An information determining module 410, configured to determine the target fields in the target data table that are used as query fields for data queries within a predetermined time period, and the query frequency with which each target field is used as a query field for a designated query type;

a field screening module 420, configured to screen the fragment fields from the target fields according to a predetermined fragment field screening manner based on the determined query frequency; the fragment field screening mode comprises a mode of taking the corresponding query frequency when each target field is used as a query field aiming at a designated query type as a screening basis and carrying out field screening;

And the data storage module 430 is configured to store each piece of data belonging to the target data table in a sliced manner by using the slicing field.

Optionally, in a specific implementation manner, the designated query type includes: equivalent query; the field screening module 420 includes:

Optionally, in a specific implementation manner, the designated query type includes: equivalent query and range query; the field screening module 420 includes:

Optionally, in a specific implementation manner, the data storage module 430 is specifically configured to:

Corresponding to the data storage method provided in the above embodiment of the present invention, the embodiment of the present invention further provides an electronic device, as shown in fig. 5, including a processor 501, a communication interface 502, a memory 503, and a communication bus 504, where the processor 501, the communication interface 502, and the memory 503 complete communication with each other through the communication bus 504,

A memory 503 for storing a computer program;

The processor 501 is configured to implement the steps of any one of the data storage methods provided in the embodiments of the present invention when executing the program stored in the memory 503.

The communication bus mentioned above for the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be classified as an address bus, a data bus, a control bus, or the like. For ease of illustration, only one bold line is shown in the figure, but this does not mean that there is only one bus or only one type of bus.

The communication interface is used for communication between the electronic device and other devices.

The Memory may include random access Memory (Random Access Memory, RAM) or may include Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.

The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), etc.; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

In yet another embodiment of the present invention, there is also provided a computer readable storage medium having stored therein a computer program which, when executed by a processor, implements the steps of any of the data storage methods provided by the embodiments of the present invention described above.

In yet another embodiment of the present invention, a computer program product comprising instructions which, when run on a computer, cause the computer to perform the steps of any of the data storage methods provided by the embodiments of the present invention described above is also provided.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, by wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), etc.

It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.

In this specification, the embodiments are described in a related manner: for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the apparatus embodiments, the electronic device embodiments, the computer-readable storage medium embodiments, and the computer program product embodiments are described relatively briefly because they are substantially similar to the method embodiments; for the relevant details, refer to the corresponding parts of the description of the method embodiments.

The foregoing description covers only the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. A method of data storage, the method comprising:

Using the slicing field to store each piece of data belonging to the target data table in a slicing way;

The specified query types include: equivalent query and range query;

otherwise, determining the first field as a fragment field;

the screened fragment field is used for carrying out fragment storage on each piece of data belonging to the target data table again.

2. The method of claim 1, wherein the fragment field screening method further comprises a method of screening fields based on the data equalization condition;

3. The method of claim 2, wherein the step of determining a first field having a highest frequency of queries among the respective target fields that are query fields for equivalent queries, and a second field having a highest frequency of queries among the respective target fields that are query fields for range queries, comprises:

4. The method of claim 1, wherein the step of storing each piece of data belonging to the target data table in slices using the slice field comprises:

5. A data storage device, the device comprising:

The field screening module is used for screening the fragment fields from all target fields according to a preset fragment field screening mode based on the determined query frequency; the fragment field screening mode comprises a mode of taking the corresponding query frequency when each target field is used as a query field aiming at a designated query type as a screening basis and carrying out field screening;

The data storage module is used for storing each piece of data belonging to the target data table in a slicing way by utilizing the slicing field;

The specified query types include: equivalent query and range query;

The field screening module is specifically configured to:

otherwise, determining the first field as a fragment field;

6. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;

the memory is used for storing a computer program;

the processor is used for implementing the method of any of claims 1-4 when executing the program stored in the memory.

7. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program which, when executed by a processor, implements the method of any of claims 1-4.