CN114519055A

CN114519055A - Data storage method and device

Info

Publication number: CN114519055A
Application number: CN202210106999.2A
Authority: CN
Inventors: 胡建洪; 杨成虎; 张友东
Original assignee: Alibaba Cloud Computing Ltd
Current assignee: Alibaba Cloud Computing Ltd
Priority date: 2022-01-28
Filing date: 2022-01-28
Publication date: 2022-05-20

Abstract

Embodiments of the present application provide a data storage method and device. In the embodiments, the data characteristics of the first time partition corresponding to the timestamp of the time series data to be stored reflect, to some extent, the timeline cardinality of the time series data within the time range of the first time partition. The time window of the second time partition is therefore adaptively adjusted according to the data characteristics of the first time partition, so that it expands and contracts elastically with those characteristics, which mitigates the timeline high-cardinality problem in the second time partition. As a result, the timeline high-cardinality problem is avoided in each time partition and each time partition has a smaller index, so index queries on each time partition are more efficient when time series data is queried.

Description

Data storage method and device

Technical Field

The present application relates to the field of data storage technologies, and in particular, to a data storage method and device.

Background

Time series data is a series of data generated continuously at a certain frequency. Large volumes of time series data exist in fields such as Application Performance Monitoring (APM), the Internet of Things, and the industrial Internet, which poses great challenges for reading, writing, and managing the storage of time series data. Taking the Internet of Vehicles as an example, assuming that 20,000 vehicles each collect 60 monitoring metrics every second, 1,200,000 data points are generated every second and about 73.8 GB of data is generated every hour, which poses a great challenge for time series data storage.
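The arithmetic behind this example can be checked with a short calculation. The sketch below reproduces the per-second data point count from the stated figures and derives the average size per data point that the stated hourly volume would imply. Treating 1 GB as 10^9 bytes is an assumption, since the text does not say which unit convention is used.

```python
# Figures stated in the text.
vehicles = 20_000
metrics_per_vehicle = 60      # monitoring metrics collected per vehicle per second
hourly_volume_gb = 73.8       # stated hourly data volume

points_per_second = vehicles * metrics_per_vehicle
points_per_hour = points_per_second * 3600

# Assumption: 1 GB = 10**9 bytes (the text does not specify the convention).
implied_bytes_per_point = hourly_volume_gb * 10**9 / points_per_hour

print(points_per_second)   # 1200000
print(points_per_hour)     # 4320000000
print(round(implied_bytes_per_point, 1))
```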

In a time series database, the tag portion of the time series data is used to build the timeline index. In application scenarios where the number of timelines keeps expanding, a timeline high-cardinality problem arises: the timeline index keeps growing, and querying the index takes longer when time series data is retrieved.

Disclosure of Invention

Various aspects of the present application provide a data storage method and device that implement time-partitioned storage with a dynamically adjusted time window, which reduces the probability of timeline high cardinality and thereby helps improve the efficiency of subsequent data queries.

An embodiment of the present application provides a data storage method, including:

acquiring time series data to be stored;

in a case where a first time partition corresponding to the timestamp of the time series data to be stored exists among the created time partitions, acquiring data characteristics of the time series data stored in the first time partition;

determining a target time window according to the data characteristics of the time series data stored in the first time partition;

creating a second time partition having the target time window according to the timestamp of the time series data to be stored;

and storing the time series data to be stored in the second time partition.

An embodiment of the present application further provides a computing device, including a memory and a processor, wherein the memory is used for storing a computer program, and time partitions are created in the memory;

the processor is coupled to the memory for executing the computer program for performing the steps in the above data storage method.

Embodiments of the present application also provide a computer-readable storage medium storing computer instructions, which, when executed by one or more processors, cause the one or more processors to perform the steps of the above data storage method.

The embodiment of the application also provides a computer program product, which comprises a computer program; the computer program, when executed by a processor, causes the processor to perform the steps of the data storage method described above.

In the embodiments of the present application, on the one hand, the data to be stored can be stored in a newly created second time partition instead of directly in the first time partition corresponding to the timestamp of the time series data to be stored, which helps mitigate the timeline high-cardinality problem in the first time partition. On the other hand, the data characteristics of the first time partition reflect, to some extent, the timeline cardinality of the time series data within the time range of the first time partition. The time window of the second time partition is therefore adaptively adjusted according to those data characteristics, so that it expands and contracts elastically with them, which mitigates the timeline high-cardinality problem in the second time partition. As a result, the timeline high-cardinality problem is avoided in each time partition and each time partition has a smaller index, so index queries on each time partition are more efficient when time series data is queried.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:

FIG. 1 is a schematic diagram of the data storage manner of the time series database InfluxDB;

FIG. 2 is a schematic diagram of the shard group deletion strategy of the time series database InfluxDB;

FIG. 3 is a schematic diagram of a storage manner of time series data according to an embodiment of the present application;

FIG. 4 is a schematic diagram of a manner of deleting time partitions according to an embodiment of the present application;

FIG. 5 is a schematic diagram of write throughput varying over time according to an embodiment of the present application;

FIG. 6 is a schematic flowchart of a data storage method according to an embodiment of the present application;

FIG. 7 is a schematic diagram of partition directory storage according to an embodiment of the present application;

FIG. 8 is a schematic flowchart of another data storage method according to an embodiment of the present application;

FIG. 9 is a schematic flowchart of a data query method according to an embodiment of the present application;

FIG. 10 is a schematic structural diagram of a computing device according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

To address the timeline high-cardinality problem in existing time series data storage, in some embodiments of the present application, on the one hand, the data to be stored can be stored in a newly created second time partition instead of directly in the first time partition corresponding to the timestamp of the time series data to be stored, which helps mitigate the timeline high-cardinality problem in the first time partition. On the other hand, the data characteristics of the first time partition reflect, to some extent, the timeline cardinality of the time series data within the time range of the first time partition. The time window of the second time partition is therefore adaptively adjusted according to those data characteristics, so that it expands and contracts elastically with them, which mitigates the timeline high-cardinality problem in the second time partition. As a result, the timeline high-cardinality problem is avoided in each time partition and each time partition has a smaller index, so index queries on each time partition are more efficient when time series data is queried.

The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.

It should be noted that: like reference numerals refer to like objects in the following figures and embodiments, and thus, once an object is defined in one figure or embodiment, further discussion thereof is not required in subsequent figures and embodiments.

Time series data refers to a series of data generated continuously at a certain frequency. Time series data may include: a tag (Tag), a timestamp, a metric (Metric), and a metric value. A tag consists of a tag key (TagKey) and a corresponding tag value (TagValue), and represents the monitored object. The metric represents a monitoring indicator, and the metric value is the specific value of that indicator. The timestamp represents the time at which the device collected the metric value. The information included in time series data is described below with reference to the time series data shown in Table 1.

Table 1 Example of time series data

Device	Region	Timestamp	Temperature	Humidity
D0001	North China	2021-12-24T00:00:001	12.1	45
D0001	North China	2021-12-24T00:00:002	12.6	47
……	……	……	……	……
D0002	South China	2021-12-24T00:00:001	13.5	43
D0002	South China	2021-12-24T00:00:002	13.6	42

In Table 1, the tag keys are device and region, and the tag values are "device No. D0001 in North China" and "device No. D0002 in South China"; that is, the monitored objects are device No. D0001 in North China and device No. D0002 in South China. The metrics are temperature and humidity, and the values in the temperature and humidity columns are the metric values. Table 1 therefore represents the temperature time series and humidity time series of device No. D0001 in North China, and the temperature time series and humidity time series of device No. D0002 in South China. A time series may also be referred to as a timeline.

A timeline, or time series, is the series of data generated over time by one metric of one monitored object. Accordingly, the time series data shown in Table 1 contains 4 timelines: the temperature time series and humidity time series of device No. D0001 in North China, and the temperature time series and humidity time series of device No. D0002 in South China.

A data point is a single metric value collected at one timestamp for one metric of a monitored object; the metric and the tags together define the timeline, and each metric value in a timeline is one data point. A data point can thus be defined by "one metric + N tags (N ≥ 1) + one timestamp + one metric value". For example, in Table 1, "the temperature of device No. D0001 in North China at 2021-12-24T00:00:001 is 12.1 °C" is one data point. In the embodiments of the present application, time series data is divided into key value data and domain value data. The key value data includes the tag keys and tag values, which represent the monitored object. The domain value data includes the timestamp and the metric data of the time series data, and the metric data includes the metric and the metric value.
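The data model described above can be illustrated with a short sketch. It uses the four rows visible in Table 1, splits each row into key value data and domain value data, and counts timelines and data points. The names `Row` and `split_row` and the dictionary layout are illustrative assumptions, not structures defined by the application.

```python
from typing import NamedTuple


class Row(NamedTuple):
    device: str
    region: str
    timestamp: str
    temperature: float
    humidity: float


# The four rows that are visible in Table 1 (timestamps copied as printed).
rows = [
    Row("D0001", "North China", "2021-12-24T00:00:001", 12.1, 45),
    Row("D0001", "North China", "2021-12-24T00:00:002", 12.6, 47),
    Row("D0002", "South China", "2021-12-24T00:00:001", 13.5, 43),
    Row("D0002", "South China", "2021-12-24T00:00:002", 13.6, 42),
]


def split_row(row):
    """Split one row into key value data (tags) and domain value data."""
    key_value = (("device", row.device), ("region", row.region))
    domain_value = {
        "timestamp": row.timestamp,
        "metrics": {"temperature": row.temperature, "humidity": row.humidity},
    }
    return key_value, domain_value


timelines = set()
data_points = []
for row in rows:
    tags, domain = split_row(row)
    for metric, value in domain["metrics"].items():
        timelines.add((metric, tags))                       # one metric + N tags
        data_points.append((metric, tags, domain["timestamp"], value))

print(len(timelines))    # 4
print(len(data_points))  # 8
```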

Queries on time series data generally need to support multi-dimensional retrieval by tag, so a query is generally divided into two steps: an index query, followed by a scan of the actual data (metric values). The speed of the index query determines the efficiency of the query. In some application scenarios, the number of timelines in the time series data keeps growing over time. For example, in an APM scenario, the monitored object is a process of an application, and each process has a unique process identifier (PID). If the PID is used as a tag, the number of PIDs grows as the application's processes increase, and so does the number of timelines. As timelines keep being added, the timeline high-cardinality problem arises: the index keeps growing and index queries become less and less efficient. In addition, queries on time series data usually specify a time range. As time series data is continuously written, more and more data is stored, and when the actual data (metric values) is scanned, a large amount of irrelevant data may be read from disk, which lowers query efficiency.

In some schemes, such as the time series database InfluxDB, as shown in FIG. 1, several retention policies and a series file may be set for the database. Each retention policy contains several shard groups. A retention policy is a policy for retaining time series data; it determines which data no longer needs to be retained, so that such data can be deleted to free disk space for newly written time series data. Under the default configuration, InfluxDB shards data by time in ranges of 7 days, that is, each shard group retains data covering a time range of 7 days. As shown in FIG. 2, shard groups under a retention policy are continually created and deleted over time. The dashed boxes in FIG. 2 represent shard groups that have been deleted according to the retention policy.

The series file stores the key value part of the written time series data and allocates a unique identifier to each timeline key; the identifier can be used to build an inverted index that supports multi-dimensional retrieval.
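How a unique identifier per timeline key supports multi-dimensional retrieval can be shown with a minimal in-memory sketch. The class `SeriesIndex` and its methods are hypothetical and do not reproduce InfluxDB's actual file format or API; they only illustrate the idea of allocating an ID per timeline key and intersecting posting sets per tag.

```python
from collections import defaultdict


class SeriesIndex:
    """Hypothetical in-memory stand-in for a series file plus inverted index."""

    def __init__(self):
        self.series_ids = {}               # timeline key -> unique ID
        self.inverted = defaultdict(set)   # (tag key, tag value) -> set of IDs

    def add(self, metric, tags):
        """Register a timeline and return its ID (existing ID if already known)."""
        key = (metric, tuple(sorted(tags.items())))
        if key not in self.series_ids:
            self.series_ids[key] = len(self.series_ids) + 1
            for tag in tags.items():
                self.inverted[tag].add(self.series_ids[key])
        return self.series_ids[key]

    def query(self, **conditions):
        """Return the IDs of timelines that match every tag condition."""
        sets = [self.inverted.get(item, set()) for item in conditions.items()]
        if not sets:
            return set(self.series_ids.values())
        return set.intersection(*sets)


index = SeriesIndex()
index.add("temperature", {"device": "D0001", "region": "North China"})
index.add("humidity", {"device": "D0001", "region": "North China"})
index.add("temperature", {"device": "D0002", "region": "South China"})

print(index.query(device="D0001"))                        # {1, 2}
print(index.query(device="D0001", region="South China"))  # set()
```

Because every new tag combination adds entries to both maps, the index grows with the number of timelines, which is the growth the text describes as the high-cardinality problem.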

By storing the domain value part of the time series data in time shards, InfluxDB alleviates the high-cardinality problem for the domain value data to some extent. However, its time series data storage manner still has the following problems:

1. Timeline high-cardinality problem: in application scenarios where timelines keep expanding, the key value data stored in the series file keeps growing, so the index keeps growing and index query efficiency drops.

2. Data expiration problem: in InfluxDB's storage manner, the key value data is not time-partitioned together with the domain value data, so when domain value data is deleted at the end of its retention period, the expired key value data cannot be deleted cleanly along with it.

3. Time window adjustment problem: the time window of a time partition is not adaptively adjusted according to data characteristics, and a manually adjusted time window does not take effect immediately; it takes effect only after storage into the previous time partition has finished.

In view of the above technical problems, an embodiment of the present application provides a new time series data storage manner. As shown in FIG. 3, the key value data and the domain value data of the time series data may be time-partitioned together. Specifically, the time series data is stored in time partitions, and data is sliced into time partitions by a time window (interval), which may also be referred to as a time interval. Each time partition stores the time series data whose timestamps fall within the time range of that partition. As shown in FIG. 3, the database may include a plurality of time partitions, where a plurality means 2 or more. In the embodiment of the present application, the time series data may be divided into key value data and domain value data, both of which are time-partitioned; according to its timestamp, the time series data is stored in the time partition whose time range contains that timestamp, that is, both its key value data and its domain value data are stored in that time partition.

With this storage manner, the key value data and the domain value data of the time series data are stored in the same time partition. The key value data in each time partition is independent and does not affect that of other time partitions, so in a timeline-expansion scenario the same timeline may be assigned a different timeline ID in each time partition. Each time partition can therefore maintain its own index, which reduces the size of each index and improves the efficiency of subsequent index queries.

In the embodiment of the present application, because the key value data and the domain value data of the time series data are stored in the same time partition, when the time series data of a time partition falls outside the data retention period, the entire time partition can be deleted from disk. FIG. 4 illustrates such deletion: for a time partition that no longer needs to be retained, its time series data is deleted from disk. In FIG. 4, the time partitions that no longer need to be retained are the two with time ranges 20210705-20210711 and 20210712-20210718.
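A minimal sketch of this arrangement follows: each partition keeps its own index next to its own data, so expiring a partition removes both at once. The class `TimePartition`, the function `drop_expired`, and the use of integer day numbers for time are assumptions made for illustration only.

```python
from collections import defaultdict


class TimePartition:
    """Hypothetical time partition holding its own index and its own data."""

    def __init__(self, start, end):
        self.start, self.end = start, end    # time range [start, end)
        self.index = {}                      # timeline key -> partition-local ID
        self.points = defaultdict(list)      # partition-local ID -> [(ts, value)]

    def write(self, timeline_key, timestamp, value):
        # Key value data (the index entry) and domain value data live together.
        series_id = self.index.setdefault(timeline_key, len(self.index) + 1)
        self.points[series_id].append((timestamp, value))


def drop_expired(partitions, now, retention):
    """Keep only partitions whose time range still overlaps the retention period."""
    return [p for p in partitions if p.end > now - retention]


# Three weekly partitions, expressed in day numbers for simplicity.
partitions = [TimePartition(0, 7), TimePartition(7, 14), TimePartition(14, 21)]
partitions[0].write("temperature,device=D0001", 3, 12.1)
partitions[2].write("temperature,device=D0001", 15, 12.6)

remaining = drop_expired(partitions, now=30, retention=14)
print([(p.start, p.end) for p in remaining])   # [(14, 21)]
```

Dropping a partition object discards its index and its data points together, which is the property the paragraph above relies on.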

Storing the key value data and the domain value data of the time series data in the same time partition alleviates the timeline high-cardinality problem to some extent. In practice, however, as shown in FIG. 5, the write throughput of a database system is not constant: it varies over time, with peaks and troughs. In FIG. 5, write throughput is expressed in tps. Consequently, when time series data is partitioned along the time dimension, time partitions covering different time ranges store different amounts of data: more time series data is written to time partitions in peak periods, and less to time partitions in trough periods.

In a timeline-expansion scenario, although storing time series data in time partitions avoids a global timeline high-cardinality problem, high-throughput writes within a short time during a write-throughput peak can still cause a timeline high-cardinality problem within a single time partition. For example, in an APM scenario, the number of PIDs grows over time. As another example, in a container status monitoring scenario, the number of containers, and with it the number of container identifiers, may increase sharply during peak access periods. In trough periods, by contrast, write throughput is low, so many time partitions hold little data and the time series data is stored in a scattered manner, which increases query time when the time series data is queried.

In order to solve the above technical problem, an embodiment of the present application provides a data storage method for dynamically adjusting the size of a time window of a time partition, so as to solve the above problem of high cardinality of a timeline of a single time partition. The following description is given by way of example with reference to specific embodiments.

FIG. 6 is a schematic flowchart of a data storage method according to an embodiment of the present application. As shown in FIG. 6, the data storage method includes:

601. Acquire time series data to be stored.

602. In a case where a first time partition corresponding to the timestamp of the time series data to be stored exists among the created time partitions, acquire data characteristics of the time series data stored in the first time partition.

603. Determine a target time window according to the data characteristics of the time series data stored in the first time partition.

604. Create a second time partition having the target time window according to the timestamp of the time series data to be stored.

605. Store the time series data to be stored in the second time partition.

In the embodiment of the present application, the execution body of the data storage method is not limited. Optionally, the execution body of the data storage method may be a logical node that manages the database. The logical node may be deployed in the device where the database is located, or in another device.

In the embodiment of the present application, the time series data to be stored is time series data received by a device executing the data storage method, and the time series data is not written into the database yet. For the description of the time series data, reference may be made to the relevant contents of the above embodiments, which are not described herein again.

For the time sequence data to be stored acquired in step 601, time partition storage may be performed on the time sequence data to be stored, and the time sequence data to be stored may be stored in a time partition whose time range includes a timestamp of the time sequence data to be stored. The time range of the time partition refers to the starting and ending time corresponding to the time partition, that is, the time partition is used for storing the time sequence data of the time stamp in the time range.

Based on this, in the embodiment of the present application, it may be determined whether the created time partition has a time partition corresponding to the timestamp of the time series data to be stored according to the timestamp of the time series data to be stored and the time range corresponding to the created time partition. The time partition corresponding to the timestamp of the time sequence data to be stored refers to a time partition of which the time range includes the timestamp of the time sequence data to be stored.

In some embodiments, the time range of a time partition may be represented by a start time and an end time. Accordingly, it may be judged whether the timestamp of the time series data to be stored falls within the time range of any created time partition. If so, it is determined that a time partition corresponding to the timestamp of the time series data to be stored exists among the created time partitions, and the time partition whose time range contains the timestamp is taken as that time partition. If not, it is determined that no time partition corresponding to the timestamp of the time series data to be stored exists among the created time partitions.

In the embodiment of the present application, a partition identifier may be set for each time partition. The partition identifier is information that uniquely represents a time partition. In some embodiments of the present application, the partition identifier may be derived from the timestamps of the time series data stored in the time partition. For example, the partition identifier of a time partition may be set as follows:

Partition Id = timestamp / interval    (1)

In formula (1), timestamp is the timestamp of the time series data, interval is the size of the time window of the time partition, and Partition Id is the partition identifier of the time partition. If all time windows are the same, the calculated Partition Id uniquely identifies a time partition. When time windows differ, however, time series data falling in two time partitions with different time windows may yield the same Partition Id. A 2-tuple (Partition Id, interval) is therefore used to uniquely identify a time partition; that is, in the embodiment of the present application, the 2-tuple (Partition Id, interval) may be used as the partition identifier. Based on formula (1), the time range of a time partition is [Partition Id × interval, (Partition Id + 1) × interval), which is closed on the left and open on the right. This range implies that the quotient in formula (1) is rounded down to an integer.
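Formula (1) and the resulting time range can be sketched as follows. The sketch assumes integer timestamps in seconds and integer (floor) division; the function names `partition_key` and `time_range` are illustrative and do not come from the application.

```python
def partition_key(timestamp, interval):
    """Return the (Partition Id, interval) 2-tuple for a timestamp."""
    return (timestamp // interval, interval)


def time_range(key):
    """Return the half-open range [start, end) covered by a partition key."""
    partition_id, interval = key
    return (partition_id * interval, (partition_id + 1) * interval)


# Two different time windows can yield the same Partition Id ...
key_1h = partition_key(7200, 3600)     # 1-hour window
key_2h = partition_key(14400, 7200)    # 2-hour window
print(key_1h, key_2h)                  # (2, 3600) (2, 7200)

# ... but the 2-tuples differ, and so do the time ranges they cover.
print(time_range(key_1h))              # (7200, 10800)
print(time_range(key_2h))              # (14400, 21600)
```

Both keys share Partition Id 2, which is why the interval has to be part of the identifier.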

Based on the partition identifier, when it is determined whether the created time partition has a time partition corresponding to the time stamp of the time sequence data to be stored, the target partition identifier corresponding to the time sequence data to be stored can be determined according to the time stamp of the time sequence data to be stored and the currently recorded time window. Wherein the currently recorded time window may be the time window of the latest established time partition of the created time partitions. Based on the formula (1), the time stamp of the time sequence data to be stored can be divided by the currently recorded time window to obtain a Partition identifier Partition Id; and determining a binary group consisting of the obtained partition identification and the currently recorded time window as a target partition identification.

Further, the target partition identification may be looked up in the partition identifications of the created time partitions. The created time Partition can also be represented by a binary group consisting of a Partition Id calculated by a timestamp of the created time Partition and a corresponding time window. And if the target partition identification is found in the partition identification of the created time partition, determining that the time partition corresponding to the timestamp of the time sequence data to be stored exists in the created time partition. Correspondingly, if the target partition identification is not found in the partition identification of the created time partition, it is determined that the time partition corresponding to the time stamp of the time sequence data to be stored does not exist in the created time partition.
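The lookup described above can be sketched as a set membership test. The sketch assumes that the created time partitions are tracked as a set of (Partition Id, interval) 2-tuples and that timestamps are integers; `find_first_partition` is a hypothetical helper name.

```python
def find_first_partition(created, timestamp, current_interval):
    """Return the target partition identifier if it exists among the created ones."""
    target = (timestamp // current_interval, current_interval)
    return target if target in created else None


# Created partitions, each identified by (Partition Id, interval).
created = {(0, 3600), (1, 3600)}

print(find_first_partition(created, 4000, 3600))   # (1, 3600)
print(find_first_partition(created, 8000, 3600))   # None
```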

In the embodiment of the present application, when a time partition corresponding to the timestamp of the time series data to be stored exists among the created time partitions, the number of timelines that future time series data will add to that time partition cannot be predicted, but the data characteristics of the time series data already stored in it reflect its existing timelines. Therefore, to prevent a timeline high-cardinality problem within a single time partition, in step 602 the data characteristics of that time partition may be acquired. The data characteristics of a time partition are characteristics that reflect the amount of data in the time partition, and may include the number of timelines and the number of data points that the time partition contains. For the explanation of timelines and data points, refer to the relevant content of the above embodiments, which is not repeated here.

Further, in step 603, a target time window may be determined according to data characteristics of a time partition corresponding to a timestamp of time series data to be stored in the created time partition. In the embodiments of the present application, the time window refers to the size of the time window. For example, the time window may be 1 hour, 1 day, one week, etc. The target time window is a time window of a time partition to be created, and the size of the time window can be adaptively adjusted according to the data characteristics of the time partition corresponding to the timestamp of the time sequence data to be stored in the created time partition.

Further, in step 604, a time partition having the target time window may be created according to the timestamp of the time series data to be stored. For convenience of description and distinction, in the embodiment of the present application, the existing time partition, among the created time partitions, that corresponds to the timestamp of the time series data to be stored is defined as the first time partition, and the newly created time partition having the target time window is defined as the second time partition. Further, in step 605, the time series data to be stored may be stored in the second time partition.
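Steps 601 to 605 can be tied together in a rough sketch. Several parts are assumptions rather than content of the application: the in-memory `Partition` class, the handling of the case where no first time partition exists (a partition with the current window is created), the fact that an unchanged window resolves to the existing partition, and the placeholder policy `halve_when_crowded`, which stands in for the window-selection rule and is not that rule.

```python
class Partition:
    """Hypothetical in-memory time partition identified by (Partition Id, interval)."""

    def __init__(self, key):
        self.key = key
        self.timelines = set()
        self.num_points = 0

    def write(self, timeline, timestamp, value):
        self.timelines.add(timeline)
        self.num_points += 1


def store(partitions, current_interval, point, choose_window):
    """Apply steps 601-605 to one data point and return the window now recorded."""
    timeline, timestamp, value = point                                  # 601
    first_key = (timestamp // current_interval, current_interval)
    first = partitions.get(first_key)
    if first is None:
        # Assumption: with no matching partition, use the current window.
        target_interval = current_interval
    else:
        features = (len(first.timelines), first.num_points)             # 602
        target_interval = choose_window(current_interval, features)     # 603
    second_key = (timestamp // target_interval, target_interval)
    if second_key not in partitions:                                    # 604
        partitions[second_key] = Partition(second_key)
    partitions[second_key].write(timeline, timestamp, value)            # 605
    return target_interval


def halve_when_crowded(interval, features):
    # Placeholder policy for illustration only, not the rule from the application.
    num_timelines, _ = features
    return interval // 2 if num_timelines >= 2 else interval


partitions = {}
interval = 3600
for point in [("cpu,pid=1", 10, 0.5), ("cpu,pid=2", 20, 0.7), ("cpu,pid=3", 30, 0.9)]:
    interval = store(partitions, interval, point, halve_when_crowded)

print(sorted(partitions))   # [(0, 1800), (0, 3600)]
print(interval)             # 1800
```

In this run the third point arrives when the first partition already holds two timelines, so the placeholder policy shrinks the window and the point lands in a new partition with the smaller window.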

In this embodiment, on the one hand, the data to be stored can be stored in the newly created second time partition instead of directly in the first time partition corresponding to the timestamp of the time series data to be stored, which helps mitigate the timeline high-cardinality problem in the first time partition. On the other hand, the data characteristics of the first time partition reflect, to some extent, the timeline cardinality of the time series data within the time range of the first time partition. The time window of the second time partition is therefore adaptively adjusted according to those data characteristics, so that it expands and contracts elastically with them, which mitigates the timeline high-cardinality problem in the second time partition. As a result, the timeline high-cardinality problem is avoided in each time partition and each time partition has a smaller index, so index queries on each time partition are more efficient when time series data is queried.

In the embodiment of the present application, a specific implementation form of the data feature of the time partition is not limited. In some embodiments, the data characteristics of the time-series data stored in the time partition may be characterized by the number of time lines and the number of data points included in the time-series data stored in the time partition. The more the number of the timelines is, the higher the base number of the timelines is, and the larger the data volume of the time partition is; the larger the number of data points, the larger the amount of data for the time partition. Accordingly, when the data characteristics of the first time partition corresponding to the timestamp of the time series data to be stored are acquired, the number of timelines and the number of data points included in the time series data stored in the first time partition can be acquired as the data characteristics of the time series data stored in the first time partition. Accordingly, the target time window may be determined according to the number of timelines and the number of data points included in the time series data stored in the first time partition.

In the embodiment of the present application, the maximum number of timelines that a time partition can contain (maxSeries), that is, the upper limit of the number of timelines, may be preset. The upper limit of the number of timelines can be set flexibly according to the actual query efficiency requirement. If the number of timelines contained in a time partition is greater than the upper limit (maxSeries), the time partition has a high timeline cardinality problem. The minimum number of data points that a time partition should contain (minPoints), that is, the lower limit of the number of data points, may also be preset. If the number of data points contained in a time partition is less than the lower limit (minPoints), the time partition stores too little data and the time series data is stored in a relatively dispersed manner. When time series data is queried, such dispersed storage makes the query take longer.

In the embodiment of the present application, the lower limit of the number of data points may be determined from the upper limit of the number of timelines (maxSeries) and a preset minimum data point coefficient (minPointIndex). The minimum data point coefficient (minPointIndex) is the number of data points that one timeline contains at least. Accordingly, the lower limit of the number of data points (minPoints) for a time partition may be equal to the product of the upper limit of the number of timelines (maxSeries) and the minimum data point coefficient (minPointIndex), i.e., minPoints = maxSeries × minPointIndex.
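The relationship above can be illustrated with a minimal Python sketch. The function name and the numeric settings are assumptions chosen only for illustration; they are not values given in this application.

```python
def min_points(max_series: int, min_point_index: int) -> int:
    """Lower limit of data points for one time partition.

    max_series:      upper limit of the number of timelines (maxSeries)
    min_point_index: minimum data point coefficient (minPointIndex)
    """
    return max_series * min_point_index


# Hypothetical settings: at most 100,000 timelines, at least 10 points each.
print(min_points(max_series=100_000, min_point_index=10))
```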

In order to prevent both an excessively high timeline cardinality in a single time partition and low query efficiency caused by a time partition holding too little data, the upper limit of the number of timelines and the lower limit of the number of data points corresponding to one time partition may be used when the target time window is determined according to the number of timelines and the number of data points contained in the time series data stored in the first time partition corresponding to the timestamp of the data to be stored. Specifically, it may be judged whether the number of timelines contained in the time series data stored in the first time partition is less than or equal to the set upper limit of the number of timelines, and whether the number of data points contained in the time series data stored in the first time partition is less than or equal to the set lower limit of the number of data points. A number of timelines less than or equal to the set upper limit indicates that the first time partition has no high timeline cardinality problem. A number of data points less than or equal to the set lower limit indicates that the first time partition holds little data. Therefore, if the judgment result is that the number of timelines contained in the time series data stored in the first time partition is less than or equal to the set upper limit of the number of timelines, and/or the number of data points contained in the time series data stored in the first time partition is less than or equal to the set lower limit of the number of data points, the time window of the second time partition can be increased appropriately.
Accordingly, the time window of the first time partition may be increased and used as the target time window. For example, with day as the time granularity, the time window of the first time partition may be increased by 1 day and used as the target time window, and so on.

A number of data points contained in the time series data stored in the first time partition that is greater than the set lower limit of the number of data points indicates that the amount of data stored in the first time partition meets the requirement and that the time series data is stored in a relatively concentrated manner, so that, to a certain extent, low retrieval efficiency caused by dispersed storage does not arise during retrieval. A number of timelines contained in the time series data stored in the first time partition that is greater than the set upper limit of the number of timelines indicates that the first time partition contains many timelines; if the second time partition still used the time window of the first time partition, the second time partition would also have a high timeline cardinality problem. Based on this, if the judgment result is that the number of timelines is greater than the set upper limit of the number of timelines and the number of data points is greater than the set lower limit of the number of data points, the time window of the first time partition can be reduced and used as the target time window. For example, with day as the time granularity, the time window of the first time partition may be reduced by 1 day and used as the target time window, and so on.
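The two judgments described above (increase the window, or reduce it) can be sketched as follows. This is only an illustrative sketch under stated assumptions: the function name, the 1-day step, the reading of "and/or" as a logical "or", and the floor of 1 day when shrinking are all hypothetical and are not specified in this application.

```python
def target_window_days(window_days: int, series: int, points: int,
                       max_series: int, min_points: int, step: int = 1) -> int:
    """Derive the target time window from the first time partition's features."""
    if series <= max_series or points <= min_points:
        # No high-cardinality problem and/or too little data: widen the window.
        return window_days + step
    # series > max_series and points > min_points: narrow the window.
    # The lower bound of 1 day is an assumption to keep the window positive.
    return max(window_days - step, 1)
```

For example, with a 7-day window, 500 timelines against an upper limit of 1000 yields an 8-day target window, while 2000 timelines and 50000 data points against limits of 1000 and 10000 yields a 6-day target window.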

In the embodiment of the application, in order to prevent the amount of data in one time partition from being too large, a plurality of sub-partitions may be provided for one time partition, each sub-partition storing part of the time series data within the time range of the time partition. Plural means 2 or more. The sub-partitions of a time partition may have the same time window as the time partition and may also have the same partition identification. That is, the sub-partitions under one time partition share the same time window and the same partition identification, and are distinguished from each other by their sub-partition identifications. The sub-partition identification may be the ordinal number of the sub-partition under the time partition. A partition identification combined with a sub-partition identification uniquely identifies one sub-partition. For example, assume that the partition identification of the time partition is 123-7, where the PartitionId is 123 and the time window of the time partition is 7 days. The identification 123-7-2 then represents the 2nd sub-partition of the time partition identified as 123-7.
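As an illustration of the identification scheme in the example above, the following sketch splits an identification such as 123-7-2 into its three parts. The type and function names are hypothetical, and the sketch does not decide whether sub-partition numbering starts at 0 or 1.

```python
from typing import NamedTuple


class SubPartitionId(NamedTuple):
    partition_id: int   # PartitionId, e.g. 123
    window_days: int    # time window in days, e.g. 7
    sub_partition: int  # ordinal number of the sub-partition, e.g. 2


def parse_sub_partition(identifier: str) -> SubPartitionId:
    """Parse '<PartitionId>-<window>-<sub-partition>' into its components."""
    partition_id, window_days, sub_partition = (int(p) for p in identifier.split("-"))
    return SubPartitionId(partition_id, window_days, sub_partition)


print(parse_sub_partition("123-7-2"))
```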

Where time partitions are provided with sub-partitions, at least one sub-partition already exists in the first time partition corresponding to the timestamp of the time series data to be stored. In this case, when the target time window is determined according to the number of timelines and the number of data points contained in the time series data stored in the first time partition, the target time window may be determined according to the number of timelines and the number of data points contained in the time series data stored in the most recently created sub-partition among the at least one sub-partition of the first time partition. For convenience of description, the most recently created sub-partition among the at least one sub-partition of the first time partition is defined as the first sub-partition.

For the sub-partition corresponding to the time partition, the maximum number of timelines (maxSeries) that can be included in one sub-partition, i.e., the upper limit of the number of timelines, may also be set. Of course, the number of data points (minPoints) contained in a sub-partition at least can be preset, i.e. the lower limit of the number of data points. For the description of the upper limit of the number of timelines and the lower limit of the number of data points, reference may be made to the relevant contents of the above embodiments, which are not described herein again.

Based on the upper limit of the number of timelines and the lower limit of the number of data points of a sub-partition, when the target time window is determined according to the number of timelines and the number of data points contained in the time series data stored in the first sub-partition, it may be judged whether the number of timelines contained in the time series data stored in the first sub-partition is less than or equal to the set upper limit of the number of timelines, and whether the number of data points contained in the time series data stored in the first sub-partition is less than or equal to the set lower limit of the number of data points.

A number of timelines contained in the time series data stored in the first sub-partition that is less than or equal to the set upper limit of the number of timelines indicates that the first sub-partition has no high timeline cardinality problem. A number of data points contained in the time series data stored in the first sub-partition that is less than or equal to the set lower limit of the number of data points indicates that the first sub-partition holds little data. Therefore, in some embodiments, if the judgment result is that the number of timelines contained in the time series data stored in the first sub-partition is less than or equal to the set upper limit of the number of timelines, and/or the number of data points contained in the time series data stored in the first sub-partition is less than or equal to the set lower limit of the number of data points, the time window of the second time partition can be increased appropriately. Accordingly, the time window of the first sub-partition may be increased and used as the target time window.

Alternatively, the time window of the first sub-partition may be increased as the target time window in a case where the time-series data stored in the first sub-partition includes a number of timelines less than or equal to the set upper limit of the number of timelines, and the time-series data stored in the first sub-partition includes a number of data points less than or equal to the set lower limit of the number of data points.

In other embodiments, in consideration that the number of timelines and the number of data points of a single sub-partition cannot accurately reflect the data characteristics in the time range of the time partition, in order to improve the accuracy of the time window adjustment, in the case that the number of timelines included in the time sequence data stored in the first sub-partition is less than or equal to the set upper limit of timeline number, and the number of data points included in the time sequence data stored in the first sub-partition is less than or equal to the set lower limit of data point number, the sum of the numbers of timelines (sumSeries) and the number of data points (sumPoints) included in a plurality of consecutive sub-partitions under the time partition may be used to determine whether the time window of the time partition needs to be adjusted. Specifically, for a first time partition corresponding to a timestamp of time series data to be stored, a plurality of sub-partitions may be obtained from the first time partition. Plural means 2 or more. For example, the plurality of sub-partitions may be sequentially obtained from the sub-partitions included in the first time partition in order from late to early according to the creation time of the sub-partitions included in the first time partition. For example, the number of the plurality is 2, and two consecutive sub-partitions which create the latest are obtained from the sub-partitions of the first time partition. The two consecutive sub-partitions include: the first sub-partition and another sub-partition closest to the creation time interval of the first sub-partition.

Further, the sum of the numbers of timelines (sumSeries) contained in the time series data stored in the plurality of sub-partitions, and the sum of the numbers of data points (sumPoints) contained in the time series data stored in the plurality of sub-partitions, may be determined. It may then be judged whether the sum of the numbers of timelines (sumSeries) is less than or equal to the set upper limit of the number of timelines, and whether the sum of the numbers of data points (sumPoints) is less than or equal to the set lower limit of the number of data points.

A sum of the numbers of timelines contained in the time series data stored in the plurality of sub-partitions that is less than or equal to the set upper limit of the number of timelines indicates that the plurality of sub-partitions have no high timeline cardinality problem. A sum of the numbers of data points contained in the time series data stored in the plurality of sub-partitions that is less than or equal to the set lower limit of the number of data points indicates that the plurality of sub-partitions hold little data. If the second time partition kept the time window of the first sub-partition in this case, the time series data would be spread over too many partitions, and subsequent retrieval would need to span multiple partitions, resulting in low retrieval efficiency. Therefore, if the judgment result is that the sum of the numbers of timelines is less than or equal to the set upper limit of the number of timelines, and/or the sum of the numbers of data points is less than or equal to the set lower limit of the number of data points, the time window of the second time partition can be increased appropriately. Accordingly, the time window of the first sub-partition may be increased and used as the target time window. For example, if the time window of the first sub-partition is 7 days, it may be increased by 1 day, i.e., 8 days is used as the target time window.
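The evaluation over several consecutive sub-partitions can be sketched as below. The `SubPartition` record, its field names, the default of 2 sub-partitions, the 1-day step, and the reading of "and/or" as "or" are assumptions made for illustration. The sketch returns `None` when the window is not increased, leaving that branch to the other embodiments.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SubPartition:
    created_at: int   # creation time, e.g. a Unix timestamp
    series: int       # number of timelines stored
    points: int       # number of data points stored
    window_days: int  # time window in days


def window_from_recent(subs: List[SubPartition], max_series: int, min_points: int,
                       count: int = 2, step: int = 1) -> Optional[int]:
    """Sum the `count` most recently created sub-partitions and decide on growth."""
    recent = sorted(subs, key=lambda s: s.created_at, reverse=True)[:count]
    sum_series = sum(s.series for s in recent)   # sumSeries
    sum_points = sum(s.points for s in recent)   # sumPoints
    first = recent[0]                            # the first (latest) sub-partition
    if sum_series <= max_series or sum_points <= min_points:
        return first.window_days + step
    return None  # not handled in this sketch
```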

In the embodiment of the present application, in order to prevent the time window from increasing without limit, a maximum time window may further be set. The maximum time window can be set flexibly according to actual application requirements. For example, the maximum time window may be set to 1 week, 1 month, 2 months, and so forth. In the above embodiments in which the time window of the first sub-partition is increased to obtain the target time window, it may further be judged whether the time window of the first sub-partition is greater than or equal to the maximum time window. If the time window of the first sub-partition is smaller than the maximum time window, the time window of the first sub-partition is increased to obtain the target time window. If the time window of the first sub-partition is greater than or equal to the maximum time window, the time window of the first sub-partition is used as the target time window of the second time partition to be created.
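The maximum-window check can be sketched in a few lines. The function name and the 1-day step are assumptions for illustration only.

```python
def grow_window(window_days: int, max_window_days: int, step: int = 1) -> int:
    """Increase the window unless it has already reached the maximum time window."""
    if window_days >= max_window_days:
        # Keep the current window as the target time window.
        return window_days
    return window_days + step
```

With a hypothetical maximum of 30 days, a 7-day window becomes 8 days, while a 30-day window stays at 30 days.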

Further, a second time partition having the target time window may be created based on the timestamp of the time series data to be stored. Specifically, the start time of the second time partition may be determined according to the timestamp of the time series data to be stored. In some embodiments, the timestamp of the time series data to be stored may be used as the start time of the second time partition. Alternatively, the timestamp of the time series data to be stored may be converted to a set time precision to obtain the start time of the second time partition. For example, if the set time precision is seconds and the timestamp of the time series data to be stored is accurate to microseconds, the timestamp may be converted to second precision and used as the start time of the second time partition, and so on. After the start time of the second time partition is determined, the second time partition may be created with the range from the start time to the start time plus the target time window (i.e., the increased time window) as its time range, and with the target time window as its time window. After the second time partition is created, the time series data to be stored may be stored in the second time partition.
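The derivation of the start time and time range can be sketched as follows. The sketch assumes that converting a microsecond timestamp to second precision means truncation (integer division), which this application does not state explicitly; the function name is hypothetical.

```python
from typing import Tuple


def second_partition_range(timestamp_us: int, window_seconds: int) -> Tuple[int, int]:
    """Return (start, end) in seconds for a partition starting at the data's timestamp."""
    start = timestamp_us // 1_000_000   # assumed: truncate microseconds to seconds
    return start, start + window_seconds


# A 7-day target window expressed in seconds.
print(second_partition_range(1_629_820_800_123_456, 7 * 86_400))
```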

In the embodiment of the present application, as shown in fig. 7, the time series data stored in a time partition needs to be persisted to a computer storage medium. The computer storage medium may be a floppy disk, an optical disk, a DVD, a magnetic disk, a hard disk, a flash memory, a USB flash drive, a CF card, an SD card, an MMC card, an SM card, a Memory Stick, an xD card, or the like. Fig. 7 shows only a magnetic disk as the storage medium by way of example, which does not constitute a limitation. Thus, the directory in which a time partition is stored on the storage medium needs to be named. In this embodiment, one time partition corresponds to one partition directory, and the sub-partitions in the time partition are represented by sub-partition identifications. For example, the partition directory of the time partition may be represented as: partitionId_interval_level (partition identification, time window, sub-partition number).

Accordingly, for the second time partition, the partition identification of the second time partition may be determined according to the timestamp of the time series data to be stored and the target time window. For a specific determination process, reference may be made to an implementation of determining a target partition identifier corresponding to time series data to be stored, which is not described herein again.

Further, a partition directory name characterizing the partition identification and the time window of the second time partition may be determined as the partition directory name of the second time partition, and the partition directory name of the second time partition may be stored. In this way, when time series data is queried, the time range of a time partition can be determined according to the partition identification and the time window in its partition directory name, and the time partitions satisfying the query time range contained in the query request can then be determined. In addition, writing the time window of the time partition into the partition directory name records that time window, so that the currently recorded time window can be obtained conveniently when the time window of the time partition is dynamically adjusted.
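How a query might locate partitions from directory names can be sketched as below. The sketch assumes unformatted names of the form partitionId_interval_level with the interval in seconds, and assumes that a partition's start time equals partitionId × interval; both the helper names and that start-time rule are illustrative assumptions rather than statements of this application.

```python
from typing import List, Tuple


def partition_range(dir_name: str) -> Tuple[int, int]:
    """Half-open time range [start, end) in seconds derived from a directory name."""
    partition_id, interval, _level = (int(p) for p in dir_name.split("_")[:3])
    start = partition_id * interval   # assumed relationship, see lead-in
    return start, start + interval


def matching_partitions(dir_names: List[str], query_start: int, query_end: int) -> List[str]:
    """Directories whose time range overlaps the query range [query_start, query_end)."""
    hits = []
    for name in dir_names:
        start, end = partition_range(name)
        if start < query_end and query_start < end:
            hits.append(name)
    return hits
```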

Of course, the newly created second time partition may be regarded as the first sub-partition of the second time partition. Accordingly, a partition directory name characterizing the partition identification, the time window, and the sub-partition identification of the second time partition may be determined as the partition directory name of the second time partition. For example, the partition directory name of the second time partition may be 123_7_0.

In the embodiment of the present application, in order to improve the readability of the partition directory name and reduce operation and maintenance complexity, the partition identification partitionId may be formatted as a date. For example, assume that the timestamp is 1629820800 with second precision, corresponding to 00:00:00 on August 25, 2021, and that the time window (interval) of the time partition is 86400 seconds, that is, 1 day. The partition identification is then partitionId = 1629820800 / 86400 = 18863 (taking the integer part).

If the partition identification and the time window are not formatted, the partition directory name is 18863_86400_0_1, which is both hard to read and long. After formatting with day values, the partition directory name is 20210825_1_0_1, where 18863 is formatted as 20210825 and 86400 is formatted as 1, i.e., day granularity. This shortens the partition directory name, improves readability, and still carries sufficient information.
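The formatting in this example can be reproduced with the sketch below. It assumes that the date is taken from the timestamp in the UTC+8 time zone (in which 1629820800 falls on August 25, 2021) and that the last two fields are the sub-partition number and a version number; the function and constant names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

ASSUMED_TZ = timezone(timedelta(hours=8))  # assumption: dates are rendered in UTC+8
SECONDS_PER_DAY = 86_400


def formatted_dir_name(timestamp: int, interval_seconds: int, level: int, version: int) -> str:
    """Build a date-formatted directory name such as 20210825_1_0_1."""
    day = datetime.fromtimestamp(timestamp, tz=ASSUMED_TZ).strftime("%Y%m%d")
    window_days = interval_seconds // SECONDS_PER_DAY   # 86400 seconds -> 1 day
    return f"{day}_{window_days}_{level}_{version}"


# Unformatted partition identification from the example: integer division.
print(1629820800 // 86400)
print(formatted_dir_name(1629820800, 86400, level=0, version=1))
```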

In the embodiment of the application, a storage engine may be used to persist the time series data to be stored to the storage space corresponding to the second time partition. Because storage engines of different versions store data in different data structures, a storage engine cannot process time series data stored by a storage engine of another version. Therefore, to facilitate time series data queries, the version number of the storage engine may also be written into the partition directory name of the time partition. For example, the partition directory of a time partition may be represented as: partitionId_interval_level_version (partition identification, time window, sub-partition number, version number).

Wherein, version number (version) represents the version number of the storage engine storing the time series data to be stored. In this way, when the time sequence data is inquired, the version number in the partition directory name of the time partition can be used for determining the target storage engine for storing the time sequence data of the time partition; and processing the target time sequence data meeting the query conditions by using the target storage engine to obtain a query result.
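Selecting a storage engine from the version field can be sketched as follows. The registry of engines is a hypothetical stand-in (plain strings here); a real system would map each version number to an engine object capable of reading that version's data structures.

```python
from typing import Dict, TypeVar

Engine = TypeVar("Engine")


def engine_for(dir_name: str, engines: Dict[int, Engine]) -> Engine:
    """Pick the engine whose version matches the last field of the directory name."""
    version = int(dir_name.rsplit("_", 1)[1])
    return engines[version]


# Hypothetical registry keyed by version number.
engines = {1: "engine-v1", 2: "engine-v2"}
print(engine_for("20210825_1_0_2", engines))
```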

In another embodiment, the following applies when the number of timelines contained in the time series data stored in the first sub-partition is less than or equal to the set upper limit of the number of timelines, and/or the number of data points contained in the time series data stored in the first sub-partition is less than or equal to the set lower limit of the number of data points. A number of data points contained in the time series data stored in the plurality of sub-partitions that is greater than the set lower limit of the number of data points indicates that the amount of data stored in the plurality of sub-partitions meets the requirement and that the time series data is stored in a relatively concentrated manner, so that, to a certain extent, low retrieval efficiency caused by dispersed storage does not arise during retrieval. A number of timelines contained in the time series data stored in the plurality of sub-partitions that is greater than the set upper limit of the number of timelines indicates that the plurality of sub-partitions contain many timelines. Meanwhile, a number of timelines in the first sub-partition that is less than or equal to the set upper limit indicates that the first sub-partition has no high timeline cardinality problem, and a number of data points in the first sub-partition that is less than or equal to the set lower limit indicates that the first sub-partition holds little data.
Based on this, when the number of timelines contained in the time series data stored in the first sub-partition is less than or equal to the set upper limit of the number of timelines, and/or the number of data points contained in the time series data stored in the first sub-partition is less than or equal to the set lower limit of the number of data points, if the judgment result is that the number of timelines contained in the time series data stored in the plurality of sub-partitions is greater than the set upper limit of the number of timelines and the number of data points contained in the time series data stored in the plurality of sub-partitions is greater than the set lower limit of the number of data points, the time series data to be stored may be stored in the first sub-partition.

A number of data points stored in the first sub-partition that is greater than the set lower limit of the number of data points indicates that the amount of data stored in the first sub-partition meets the requirement and that the time series data is stored in a relatively concentrated manner, so that, to a certain extent, low retrieval efficiency caused by dispersed storage does not arise during retrieval. A number of timelines contained in the time series data stored in the first sub-partition that is greater than the set upper limit of the number of timelines indicates that the first sub-partition contains many timelines; if the second time partition still used the time window of the first sub-partition, the second time partition would also have a high timeline cardinality problem. Based on this, if the judgment result is that the number of timelines is greater than the set upper limit of the number of timelines and the number of data points is greater than the set lower limit of the number of data points, the time window of the first sub-partition can be reduced and used as the target time window. For example, with day as the time granularity, the time window of the first sub-partition may be reduced by 1 day and used as the target time window, and so on.

Further, a second time partition having a target time window may be created based on the time stamp of the time series data to be stored. For a specific implementation of creating the second time partition, reference may be made to relevant contents of the foregoing embodiments, which are not described herein again.

In some embodiments, in order to prevent the time window of the time partition from being frequently adjusted to affect the system performance, in the case that the time-series data stored in the first sub-partition includes a number of timelines greater than the set upper limit of the number of timelines, and the time-series data stored in the first sub-partition includes a number of data points greater than the set lower limit of the number of data points, the time window size may not be adjusted, and a new sub-partition may be created in the first time partition. In the embodiment of the present application, for convenience of description and distinction, a newly created child partition is defined as a second child partition.

In order to prevent the data amount of the time series data stored in the same time partition from being too large, the upper limit of the number of sub-partitions included in the time partition may be preset. A time partition preferably contains a number of sub-partitions that does not exceed a set upper limit for the number of sub-partitions. In the embodiment of the application, a specific value of the upper limit of the number of the sub-partitions included in one time partition is not limited, and can be flexibly set according to actual requirements. Wherein the upper limit of the number of sub-partitions is an integer greater than or equal to 2. For example, the upper limit on the number of sub-partitions may be 3, 4, 5, and so on.

In order to take system performance and the data size of the time partition into consideration, under the condition that the time sequence data stored in the first sub-partition contains a time line quantity greater than a set time line quantity upper limit and the time sequence data stored in the first sub-partition contains a data point quantity greater than a set data point quantity lower limit, whether the quantity of established sub-partitions in the first time partition is greater than or equal to the set sub-partition quantity upper limit or not can be judged; if the judgment result is yes, the time window of the first sub-partition can be reduced to be used as the target time window. For example, for the case of day time granularity, the time window of the first sub-partition may be narrowed by 1 day as the target time window, and so on.

Further, a second time partition having a target time window may be created based on the time stamp of the time series data to be stored. For a specific implementation of creating the second time partition, reference may be made to relevant contents of the foregoing embodiments, which are not described herein again. For the embodiment of naming the partition directory of the second time partition, reference may also be made to the related contents of the foregoing embodiments, and details are not described here again.

Correspondingly, if the determination result is that the number of the established sub-partitions in the first time partition is smaller than the set upper limit of the number of the sub-partitions, the time window may be kept unchanged, and another sub-partition (i.e., the second sub-partition) in the first time partition may be created. In particular, the time window of the first sub-partition may be determined to be the target time window. Thereafter, a second sub-partition having the target time window may be created under the first time partition as a second time partition according to the timestamp of the data to be stored.

Specifically, the starting time of the second time partition may be determined according to a timestamp of the time series data to be stored; and creating a second sub-partition of the first time partition as a second time partition according to the starting time of the second time partition and the time window of the first sub-partition. Further, the data to be stored may be stored to the second sub-partition.
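The choice between reducing the window and creating another sub-partition can be sketched as below. It applies only to the case where the first sub-partition exceeds both the timeline upper limit and the data point lower limit. The function name, the returned action labels, the 1-day step, and the 1-day floor are assumptions for illustration.

```python
from typing import Tuple


def on_overflow(window_days: int, sub_partition_count: int,
                max_sub_partitions: int, step: int = 1) -> Tuple[str, int]:
    """Decide what to create, and with which target time window."""
    if sub_partition_count >= max_sub_partitions:
        # Sub-partition limit reached: new time partition with a reduced window.
        return "new_time_partition", max(window_days - step, 1)
    # Limit not reached: keep the window and add a second sub-partition.
    return "new_sub_partition", window_days
```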

For the sub-partitions under a time partition, each sub-partition also corresponds to a partition directory name. In this embodiment, one time partition corresponds to one partition directory, and the sub-partitions in the time partition are represented by sub-partition identifications. For example, the partition directory of a sub-partition may be represented as: partitionId_interval_level (partition identification, time window, sub-partition number).

Accordingly, for a second sub-partition, a partition identification of the second sub-partition may be determined to be a partition identification of the first time partition; and determining the sub-partition identification of the second sub-partition according to the number of the sub-partitions corresponding to the first time partition. Wherein the sub-partition identification of the second sub-partition may be the sequential number of the second sub-partition in the first time partition. Further, a partition directory name for representing a partition identifier, a time window and a sub-partition identifier of the second sub-partition may be determined as the partition directory name of the second sub-partition; and stores the partition directory name of the second sub-partition. Of course, the newly created second time partition may be regarded as the first sub-partition of the second time partition.
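Naming the second sub-partition from the number of existing sub-partitions can be sketched as follows. The sketch assumes names of the form partitionId_interval_level and zero-based sub-partition numbering, in line with the earlier example 123_7_0; the function name is hypothetical.

```python
from typing import Iterable


def next_sub_partition_dir(existing_dirs: Iterable[str], partition_id: int, interval: int) -> str:
    """Directory name for the next sub-partition under the same time partition."""
    prefix = f"{partition_id}_{interval}_"
    # Count the sub-partitions already created under this time partition.
    count = sum(1 for name in existing_dirs if name.startswith(prefix))
    return f"{prefix}{count}"   # zero-based ordinal (assumption)


print(next_sub_partition_dir(["123_7_0", "123_7_1", "124_7_0"], 123, 7))
```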

In the embodiment of the application, a storage engine may be used to persist the time series data to be stored to the storage space corresponding to the second sub-partition. Therefore, to facilitate time series data queries, the version number of the storage engine may also be written into the partition directory name of the sub-partition. For example, the partition directory of a sub-partition may be represented as: partitionId_interval_level_version (partition identification, time window, sub-partition number, version number). For the function of the version number, reference may be made to the relevant content of the above embodiments, which is not described herein again.

In the embodiment of the present application, consider the case where the number of data points contained in the time series data stored in the first sub-partition is greater than the set lower limit of the number of data points, but the number of timelines contained in the time series data stored in the first sub-partition is less than or equal to the set upper limit of the number of timelines. The former indicates that the amount of data stored in the first sub-partition meets the requirement and that the time series data is stored in a relatively concentrated manner, so that, to a certain extent, low retrieval efficiency caused by dispersed storage does not arise during retrieval. The latter indicates that the first sub-partition has no high timeline cardinality problem. In order to avoid frequently creating new time partitions or sub-partitions, which would affect system performance, the time series data to be stored may in this case be stored in the first sub-partition.

For the same reason, for the case that the number of data points included in the time-series data stored in the first sub-partition is less than or equal to the set lower limit of the number of data points, but the number of timelines included in the time-series data stored in the first sub-partition is greater than the set upper limit of the number of timelines, the time-series data to be stored may also be stored in the first sub-partition.

In this embodiment of the present application, where none of the created time partitions corresponds to the timestamp of the time series data to be stored, another time partition may be created according to the currently recorded time window and the timestamp of the time series data to be stored. For convenience of description and distinction, the time partition created in this case is defined as a third time partition. For a specific implementation of creating the third time partition, reference may be made to the related content on creating the second time partition, which is not repeated here.

For the third time partition, a corresponding partition directory name may also be set. For the specific setting process, reference may be made to the related content on setting the partition directory name of the second time partition, which is not repeated here.

To facilitate understanding of how the target time window is determined according to the data characteristics of the time series data stored in the first time partition, a specific embodiment is described below with reference to fig. 8, taking as an example the case where the first time partition includes sub-partitions.

Fig. 8 is a schematic detailed flowchart of a data storage method according to an embodiment of the present application. As shown in fig. 8, the method mainly includes the following steps:

S1. Acquire the time series data to be stored.

S2. Determine the target partition identification corresponding to the time series data to be stored according to the timestamp of the time series data to be stored and the currently recorded time window.

S3. Judge whether a first time partition corresponding to the target partition identification exists among the created time partitions. If yes, go to step S4; if no, go to step S22.

S4. Acquire the number of timelines and the number of data points contained in the time series data stored in the first sub-partition, that is, the most recently created sub-partition in the first time partition, as the data characteristics of the time series data stored in the first sub-partition.

S5. Judge whether the number of timelines contained in the time series data stored in the first sub-partition is less than or equal to the set upper limit of the number of timelines, and judge whether the number of data points contained in the time series data stored in the first sub-partition is less than or equal to the set lower limit of the number of data points. If both determination results are yes, go to step S6; if both determination results are no, go to step S15; otherwise, that is, if exactly one determination result is yes, go to step S14.

S6. Obtain a plurality of consecutive sub-partitions of the first time partition in order of creation time from latest to earliest.

S7. Determine the sum of the numbers of timelines contained in the time series data stored in the plurality of sub-partitions and the sum of the numbers of data points contained in the time series data stored in the plurality of sub-partitions.

S8. Judge whether the sum of the numbers of timelines is less than or equal to the upper limit of the number of timelines, and judge whether the sum of the numbers of data points is less than or equal to the lower limit of the number of data points. If yes, go to step S9; if no, go to step S14.

S9. Judge whether the currently recorded time window is smaller than the set maximum time window. If yes, go to step S10; if no, go to step S11.

In this embodiment, the currently recorded time window may be the time window of the first sub-partition.

S10. Increase the currently recorded time window and use the result as the target time window; then go to step S12.

S11. Determine the currently recorded time window as the target time window; then go to step S12.

S12. Create a second time partition having the target time window according to the timestamp of the time series data to be stored.

S13. Store the time series data to be stored in the second time partition.

S14. Store the time series data to be stored in the first sub-partition.

S15. Judge whether the number of created sub-partitions in the first time partition is greater than or equal to the set upper limit of the number of sub-partitions. If yes, go to step S16; if no, go to step S19.

S16. Judge whether the currently recorded time window is larger than the set minimum time window. If yes, go to step S17; if no, go to step S18.

S17. Reduce the currently recorded time window and use the result as the target time window; then go to step S12.

S18. Determine the currently recorded time window as the target time window; then go to step S12.

S19. Determine the time window of the first sub-partition as the target time window.

S20. Create a second sub-partition of the first time partition, as the second time partition, according to the timestamp of the time series data to be stored and the time window of the first sub-partition.

S21. Store the time series data to be stored in the second sub-partition.

S22. Create a third time partition according to the currently recorded time window and the timestamp of the time series data to be stored.

S23. Store the time series data to be stored in the third time partition.
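The decision part of the flow (steps S5 to S19) can be summarized in a short sketch. Several details here are assumptions rather than statements of the source: the window is doubled when increased and halved when reduced (the source does not say by how much it changes), the `recent` field fixes how many sub-partitions step S6 inspects, "yes" in steps S5 and S8 is read as both checks passing, and all names (`SubPartitionStats`, `Limits`, `decide`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SubPartitionStats:
    timelines: int    # number of timelines stored in the sub-partition
    data_points: int  # number of data points stored in the sub-partition


@dataclass
class Limits:
    max_timelines: int       # set upper limit of the number of timelines
    min_data_points: int     # set lower limit of the number of data points
    max_sub_partitions: int  # set upper limit of the number of sub-partitions
    min_window: int          # set minimum time window
    max_window: int          # set maximum time window
    recent: int = 2          # assumed: how many recent sub-partitions S6 inspects


def decide(subs: List[SubPartitionStats], window: int, lim: Limits) -> Tuple[str, int]:
    """Return (action, target_window) for steps S5 to S19.

    `subs` lists the sub-partitions of the first time partition, oldest first.
    """
    latest = subs[-1]
    few_timelines = latest.timelines <= lim.max_timelines
    few_points = latest.data_points <= lim.min_data_points

    if few_timelines and few_points:                          # S5: both yes
        recent = subs[-lim.recent:]                           # S6
        total_timelines = sum(s.timelines for s in recent)    # S7
        total_points = sum(s.data_points for s in recent)
        if (total_timelines <= lim.max_timelines
                and total_points <= lim.min_data_points):     # S8
            if window < lim.max_window:                       # S9
                # S10: grow the window (doubling is an assumption), capped.
                return "new_time_partition", min(window * 2, lim.max_window)
            return "new_time_partition", window               # S11
        return "store_in_first_sub_partition", window         # S14

    if not few_timelines and not few_points:                  # S5: both no
        if len(subs) >= lim.max_sub_partitions:               # S15
            if window > lim.min_window:                       # S16
                # S17: shrink the window (halving is an assumption), floored.
                return "new_time_partition", max(window // 2, lim.min_window)
            return "new_time_partition", window               # S18
        return "new_sub_partition", window                    # S19 to S21

    return "store_in_first_sub_partition", window             # S5: mixed -> S14
```

The function only chooses an action and a target time window; creating the partition and persisting the data (steps S12, S13, S20 and S21) are left to the caller.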

Besides the data storage method provided by the above embodiment, the embodiment of the present application also provides a data query method. The following provides an exemplary description of a data query method provided in the embodiments of the present application.

Fig. 9 is a schematic flowchart of a data query method according to an embodiment of the present application. As shown in fig. 9, the data query method includes:

901. Acquire a query request.

902. Acquire the time range to be queried and the query condition from the query request.

In this embodiment, the query condition is a condition on the actual metric data.

903. Determine, from the time partitions and according to the time range to be queried, a target time partition whose time range overlaps the time range to be queried.

904. Acquire target time series data satisfying the query condition from the time series data stored in the target time partition.

905. Determine a query result of the query request based on the target time series data.

In this embodiment, when a target time partition whose time range overlaps with a time range to be queried is determined from the time partitions, a partition identifier and a time window may be obtained from a partition directory name corresponding to the time partition; further, the time range of the time partition may be determined based on the partition identification and the time window. Then, the time range to be queried may be matched in the time ranges of the time partitions to obtain a target time partition whose time range overlaps with the time range to be queried.

Further, target time series data satisfying the query condition may be acquired from the time series data stored in the target time partition. Thereafter, a query result of the query request is determined based on the target timing data. Optionally, the target time series data may be merged according to a descending order of the timestamps of the target time series data to obtain the query result.
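The partition matching and merging described above might be sketched as follows. This is an illustration under stated assumptions, not the method of the application: the start time of a partition is taken to be `partition_id * window`, time ranges are half-open, each partition keeps its points sorted by ascending timestamp, and the names `partition_range`, `overlaps` and `query` are hypothetical.

```python
import heapq
from typing import Callable, Dict, Iterator, List, Tuple

Point = Tuple[int, float]  # (timestamp, value)


def partition_range(partition_id: int, window: int) -> Tuple[int, int]:
    """Half-open range [start, end); start = id * window is an assumption."""
    start = partition_id * window
    return start, start + window


def overlaps(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    """True when two half-open ranges share at least one instant."""
    return a[0] < b[1] and b[0] < a[1]


def query(partitions: Dict[Tuple[int, int], List[Point]],
          query_range: Tuple[int, int],
          condition: Callable[[Point], bool] = lambda p: True) -> List[Point]:
    """`partitions` maps (partition_id, window) to points sorted by timestamp."""
    streams: List[Iterator[Point]] = []
    for (pid, window), points in partitions.items():
        if not overlaps(partition_range(pid, window), query_range):
            continue  # partition lies entirely outside the queried range
        selected = [p for p in points
                    if query_range[0] <= p[0] < query_range[1] and condition(p)]
        streams.append(reversed(selected))  # newest first
    # Merge the already-descending streams into one descending result.
    return list(heapq.merge(*streams, key=lambda p: p[0], reverse=True))
```

Partitions whose range does not overlap the queried range are skipped before any of their data is read, which is where the per-partition time range derived from the directory name pays off.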

In some embodiments, a target version number of a storage engine storing target timing data may also be obtained from the partition directory name of the target time partition. Further, the target time sequence data can be processed by using a storage engine corresponding to the target version number to obtain a query result.
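One way to picture the version-based processing is a registry that maps each version number to a decoder. Everything concrete here is hypothetical: the two decoders, the idea that version "v2" stores values scaled by 1000, and the `ENGINES` registry are invented for illustration, and the version is assumed to be the fourth underscore-separated field of the directory name.

```python
from typing import Callable, Dict, List


def decode_v1(raw: List[str]) -> List[float]:
    """Hypothetical v1 format: values stored as decimal strings."""
    return [float(x) for x in raw]


def decode_v2(raw: List[str]) -> List[float]:
    """Hypothetical v2 format: values stored as integers scaled by 1000."""
    return [int(x) / 1000 for x in raw]


# Registry of storage engine decoders keyed by version number (assumed names).
ENGINES: Dict[str, Callable[[List[str]], List[float]]] = {
    "v1": decode_v1,
    "v2": decode_v2,
}


def read_partition(dir_name: str, raw: List[str]) -> List[float]:
    """Pick the decoder named by the version field of the directory name."""
    version = dir_name.split("_", 3)[-1]
    try:
        engine = ENGINES[version]
    except KeyError:
        raise ValueError(f"no storage engine registered for version {version!r}")
    return engine(raw)
```

With such a registry, partitions written by an older engine version remain readable after the engine is upgraded, since each partition names the version that wrote it.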

It should be noted that the steps of the method provided in the foregoing embodiments may all be executed by the same device, or may be executed by different devices. For example, the execution subject of steps 601 and 602 may be device A; for another example, the execution subject of step 601 may be device A, and the execution subject of step 602 may be device B; and so on.

In addition, in some of the flows described in the above embodiments and the drawings, a plurality of operations are included in a specific order, but it should be clearly understood that the operations may be executed out of the order presented herein or in parallel, and the sequence numbers of the operations, such as 601, 602, etc., are merely used for distinguishing different operations, and the sequence numbers themselves do not represent any execution order. Additionally, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel.

Accordingly, embodiments of the present application further provide a computer-readable storage medium storing computer instructions, which, when executed by one or more processors, cause the one or more processors to perform the steps of the data storage method and/or the data query method.

Embodiments of the present application further provide a computer program product, including a computer program. The computer program, when executed by a processor, causes the processor or processors to perform the steps in the data storage method and/or the data query method described above. In the embodiments of the present application, the implementation form of the computer program product is not limited. In some embodiments, the computer program product may be implemented as, but is not limited to, database management software or the like.

Fig. 10 is a schematic structural diagram of a computing device according to an embodiment of the present application. As shown in fig. 10, the computing device includes: a memory 100a and a processor 100b; the memory 100a is used for storing a computer program, and time partitions are created in the memory 100a;

the processor 100b is coupled to the memory 100a for executing a computer program for: acquiring time sequence data to be stored; under the condition that a first time partition corresponding to a timestamp of time sequence data to be stored exists in the created time partition, acquiring data characteristics of the time sequence data stored in the first time partition; determining a target time window according to the data characteristics of the time sequence data stored in the first time partition; creating a second time partition with a target time window according to the time stamp of the time sequence data to be stored; and storing the time sequence data to be stored in the second time partition.

In some embodiments, the processor 100b is further configured to: determine a target partition identifier corresponding to the time series data to be stored according to the timestamp of the time series data to be stored and the currently recorded time window; search for the target partition identifier among the partition identifiers of the created time partitions; and if the target partition identifier is found among the partition identifiers of the created time partitions, determine that the first time partition exists.

Further, the processor 100b is further configured to: under the condition that the first time partition does not exist in the created time partition, creating a third time partition according to a currently recorded time window and a time stamp of time sequence data to be stored; and storing the time sequence data to be stored in a third time partition.
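The lookup and the fallback to a third time partition can be sketched as below. The rule `timestamp // window` for deriving the target partition identifier is an assumption (the passage only says the identifier is determined from the timestamp and the currently recorded time window), and `target_partition_id` and `route` are hypothetical names.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, float]  # (timestamp, value)


def target_partition_id(timestamp: int, window: int) -> int:
    """Assumed rule: partitions are aligned to multiples of the time window."""
    return timestamp // window


def route(created: Dict[int, List[Point]], point: Point, window: int) -> str:
    """Report whether a time partition for the point's timestamp already exists.

    When none of the created partitions matches, a new (third) time partition
    is created with the currently recorded window and the point is stored in it.
    """
    pid = target_partition_id(point[0], window)
    if pid in created:
        # The adaptive-window logic would take over from here.
        return "first_time_partition_exists"
    created[pid] = [point]
    return "third_time_partition_created"
```

For example, with a window of 3600 a point stamped 7300 maps to identifier 2; the first such point creates the partition, and later points in the same hour find it.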

In some embodiments, the processor 100b, when acquiring the data characteristic of the time series data stored in the first time partition, is specifically configured to: and acquiring the number of time lines and the number of data points contained in the time sequence data stored in the first time partition as the data characteristics of the time sequence data stored in the first time partition. Accordingly, the processor 100b, when determining the target time window, is specifically configured to: determining a target time window according to the number of time lines and the number of data points contained in the time sequence data stored in the first time partition; wherein, a series of data generated by the change of an index of the same object in the time sequence data along with time is a time line; one index value in each time line is one data point.
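The two data characteristics defined above can be made concrete with a small sketch. The record layout `(object, index name, timestamp, index value)` and the function name `data_characteristics` are assumptions made for illustration.

```python
from typing import Iterable, Tuple

# (object, index name, timestamp, index value); the tuple layout is assumed.
Record = Tuple[str, str, int, float]


def data_characteristics(records: Iterable[Record]) -> Tuple[int, int]:
    """Return (number of timelines, number of data points)."""
    timelines = set()
    data_points = 0
    for obj, index, _timestamp, _value in records:
        timelines.add((obj, index))  # one timeline per (object, index) pair
        data_points += 1             # every index value is one data point
    return len(timelines), data_points
```

For instance, two speed readings and one rotation-speed reading from one vehicle plus one speed reading from a second vehicle form three timelines and four data points.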

Optionally, the first time partition comprises at least one sub-partition; the time windows of at least one of the sub-partitions are identical. Accordingly, the processor 100b, when determining the target time window, is specifically configured to: and determining a target time window according to the number of time lines and the number of data points contained in the time sequence data stored by the latest first sub-partition created in at least one sub-partition.

Optionally, the processor 100b, in determining the target time window, performs at least one of the following determination operations:

judging whether the number of time lines contained in the time sequence data stored in the first sub-partition is less than or equal to a set upper limit of the number of time lines;

judging whether the number of data points contained in the time sequence data stored in the first sub-partition is less than or equal to a set lower limit of the number of data points;

if the judgment result of the at least one judgment operation is negative, under the condition that the number of the at least one sub-partition is larger than or equal to the set upper limit of the number of the sub-partitions, the time window of the first sub-partition is reduced to obtain the target time window. And if the judgment result of the at least one judgment operation is negative, determining the time window of the first sub-partition as the target time window under the condition that the number of the at least one sub-partition is smaller than the set upper limit of the number of the sub-partitions.

Accordingly, the processor 100b, when creating the second time partition having the target time window, is specifically configured to: determining the starting time of the second time partition according to the time stamp of the time sequence data to be stored; and creating a second sub-partition of the first time partition as a second time partition according to the starting time of the second time partition and the time window of the first sub-partition.

Optionally, the processor 100b is further configured to: determine the partition identification of the first time partition as the partition identification of the second sub-partition; determine the sub-partition identifier of the second sub-partition according to the number of sub-partitions corresponding to the first time partition; determine a second partition directory name that represents the partition identifier, the time window and the sub-partition identifier of the second sub-partition, the second partition directory name being the partition directory name of the second sub-partition; and store the second partition directory name.

In some embodiments, the processor 100b is further configured to: if the judgment result of the at least one judgment operation is yes, acquire a plurality of sub-partitions from the first time partition; determine the sum of the numbers of timelines contained in the time series data stored in the plurality of sub-partitions and the sum of the numbers of data points contained in the time series data stored in the plurality of sub-partitions; and if the sum of the numbers of timelines is less than or equal to the set upper limit of the number of timelines and/or the sum of the numbers of data points is less than or equal to the set lower limit of the number of data points, increase the time window of the first sub-partition and use the result as the target time window. If the sum of the numbers of timelines is greater than the set upper limit of the number of timelines and the sum of the numbers of data points is greater than the set lower limit of the number of data points, store the time series data to be stored in the first sub-partition.

Optionally, the processor 100b is further configured to: and if the time sequence data stored in the first sub-partition contains a time line quantity smaller than or equal to the set upper limit of the time line quantity, and the time sequence data stored in the first sub-partition contains a data point quantity larger than the set lower limit of the data point quantity, storing the time sequence data to be stored in the first sub-partition. Or, if the time sequence data stored in the first sub-partition contains a time line number greater than the set upper limit of the time line number, and the time sequence data stored in the first sub-partition contains a data point number less than or equal to the set lower limit of the data point number, storing the time sequence data to be stored in the first sub-partition.

In some embodiments, the processor 100b, when creating the second time partition having the target time window, is specifically configured to: determining the starting time of the second time partition according to the time stamp of the time sequence data to be stored; a second time partition having a target time window is created based on the start time of the second time partition.

Optionally, the processor 100b is further configured to: determine the partition identifier of the second time partition according to the timestamp of the time series data to be stored and the target time window; determine a first partition directory name that represents the partition identifier and the time window corresponding to the second time partition; and store the first partition directory name.

Optionally, when the processor 100b stores the time series data to be stored in the second time partition, it is specifically configured to: and utilizing a storage engine to store the time sequence data to be stored into a storage space corresponding to the second time partition in a persistent mode. Accordingly, the processor 100b is further configured to: the version number of the storage engine is written to the first partition directory name.

In the embodiment of the present application, the processor 100b is further configured to: acquiring a query request; acquiring target time sequence data meeting the query request; acquiring a target version number of a storage engine for storing target time sequence data from a partition directory name corresponding to the target time sequence data; and processing the target time sequence data by using a storage engine corresponding to the target version number to obtain a query result.

In the embodiment of the present application, a specific implementation form of the computing device is not limited. The computing device may be implemented as a single server device, as a cloud-based server array, or as a Virtual Machine (VM) running in a cloud-based server array. In addition, the computing device may also refer to other computing devices with corresponding service capabilities, such as a terminal device (running a service program) such as a computer.

In some optional implementations, as shown in fig. 10, the computing device may further include a communication component 100c, a power component 100d, and the like. When the computing device is a terminal device such as a computer, it may further include a display component 100e and an audio component 100f. Only some of the components are shown schematically in fig. 10, which means neither that the computing device must include all of the components shown in fig. 10 nor that the computing device can include only the components shown in fig. 10.

On the one hand, the computing device provided by this embodiment can store the data to be stored in the newly created second time partition instead of directly storing it in the first time partition corresponding to the timestamp of the time series data to be stored, which helps to reduce high timeline cardinality in the first time partition. On the other hand, the data characteristics of the first time partition corresponding to the timestamp of the time series data to be stored reflect, to a certain extent, the timeline cardinality of the time series data within the time range corresponding to the first time partition. The time window of the second time partition is therefore adaptively adjusted according to the data characteristics of the first time partition, so that it expands and contracts elastically with the data characteristics, which reduces high timeline cardinality in the second time partition. In this way, no time partition suffers from high timeline cardinality and each time partition has fewer indexes, so index queries on each time partition are more efficient when time series data is queried.

In embodiments of the present application, the memory is used to store computer programs and may be configured to store other various data to support operations on the device on which it is located. Wherein the processor may execute a computer program stored in the memory to implement the corresponding control logic. The memory may be implemented by any type or combination of volatile and non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.

In the embodiments of the present application, the processor may be any hardware processing device that can execute the above described method logic. Optionally, the processor may be a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or a Micro Controller Unit (MCU); it may also be a programmable device such as a Field-Programmable Gate Array (FPGA), a Programmable Array Logic (PAL) device, a Generic Array Logic (GAL) device, or a Complex Programmable Logic Device (CPLD); or it may be an Advanced RISC Machines (ARM) processor, a System on Chip (SoC), or the like, but is not limited thereto.

In embodiments of the present application, the communication component is configured to facilitate wired or wireless communication between the device in which it is located and other devices. The device in which the communication component is located can access a wireless network based on a communication standard, such as WiFi, 2G or 3G, 4G, 5G or a combination thereof. In an exemplary embodiment, the communication component receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component may also be implemented based on Near Field Communication (NFC) technology, Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, or other technologies.

In the embodiment of the present application, the display assembly may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the display assembly includes a touch panel, the display assembly may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.

In embodiments of the present application, a power supply component is configured to provide power to various components of the device in which it is located. The power components may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device in which the power component is located.

In embodiments of the present application, the audio component may be configured to output and/or input an audio signal. For example, the audio component includes a Microphone (MIC) configured to receive an external audio signal when the device in which the audio component is located is in an operational mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signal may further be stored in a memory or transmitted via a communication component. In some embodiments, the audio component further includes a speaker for outputting audio signals. For example, for devices with a voice interaction function, voice interaction with a user may be enabled through the audio component.

It should be noted that, the descriptions of "first", "second", etc. in this document are used for distinguishing different messages, devices, modules, etc., and do not represent a sequential order, nor limit the types of "first" and "second" to be different.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in the form of computer-readable media, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement or the like made within the spirit and principle of the present application shall be included in the scope of the claims of the present application.

Claims

1. A method of storing data, comprising:

acquiring time sequence data to be stored;

and storing the time sequence data to be stored in the second time partition.

2. The method of claim 1, wherein the obtaining the data characteristic of the time series data stored in the first time partition comprises:

acquiring the number of time lines and the number of data points contained in the time sequence data stored in the first time partition as the data characteristics of the time sequence data stored in the first time partition;

the determining a target time window according to the data characteristics of the time series data stored in the first time partition includes:

determining a target time window according to the number of time lines and the number of data points contained in the time sequence data stored in the first time partition;

wherein, a series of data generated by one index of the same object in the time sequence data along with the change of time is a time line; one index value in each time line is one data point.

3. The method of claim 2, wherein the first time partition comprises at least one sub-partition; the time windows of the at least one sub-partition are the same;

the determining a target time window according to the number of the time lines and the number of the data points contained in the time sequence data stored in the first time partition includes:

and determining a target time window according to the number of time lines and the number of data points contained in the time sequence data stored by the latest first sub-partition created in the at least one sub-partition.

4. The method of claim 3, wherein determining the target time window according to the number of time lines and the number of data points contained in the time sequence data stored by the latest first sub-partition created in the at least one sub-partition comprises performing at least one of the following determination operations:

if the determination result of the at least one determination operation is negative, under the condition that the number of the at least one sub-partition is greater than or equal to the set upper limit of the number of sub-partitions, reducing the time window of the first sub-partition to obtain the target time window.

5. The method of claim 4, further comprising: if the determination result of the at least one determination operation is negative, determining the time window of the first sub-partition as the target time window under the condition that the number of the at least one sub-partition is smaller than the set upper limit of the number of sub-partitions.

6. The method according to claim 4, wherein, if the determination result of the at least one determination operation is affirmative, determining the target time window according to the number of time lines and the number of data points contained in the time sequence data stored in the first time partition further comprises:

obtaining a plurality of sub-partitions from the first time partition;

determining the sum of the number of time lines contained in the time sequence data stored in the plurality of sub-partitions and the sum of the number of data points contained in the time sequence data stored in the plurality of sub-partitions;

and if the sum of the number of time lines is less than or equal to the set upper limit of the number of time lines and/or the sum of the number of data points is less than or equal to the set lower limit of the number of data points, increasing the time window of the first sub-partition to obtain the target time window.
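The branching described in claims 4 to 6 can be sketched as follows. This is only an illustration under stated assumptions: the outcome of the determination operations of claim 4 is passed in as the boolean `determination_is_yes` rather than modeled, the scaling step `factor` is hypothetical (the claims say only that the window is reduced or increased), and all function and parameter names are invented for the example.

```python
def choose_target_window(determination_is_yes, current_window,
                         sub_partition_count, max_sub_partitions,
                         time_line_sum, data_point_sum,
                         time_line_upper_limit, data_point_lower_limit,
                         factor=2):
    """Return the target time window in seconds, or None if none is chosen.

    `determination_is_yes` stands in for the result of the determination
    operations of claim 4. `factor` is an assumed scaling step.
    """
    if not determination_is_yes:
        if sub_partition_count >= max_sub_partitions:
            # Claim 4: the sub-partition limit is reached, so shrink the window.
            return max(1, current_window // factor)
        # Claim 5: another sub-partition still fits, so keep the window.
        return current_window
    # Claim 6: the sums are taken over several sub-partitions of the
    # first time partition. "and/or" is implemented as an inclusive or.
    if (time_line_sum <= time_line_upper_limit
            or data_point_sum <= data_point_lower_limit):
        return current_window * factor
    # Neither condition holds: this sketch chooses no new window.
    return None
```

With a one-hour window and `factor=2`, a negative determination at the sub-partition limit yields a 1800-second window, while an affirmative determination with few time lines yields a 7200-second window.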

7. The method according to claim 4 or 6, wherein creating a second time partition having the target time window according to the time stamp of the time series data to be stored comprises:

determining the starting time of the second time partition according to the timestamp of the time sequence data to be stored;

and creating a second time partition with the target time window according to the starting time of the second time partition.

8. The method of claim 7, further comprising:

determining a partition identifier of the second time partition according to the timestamp of the time sequence data to be stored and the target time window;

determining a first partition directory name used for representing a partition identifier and a time window corresponding to the second time partition, wherein the first partition directory name is the partition directory name of the second time partition;

storing the first partition directory name.
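One way the start time, partition identifier, and directory name of claims 7 and 8 could be derived is sketched below. The claims do not state the arithmetic or the naming format, so aligning the timestamp down to a multiple of the window, using the integer quotient as the identifier, and the `<id>_<window>` directory format are all assumptions of this example, as are the function names.

```python
def partition_start(timestamp, window):
    """Align a timestamp (seconds) down to the start of its window (assumed rule)."""
    return timestamp - timestamp % window


def partition_id(timestamp, window):
    """Derive a partition identifier from timestamp and window (assumed rule)."""
    return timestamp // window


def partition_dir_name(timestamp, window):
    """Encode the partition identifier and the time window in one name."""
    return f"{partition_id(timestamp, window)}_{window}"


start = partition_start(1643329000, 3600)
name = partition_dir_name(1643329000, 3600)
```

Because the window is part of the name, partitions created with different windows remain distinguishable after the window has been adjusted.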

9. The method of claim 5, wherein the creating a second time partition having the target time window according to the time stamp of the time series data to be stored comprises:

and creating a second sub-partition of the first time partition as the second time partition according to the starting time of the second time partition and the time window of the first sub-partition.

10. The method of claim 9, further comprising:

determining that the partition identification of the second sub-partition is the partition identification of the first time partition;

determining the sub-partition identification of the second sub-partition according to the sub-partition number corresponding to the first time partition;

determining a second partition directory name used for representing the partition identifier, the time window and the sub-partition identifier of the second sub-partition, wherein the second partition directory name is the partition directory name of the second sub-partition;

storing the second partition directory name.
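The sub-partition naming of claim 10 can be illustrated in the same spirit. The sketch assumes that sub-partitions are numbered from 0 in creation order, so the new sub-partition identifier equals the current sub-partition count, and that the directory name has the form `<partition id>_<window>_<sub-partition id>`; neither detail is specified in the claims.

```python
def sub_partition_dir_name(parent_partition_id, window, existing_sub_partitions):
    """Return (directory name, sub-partition id) for a new sub-partition.

    The new sub-partition inherits the partition identifier of its parent
    time partition. Numbering from 0 is an assumption of this sketch.
    """
    sub_id = existing_sub_partitions
    return f"{parent_partition_id}_{window}_{sub_id}", sub_id


dir_name, sub_id = sub_partition_dir_name(456480, 3600, 2)
```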

11. The method of claim 6, further comprising:

and if the sum of the number of time lines is greater than the set upper limit of the number of time lines, and the sum of the number of data points is greater than the set lower limit of the number of data points, storing the time sequence data to be stored in the first sub-partition.

12. The method of claim 1, further comprising:

determining a target partition identifier corresponding to the time sequence data to be stored according to the time stamp of the time sequence data to be stored and the currently recorded time window;

searching the target partition identification in the partition identification of the established time partition;

and if the target partition identification is found in the partition identification of the established time partition, determining that the first time partition exists.

13. The method of claim 12, further comprising:

under the condition that the first time partition does not exist in the created time partition, creating a third time partition according to a currently recorded time window and a timestamp of the time sequence data to be stored;

and storing the time sequence data to be stored in the third time partition.
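The lookup of claims 12 and 13 can be sketched as a dictionary search. The in-memory dictionary `created_partitions`, the identifier rule `timestamp // recorded_window`, and the function name are assumptions made for illustration; the claims only require that a target partition identifier be derived from the timestamp and the currently recorded time window and then searched among the created partitions.

```python
def find_or_create_partition(timestamp, recorded_window, created_partitions):
    """Return (partition id, existed) for the timestamp of incoming data.

    `created_partitions` maps a partition id to the list of stored points.
    """
    target_id = timestamp // recorded_window  # assumed identifier rule
    if target_id in created_partitions:
        # Claim 12: the first time partition exists.
        return target_id, True
    # Claim 13: no matching partition, so create one with the recorded window.
    created_partitions[target_id] = []
    return target_id, False


created = {456480: []}
pid, existed = find_or_create_partition(1643329000, 3600, created)
pid2, existed2 = find_or_create_partition(1643335000, 3600, created)
# Claim 13: the data is stored in the newly created partition.
created[pid2].append((1643335000, 48.0))
```

When `existed` is true, the data is not written directly; the method continues with the window determination of claim 1 using the characteristics of that partition.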

14. A computing device, comprising: a memory and a processor; wherein the memory is configured to store a computer program, and time partitions are created in the memory;

the processor is coupled to the memory and is configured to execute the computer program to perform the steps of the method of any one of claims 1-13.