CN110619000A - Time sequence data query method and device, storage medium and electronic equipment - Google Patents

Time sequence data query method and device, storage medium and electronic equipment

Info

Publication number
CN110619000A
Authority
CN
China
Prior art keywords
data
data source
query
time sequence
searching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910912865.8A
Other languages
Chinese (zh)
Inventor
范欣欣
闵涛
蒋鸿翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Netease Hangzhou Network Co Ltd
Original Assignee
Netease Hangzhou Network Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Netease Hangzhou Network Co Ltd filed Critical Netease Hangzhou Network Co Ltd
Priority to CN201910912865.8A priority Critical patent/CN110619000A/en
Publication of CN110619000A publication Critical patent/CN110619000A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24534Query rewriting; Transformation
    • G06F16/24535Query rewriting; Transformation of sub-queries or views
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474Sequence data queries, e.g. querying versioned data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention provides a time series data query method and device, a storage medium, and electronic equipment, and relates to the technical field of databases. The method comprises: acquiring a data query request, wherein the data query request comprises at least two data source query conditions that are in an AND relationship with each other; acquiring, based on index information of a time series database, the number of data sources corresponding to each data source query condition in the time series database, and determining the data source query condition with the fewest data sources as the master query condition and the remaining data source query conditions as slave query conditions; searching the time series database for data sources that meet the master query condition to obtain a data source set, and searching the data source set for target data sources that meet the slave query conditions; and searching for the corresponding time series data according to the target data sources. The invention can mitigate the result redundancy produced when time series data is queried through multiple data source query conditions, reduce the time required for the query, and improve efficiency.

Description

Time sequence data query method and device, storage medium and electronic equipment
Technical Field
The embodiment of the invention relates to the technical field of databases, in particular to a time series data query method, a time series data query device, a computer readable storage medium and an electronic device.
Background
This section is intended to provide a background or context to the embodiments of the invention that are recited in the claims and the description herein is not admitted to be prior art by inclusion in this section.
Time series data is data generated or recorded over time. Analyzing the temporal distribution of such data can reveal trends and regularities in how the data changes, so time series data is widely used in fields such as industrial management, the Internet of Things, big data analysis, and machine learning. Time series data is usually stored and managed in a time series database. Commonly used time series databases include InfluxDB, TimescaleDB, and KairosDB, and they generally support fast writing, persistence, and aggregate statistical queries of time series data.
In existing time series database systems, when a user inputs multiple data source query conditions in an AND relationship to query time series data, the system queries each data source query condition separately to obtain a corresponding result set, then intersects the result sets to obtain the results that satisfy all of the data source query conditions, and returns those results to the user.
Disclosure of Invention
However, in the above query method, querying each data source query condition separately produces a large number of redundant query results, which are only removed afterwards by taking the intersection. The query process therefore consumes too much time, and query efficiency is low.
Therefore, an improved method for querying time series data is needed that can improve the efficiency of querying time series data.
In this context, embodiments of the present invention are intended to provide a method for querying time series data, a device for querying time series data, a computer-readable storage medium, and an electronic device.
According to a first aspect of the embodiments of the present invention, there is provided a method for querying time series data, including: acquiring a data query request, wherein the data query request comprises at least two data source query conditions, and the at least two data source query conditions are in an AND relationship with each other; acquiring the number of data sources corresponding to each data source query condition in a time series database based on index information of the time series database, and determining that the data source query condition with the fewest data sources is a master query condition and the remaining data source query conditions are slave query conditions; searching the time series database for data sources meeting the master query condition to obtain a data source set, and searching the data source set for target data sources meeting the slave query conditions; and searching for the corresponding time series data according to the target data sources.
In an optional implementation manner, after obtaining the number of data sources corresponding to each data source query condition in the time series database, the method further includes: when the minimum of the data source numbers is determined to be greater than a preset threshold, searching the time series database for the data sources that meet each data source query condition respectively to obtain a plurality of data source sets; intersecting all of the found data source sets to obtain a data source subset; and searching for the corresponding time series data according to the data sources in the data source subset; and when the minimum of the data source numbers is determined to be smaller than the preset threshold, executing the step of determining that the data source query condition with the fewest data sources is the master query condition and the remaining data source query conditions are slave query conditions.
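The choice between the two strategies described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the in-memory `sources` table, the `build_index` helper, and the function name `find_targets` are all hypothetical, and treating a minimum count equal to the threshold as the master-slave case is an assumption, since the text only covers "greater than" and "smaller than".

```python
from typing import Dict, List, Set, Tuple

Condition = Tuple[str, str]          # (dimension, target dimension value)
Sources = Dict[int, Dict[str, str]]  # hypothetical: source ID -> its dimension values


def build_index(sources: Sources) -> Dict[Condition, Set[int]]:
    """Map each (dimension, value) pair to the IDs of the sources carrying it."""
    index: Dict[Condition, Set[int]] = {}
    for sid, tags in sources.items():
        for dim, val in tags.items():
            index.setdefault((dim, val), set()).add(sid)
    return index


def find_targets(sources: Sources, index: Dict[Condition, Set[int]],
                 conditions: List[Condition], threshold: int) -> Tuple[str, Set[int]]:
    """Return the strategy used and the IDs of the target data sources."""
    counts = {c: len(index.get(c, set())) for c in conditions}
    master = min(conditions, key=lambda c: counts[c])
    if counts[master] > threshold:
        # Even the smallest set is large: look up every condition and intersect.
        sets = [index.get(c, set()) for c in conditions]
        return "intersection", set.intersection(*sets)
    # Otherwise search by the master condition, then screen by the slave conditions.
    slaves = [c for c in conditions if c != master]
    candidates = index.get(master, set())
    targets = {sid for sid in candidates
               if all(sources[sid].get(dim) == val for dim, val in slaves)}
    return "master-slave", targets


# The four sensors of Table 1, with hypothetical numeric source IDs.
sources = {
    1: {"Manufacture": "A", "Sensorid": "0001"},
    2: {"Manufacture": "A", "Sensorid": "0002"},
    3: {"Manufacture": "B", "Sensorid": "0001"},
    4: {"Manufacture": "B", "Sensorid": "0002"},
}
index = build_index(sources)
conditions = [("Manufacture", "A"), ("Sensorid", "0002")]
```

Both strategies return the same target data sources; they differ only in how many candidate data sources have to be read before the targets are known.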
In an optional embodiment, the data source query condition includes a target dimension value, where different data source query conditions correspond to different dimensions.
In an alternative embodiment, the time series database has a preconfigured index file; the obtaining of the number of data sources corresponding to each data source query condition in the time sequence database based on the index information of the time sequence database includes: respectively searching a storage area where a target dimension value in each data source query condition is located in the index file, and acquiring the number of data sources corresponding to each target dimension value from the storage area; and counting the number of data sources corresponding to the target dimension value in each data source query condition to obtain the number of data sources corresponding to each data source query condition.
In an optional implementation manner, the searching of the time series database for data sources that meet the master query condition to obtain a data source set, and the searching of the data source set for target data sources that meet the slave query conditions, includes: searching the time series database for the data sources corresponding to the target dimension value in the master query condition to obtain the data source set; and screening the data sources in the data source set according to the target dimension values in the slave query conditions to obtain the target data sources.
In an optional embodiment, the data query request further includes a time period to be queried; the searching for the corresponding time sequence data according to the target data source comprises: and searching time sequence data which corresponds to the target data source and is in the time period to be inquired.
In an optional implementation manner, the data query request further includes a data index corresponding to the time series data to be queried; the searching for the time sequence data corresponding to the target data source and in the time period to be queried includes: and searching time sequence data which corresponds to the target data source and is in the time period to be inquired under the data index.
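As an illustration of the last two optional embodiments, the sketch below reads the points of the target data sources under one data index (metric) and inside the time period to be queried. The storage layout `store`, the function name `read_points`, and the inclusive time bounds are assumptions made for this example; the values are taken from Table 1.

```python
from typing import Dict, List, Tuple

Point = Tuple[str, int]                     # (timestamp "HH:MM:SS", value)
Store = Dict[Tuple[str, str], List[Point]]  # hypothetical: (series key, metric) -> points


def read_points(store: Store, targets: List[str], metric: str,
                start: str, end: str) -> Dict[str, List[Point]]:
    """Collect, per target data source, the points of `metric` with start <= t <= end."""
    return {
        key: [(t, v) for t, v in store.get((key, metric), []) if start <= t <= end]
        for key in targets
    }


store: Store = {
    ("Sensor+A+0002", "Temperature"): [("00:00:00", 1), ("00:00:01", 3), ("00:00:02", 2)],
    ("Sensor+B+0002", "Temperature"): [("00:00:00", 5), ("00:00:01", 7), ("00:00:02", 5)],
}

result = read_points(store, ["Sensor+A+0002"], "Temperature", "00:00:01", "00:00:02")
```

Zero-padded "HH:MM:SS" strings compare correctly as plain strings, which keeps the sketch short; a real store would more likely use numeric timestamps.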
According to a second aspect of the embodiments of the present invention, there is provided an apparatus for querying time series data, including: a request acquisition module, configured to acquire a data query request, wherein the data query request comprises at least two data source query conditions, and the at least two data source query conditions are in an AND relationship with each other; a master-slave determining module, configured to acquire the number of data sources corresponding to each data source query condition in the time series database based on the index information of the time series database, determine the data source query condition with the fewest data sources as the master query condition, and determine the remaining data source query conditions as slave query conditions; a data source searching module, configured to search the time series database for data sources that meet the master query condition to obtain a data source set, and search the data source set for target data sources that meet the slave query conditions; and a time series data searching module, configured to search for the corresponding time series data according to the target data sources.
In an optional implementation manner, the master-slave determining module is configured to determine that the data source query condition with the smallest number of data sources is a master query condition and the rest of the data source query conditions are slave query conditions when it is determined that the minimum value in the number of data sources is smaller than the preset threshold; the data source searching module is further configured to search data sources meeting the query conditions of the data sources in the time sequence database respectively to obtain a plurality of data source sets when the master-slave determining module determines that the minimum value of the number of the data sources is greater than a preset threshold value, and obtain an intersection of all the searched data source sets to obtain a data source subset; the time sequence data searching module is further configured to search corresponding time sequence data according to the data sources in the data source subset.
In an optional embodiment, the data source query condition includes a target dimension value, where different data source query conditions correspond to different dimensions.
In an alternative embodiment, the time series database has a preconfigured index file; the master-slave determination module comprises: a storage area searching unit, configured to search, in the index file, storage areas where target dimension values in the data source query conditions are located, and obtain, from the storage areas, the number of data sources corresponding to each target dimension value; and the data source counting unit is used for counting the number of data sources corresponding to the target dimension value in each data source query condition to obtain the number of data sources corresponding to each data source query condition.
In an optional implementation manner, the data source searching module includes: the main query unit is used for searching a data source corresponding to the target dimension value in the main query condition in the time sequence database to obtain the data source set; and the slave query unit is used for screening the data sources in the data source set according to the target dimension values in the slave query conditions to obtain the target data sources.
In an optional embodiment, the data query request further includes a time period to be queried; the time sequence data searching module is further configured to search the time sequence data corresponding to the target data source and in the time period to be queried.
In an optional implementation manner, the data query request further includes a data index corresponding to the time series data to be queried; the time sequence data searching module is further configured to search, under the data index, time sequence data corresponding to the target data source and in the time period to be queried.
According to a third aspect of embodiments of the present invention, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements any one of the methods described above.
According to a fourth aspect of the embodiments of the present invention, there is provided an electronic apparatus including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform any of the methods described above via execution of the executable instructions.
According to the time series data query method, the time series data query device, the computer readable storage medium and the electronic device, after the data query request is obtained, the number of data sources corresponding to each data source query condition is determined; the data source query condition with the fewest data sources serves as the master query condition and the remaining data source query conditions as slave query conditions; the data sources meeting the master query condition are searched first, the target data sources meeting the slave query conditions are then screened from them, and finally the corresponding time series data is looked up according to the target data sources. Because the master query condition corresponds to the fewest data sources, searching by the master query condition first narrows the range of data sources to be searched as much as possible, and the subsequent search by the slave query conditions is confined to a smaller data source set. This mitigates the result redundancy caused by querying each data source query condition separately, reduces the time needed to query time series data through multiple data source query conditions, and improves query efficiency.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present invention will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
FIG. 1 is a flow chart illustrating a method for querying time series data according to an embodiment of the invention;
FIG. 2 shows a schematic diagram of generating time series data;
FIG. 3 shows a schematic diagram of a TSI file;
FIG. 4 is a diagram showing a data structure of Tag Block;
FIG. 5 is a sub-flowchart of a method for querying time series data according to an embodiment of the present invention;
FIG. 6 is a sub-flow diagram illustrating another method of querying time series data according to an embodiment of the present invention;
FIG. 7 is a flow chart illustrating another method of querying time series data according to an embodiment of the present invention;
FIG. 8 is a block diagram showing a configuration of a query apparatus for time series data according to an embodiment of the present invention;
FIG. 9 shows a schematic diagram of a storage medium according to an embodiment of the invention; and
FIG. 10 shows a block diagram of the structure of an electronic device according to an embodiment of the present invention.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
The principles and spirit of the present invention will be described with reference to a number of exemplary embodiments. It is understood that these embodiments are given solely for the purpose of enabling those skilled in the art to better understand and to practice the invention, and are not intended to limit the scope of the invention in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
As will be appreciated by one skilled in the art, embodiments of the present invention may be embodied as a system, apparatus, device, method, or computer program product. Thus, the present invention may be embodied in the form of: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
According to an embodiment of the invention, a time series data query method, a time series data query device, a computer readable storage medium and an electronic device are provided.
In this document, any number of elements in the drawings is by way of example and not by way of limitation, and any nomenclature is used solely for differentiation and not by way of limitation.
The principles and spirit of the present invention are explained in detail below with reference to several representative embodiments of the invention.
Summary of The Invention
The inventor found that in existing time series database systems, when a user inputs multiple data source query conditions in an AND relationship to query time series data, the system queries each data source query condition separately to obtain a corresponding result set, and then intersects the result sets. This query mode produces a large number of redundant query results during the query, so the query process consumes too much time and query efficiency is low.
In view of the above, the basic idea of the present invention is: after a data query request is obtained, the number of data sources corresponding to each data source query condition is determined; the data source query condition with the fewest data sources is used as the master query condition and the rest as slave query conditions; the data sources meeting the master query condition are searched first, the target data sources meeting the slave query conditions are then screened from them, and finally the corresponding time series data is looked up according to the target data sources. Because the master query condition corresponds to the fewest data sources, searching by the master query condition first narrows the range of data sources to be searched as much as possible, and the subsequent search by the slave query conditions is confined to a smaller data source set. This mitigates the result redundancy caused by querying each data source query condition separately, reduces the time needed to query time series data through multiple data source query conditions, and improves query efficiency.
Having described the general principles of the invention, various non-limiting embodiments of the invention are described in detail below.
Application scene overview
It should be noted that the following application scenarios are merely illustrated to facilitate understanding of the spirit and principles of the present invention, and the embodiments of the present invention are not limited in this respect. Rather, embodiments of the present invention may be applied to any scenario where applicable.
The present invention can be applied to scenarios where time series data is queried by data source. For example, in an industrial management scenario, real-time monitoring data from equipment is stored in a time series database; when managers analyze the data in their daily work, they search for and extract data by entering data source query conditions (such as production line number, equipment name, sensor name, and the like), and the exemplary embodiment allows the data to be queried and read quickly. In a financial platform scenario, the platform can store the transaction records of user accounts as time series data; when a user enters data source query conditions (such as user account number, bank name, bank card type, transaction currency, and the like) to search historical transaction data, the exemplary embodiment allows the data to be found quickly, improving the response speed of the data service.
Exemplary method
An exemplary embodiment of the present invention first provides a method for querying time series data, as shown in fig. 1, the method may include the following steps S110 to S140:
Step S110, a data query request is obtained, where the data query request includes at least two data source query conditions, and the data source query conditions are in an AND relationship with each other.
The data source in the time series data is specifically described below.
FIG. 2 shows a schematic diagram of the generation of time series data. There are four sensors (Sensor) whose manufacturer (Manufacture) is A or B, and each sensor also has a number (Sensorid); the manufacturer and the number together uniquely identify the sensor from which a piece of data originates. Each sensor measures a temperature value, referred to as a value, every 1 second. Typical time series data contains three types of columns: a time column, dimension columns, and value columns. The corresponding Sensor time series data table is shown in Table 1:
TABLE 1
Time      Manufacture  Sensorid  Temperature
00:00:00  A            0001      12
00:00:00  A            0002      1
00:00:00  B            0001      10
00:00:00  B            0002      5
00:00:01  A            0001      11
00:00:01  A            0002      3
00:00:01  B            0001      8
00:00:01  B            0002      7
00:00:02  A            0001      12
00:00:02  A            0002      2
00:00:02  B            0001      7
00:00:02  B            0002      5
In Table 1, Time is the time column (Timestamp), Manufacture and Sensorid are dimension columns (Tag), and Temperature is a value column (Field). Each entry in the time column is the timestamp of one piece of time series data. The combination Sensor + Manufacture + Sensorid uniquely determines where the data comes from, that is, the data source of the time series data. Taking InfluxDB as an example, a data source is represented by a series key (sequence key), which is a combination of three character strings: Sensor, Manufacture and Sensorid. For example, there are four data sources in Table 1: Sensor+A+0001, Sensor+A+0002, Sensor+B+0001, and Sensor+B+0002. The data source and the timestamp together form the key of a piece of time series data, and the corresponding value column can be regarded as its value. For example, Temperature in Table 1 is an index (metric) of the data, and a specific temperature value under that index is the value of the time series data, which is also the time series data that a user usually needs to look up. Table 1 shows times only in hour:minute:second format, but in practical applications timestamps usually include dates as well.
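The key/value view of Table 1 described above can be reproduced with a short sketch. The `series_key` helper and the use of "+" as a separator simply follow the notation of the text; they are illustrative assumptions and not InfluxDB's actual on-disk encoding.

```python
# Rows of Table 1: (Time, Manufacture, Sensorid, Temperature)
rows = [
    ("00:00:00", "A", "0001", 12), ("00:00:00", "A", "0002", 1),
    ("00:00:00", "B", "0001", 10), ("00:00:00", "B", "0002", 5),
    ("00:00:01", "A", "0001", 11), ("00:00:01", "A", "0002", 3),
    ("00:00:01", "B", "0001", 8),  ("00:00:01", "B", "0002", 7),
    ("00:00:02", "A", "0001", 12), ("00:00:02", "A", "0002", 2),
    ("00:00:02", "B", "0001", 7),  ("00:00:02", "B", "0002", 5),
]


def series_key(table: str, manufacture: str, sensorid: str) -> str:
    """Join the table name and the dimension values into one data source key."""
    return "+".join((table, manufacture, sensorid))


# Key of each point = (data source, timestamp); value = the Temperature reading.
points = {(series_key("Sensor", m, s), t): temp for t, m, s, temp in rows}

# The distinct data sources that appear in the table.
data_sources = sorted({source for source, _ in points})
```

Twelve rows collapse into four data sources, each carrying three timestamped values.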
As can be seen from the above, querying time series data by data source is a very common query scenario. For example, in the time series data table shown in Table 1, a user who wants to look up the temperature data of the sensor whose Manufacture is A and whose Sensorid is 0002 may input two data source query conditions, "Manufacture = A" and "Sensorid = 0002". The two data source query conditions are in an AND relationship, that is, the query result needs to satisfy both at the same time. In this exemplary embodiment, when a user inputs two or more data source query conditions in a system of a time series database (e.g., a management system of the time series database, a data analysis client program, a corresponding web page, etc.), and the data source query conditions are in an AND relationship with each other, the system may generate a corresponding query request. For example, the query request may include the following query statement: SELECT * FROM Table WHERE Manufacture = 'A' AND Sensorid = '0002'.
Additionally, if the query request is generated on a client, the query request may be sent to a server of the timing database.
In an alternative embodiment, the data source query condition may include a target dimension value. For example, in Table 1, Type, Manufacture, and Sensorid are dimensions, and the data in each dimension are dimension values. When a user inputs the data source query conditions "Manufacture = A" and "Sensorid = 0002", A and 0002 are the target dimension values specified in the Manufacture and Sensorid dimensions. In the exemplary embodiment, the query condition for each dimension can serve as one data source query condition, that is, different data source query conditions correspond to different dimensions. The query request is then equivalent to combining the query conditions of the individual dimensions, which also matches the way users habitually input query conditions.
Step S120, based on the index information of the time series database, the number of data sources corresponding to each data source query condition in the time series database is acquired, and the data source query condition with the fewest data sources is determined as the master query condition and the remaining data source query conditions as slave query conditions.
The index information of the time sequence database is information data obtained by counting time sequence data in the time sequence database, and is used for searching indexes for the time sequence data, quickly determining storage addresses and the like. Some time series databases have preconfigured index files for storing index information.
Taking InfluxDB as an example, the index information is stored in a TSI (Time Series Index) file. As shown in FIG. 3, the TSI file mainly consists of an Index File Trailer, a Measurement Block, a Tag Block, and a Series Block, where the data structure and function of each part are as follows:
The Index File Trailer stores the offset and data length (size) of the Measurement Block and of each Tag Block in the TSI file; in short, it is the index to the Measurement Block and each Tag Block.
The Measurement Block stores information about the data tables in InfluxDB; for example, Table 1 is the Sensor time series data table, that is, one data table in InfluxDB.
The data structure of the Tag Block is shown in FIG. 4. Its core is a two-level mapping structure, Map<Tag Key, Map<Tag Value, List<SeriesID>>>: a Tag Key (dimension name) maps to its Tag Values, and each Tag Value maps to a list of SeriesIDs (sequence numbers). The Block Trailer stores index information for the other parts, and the Hash Index is used to quickly index the Tag Key and the Tag Value.
The Series Block stores the Series Key information; the corresponding Series Key can be looked up in the Series Block by SeriesID.
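The two-level mapping of the Tag Block and the SeriesID lookup in the Series Block can be mimicked with nested dictionaries. This is only an in-memory analogy under assumed data: the SeriesID numbers 1 to 4 are invented for the four data sources of Table 1, and the real TSI file is a binary format with hash indexes rather than Python dictionaries.

```python
from typing import Dict, List

# Tag Block analogy: Tag Key -> Tag Value -> list of SeriesIDs (IDs are hypothetical).
tag_block: Dict[str, Dict[str, List[int]]] = {
    "Manufacture": {"A": [1, 2], "B": [3, 4]},
    "Sensorid": {"0001": [1, 3], "0002": [2, 4]},
}

# Series Block analogy: SeriesID -> Series Key.
series_block: Dict[int, str] = {
    1: "Sensor+A+0001",
    2: "Sensor+A+0002",
    3: "Sensor+B+0001",
    4: "Sensor+B+0002",
}


def lookup_series_ids(tag_key: str, tag_value: str) -> List[int]:
    """Follow Tag Key, then Tag Value, to the SeriesIDs stored under that value."""
    return tag_block.get(tag_key, {}).get(tag_value, [])


def lookup_series_keys(series_ids: List[int]) -> List[str]:
    """Resolve SeriesIDs to Series Keys through the Series Block."""
    return [series_block[sid] for sid in series_ids]


keys_for_0002 = lookup_series_keys(lookup_series_ids("Sensorid", "0002"))
```

The length of the list under a Tag Value plays the role of the stored count of SeriesIDs, so the count can be known without resolving any Series Key.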
Suppose that the time series data for "Manufacture = A" and "Sensorid = 0002" needs to be queried; Manufacture and Sensorid are the Tag Keys, and A and 0002 are the corresponding Tag Values. First, a search is performed in the Measurement Block by the table name "Sensor", which directly locates the region of the TSI file holding the dimensions of the Sensor table. The Hash Index entry for Manufacture locates the storage area of its Tag Values, and the Hash Index entry for A then locates the specific Tag Value position, from which all SeriesIDs corresponding to "Manufacture = A" can be read, together with series.data.Length (the length of the sequence data), series.n (the number of SeriesIDs), Value Length (the length of the dimension value), and Flag (flag bit). The information corresponding to "Sensorid = 0002" is read in the same way.
All SeriesIDs satisfying "Manufacture = A" can be found through Tag Key = Manufacture and Tag Value = A, and are recorded as SeriesIDSet1; all SeriesIDs satisfying "Sensorid = 0002" can be found through Tag Key = Sensorid and Tag Value = 0002, and are recorded as SeriesIDSet2. In the related art, according to the AND relationship between "Manufacture = A" and "Sensorid = 0002", SeriesIDSet1 and SeriesIDSet2 are intersected to obtain the SeriesIDs that satisfy both query conditions, and the corresponding Series Keys are then looked up in the Series Block to find the required data sources. SeriesIDSet1 and SeriesIDSet2 contain a large amount of redundant query results. For example, if manufacturer A has produced 10 million sensors, the query condition "Manufacture = A" retrieves a SeriesIDSet1 with 10 million entries, while the filter condition "Sensorid = 0002" retrieves a SeriesIDSet2 with 2 entries. Intersecting the two sets leaves only 1 entry, so almost all of the 10 million entries in SeriesIDSet1 are redundant, and the whole query process takes too much time.
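The redundancy of the intersect-everything approach can be made concrete with a scaled-down sketch. The ID values are invented, and 10,000 entries stand in for the 10 million of the example so that the snippet runs instantly; the proportions, not the absolute numbers, are the point.

```python
# Scaled-down, hypothetical ID sets (10,000 IDs instead of 10 million).
series_id_set1 = set(range(1, 10_001))  # all SeriesIDs with Manufacture = A
series_id_set2 = {2, 10_002}            # all SeriesIDs with Sensorid = 0002

# Related-art approach: read both sets in full, then intersect them.
matching = series_id_set1 & series_id_set2

ids_read = len(series_id_set1) + len(series_id_set2)  # every ID that had to be read
redundant = ids_read - len(matching)                   # IDs read but then discarded
```

Here 10,002 IDs are read to produce a single match, which is the waste that starting from the smaller set is meant to avoid.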
In the exemplary embodiment, the number of data sources corresponding to each data source query condition in the time series database can be obtained. Compared with directly obtaining the data sources themselves, the number of data sources is a single value per condition, so the amount of data to be read is greatly reduced and the implementation is simpler.
In an alternative embodiment, referring to fig. 5, step S120 may specifically include the following steps S510 and S520:
Step S510, searching the index file for the storage area where the target dimension value in each data source query condition is located, and acquiring the number of data sources corresponding to each target dimension value from the storage area;
Step S520, counting the number of data sources corresponding to the target dimension values in each data source query condition to obtain the number of data sources corresponding to each data source query condition.
Taking InfluxDB as an example, as shown in fig. 3 and 4, once the position of the Tag Value has been located in the data structure of the TSI file, the series.n value stored there, that is, the corresponding number of data sources, is read. If one data source query condition includes a plurality of target dimension values in one dimension, the sum of the numbers of data sources corresponding to those target dimension values is taken as the number of data sources corresponding to that data source query condition.
For example, through Tag Key = Manufacture and Tag Value = A, it can be found that the series.n (i.e., the number of data sources) corresponding to the data source query condition "Manufacture = A" is 10 million, and through Tag Key = SensorId and Tag Value = 0002, it can be found that the series.n corresponding to the data source query condition "SensorId = 0002" is 2.
After the number of data sources corresponding to each data source query condition is obtained, the data source query condition with the smallest number of data sources is determined as the master query condition, here "SensorId = 0002", and the remaining data source query conditions are slave query conditions, here "Manufacture = A".
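Steps S510 and S520 and the choice of the master query condition can be sketched as follows. This is an illustration under stated assumptions: `series_n` is a hypothetical in-memory summary of the series.n values held in the index file, the function names are invented, and the counts for Tag Values other than A and 0002 are made up.

```python
# Hypothetical summary of the index file: Tag Key -> {Tag Value: series.n}.
series_n = {
    "Manufacture": {"A": 10_000_000, "B": 3_000_000},
    "SensorId": {"0001": 2, "0002": 2},
}


def count_sources(index, tag_key, target_values):
    """S510/S520: sum series.n over the target dimension values of one condition."""
    return sum(index.get(tag_key, {}).get(value, 0) for value in target_values)


def split_master_slave(index, conditions):
    """Return (master, slaves, counts); the master has the smallest count."""
    counts = {
        key: count_sources(index, key, values) for key, values in conditions.items()
    }
    master = min(counts, key=counts.get)
    slaves = [key for key in conditions if key != master]
    return master, slaves, counts


# Each condition maps a dimension (Tag Key) to its target dimension values.
conditions = {"Manufacture": ["A"], "SensorId": ["0002"]}
master, slaves, counts = split_master_slave(series_n, conditions)
print(master, slaves, counts)
```

Only one integer per Tag Value is read here, which is the point of comparing counts rather than full series ID sets.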
For other time series databases, the process of querying the number of data sources and determining the master query condition and the slave query condition is the same, and thus the description is omitted.
Step S130, searching the time series database for data sources meeting the master query condition to obtain a data source set, and searching the data source set for a target data source meeting the slave query conditions.
In step S120, one data source query condition has been determined to be the master query condition. A query is performed according to the master query condition to obtain the set of data sources meeting it; because the master query condition is the one with the smallest number of data sources, this set is small. The data source set is then further screened according to the slave query conditions. If there are a plurality of slave query conditions, the screening may be performed for them in any order, and because the data source set is small, the further screening can be completed in a short time.
In an alternative embodiment, referring to fig. 6, step S130 may specifically include the following steps S610 and S620:
Step S610, searching the time series database for the data sources corresponding to the target dimension value in the master query condition to obtain a data source set;
Step S620, screening the data sources in the data source set according to the target dimension values in the slave query conditions to obtain the target data source.
For example, "SensorId = 0002" is the master query condition and "Manufacture = A" is the slave query condition. First, the corresponding series ID set is queried according to the master query condition "SensorId = 0002"; in the example of Table 1 this set contains two series IDs. The corresponding Series Key set is then retrieved from the TSI file and contains two Series Keys, Sensor+A+0002 and Sensor+B+0002. Further screening according to the slave query condition "Manufacture = A" shows that only Sensor+A+0002 meets the condition, so Sensor+A+0002 is the target data source. Of course, the number of target data sources is not limited in the exemplary embodiment.
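Steps S610 and S620 can be sketched with the Sensor example. The structures `series_block` and `tag_index` and the function `find_target_sources` are hypothetical in-memory stand-ins for the index file, and the numeric series IDs are invented for illustration.

```python
# Hypothetical Series Block: series ID -> (table name, tags).
series_block = {
    1: ("Sensor", {"Manufacture": "A", "SensorId": "0001"}),
    2: ("Sensor", {"Manufacture": "A", "SensorId": "0002"}),
    3: ("Sensor", {"Manufacture": "B", "SensorId": "0002"}),
}
# Hypothetical inverted index: (Tag Key, Tag Value) -> set of series IDs.
tag_index = {
    ("Manufacture", "A"): {1, 2},
    ("Manufacture", "B"): {3},
    ("SensorId", "0001"): {1},
    ("SensorId", "0002"): {2, 3},
}


def find_target_sources(master, slaves):
    # S610: read only the series IDs of the master query condition.
    candidate_ids = tag_index.get(master, set())
    # S620: screen each candidate's tags against every slave query condition.
    targets = []
    for sid in sorted(candidate_ids):
        table, tags = series_block[sid]
        if all(tags.get(key) == value for key, value in slaves):
            targets.append(f"{table}+{tags['Manufacture']}+{tags['SensorId']}")
    return targets


targets = find_target_sources(("SensorId", "0002"), [("Manufacture", "A")])
print(targets)  # ['Sensor+A+0002']
```

The series ID set of "Manufacture = A" is never read in this mode; the slave query condition is only checked against the two candidates returned by the master query condition.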
Step S140, searching for corresponding time series data according to the target data source.
After the target data source is determined, it may be used as an index to search the time series database for the corresponding time series data. For example, in the example of Table 1, if the target data source is determined to be Sensor+A+0002, 3 pieces of corresponding time series data can be found: Temperature = 1, Temperature = 3, and Temperature = 2.
Further, if the data query request also includes a time period to be queried, the time series data that corresponds to the target data source and falls within the time period to be queried may be searched. For example, the query statement in the query request may be: SELECT * FROM Table WHERE Manufacture = 'A' AND SensorId = '0002' AND Time >= '00:00:01' AND Time <= '00:00:02'. Besides the data source query conditions, it also includes the time period [00:00:01, 00:00:02] to be queried, so that the time series data within this time period can be further screened out of Table 1 as: Temperature = 3 and Temperature = 2.
Furthermore, if the data query request also includes a data index corresponding to the time series data to be queried, the time series data that corresponds to the target data source and falls within the time period to be queried may be searched under that data index. Table 1 shows only one data index, Temperature; in general, the time series data includes more than one data index. For example, besides Temperature, the sensor may measure Current, Pressure, Humidity, etc., all of which are data indexes. When querying data, a user may specify a data index in order to query the time series data under that index. For example, the query statement in the query request may be: SELECT Temperature FROM Table WHERE Manufacture = 'A' AND SensorId = '0002' AND time >= '00:00:01' AND time <= '00:00:02'. This limits the time series data to be queried to the data under the Temperature index, and the system can screen the query result according to this condition.
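The screening by time period and data index can be sketched as below. The list `points` and the function `select_points` are hypothetical; the three Temperature values follow the Table 1 example, and the timestamp of the first point (00:00:00) is an assumption chosen only so that it falls outside the queried period.

```python
from datetime import time

# Hypothetical points of the target data source Sensor+A+0002:
# (timestamp, {data index: value}).
points = [
    (time(0, 0, 0), {"Temperature": 1}),
    (time(0, 0, 1), {"Temperature": 3}),
    (time(0, 0, 2), {"Temperature": 2}),
]


def select_points(points, start, end, metric=None):
    """Keep points with start <= timestamp <= end; optionally keep one data index."""
    selected = []
    for ts, fields in points:
        if not (start <= ts <= end):
            continue
        if metric is None:
            selected.append((ts, dict(fields)))
        elif metric in fields:
            selected.append((ts, {metric: fields[metric]}))
    return selected


rows = select_points(points, time(0, 0, 1), time(0, 0, 2), metric="Temperature")
values = [fields["Temperature"] for _, fields in rows]
print(values)  # [3, 2]
```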
It should be understood that, in practical applications, other types of query conditions may also be set to facilitate a user to perform more detailed data search, and the system may filter or merge results according to a logical relationship between the query conditions when executing a query, which is not limited in the present invention.
The present invention also provides another embodiment of the query method, which is shown in fig. 7 and may include the following steps S710 to S770:
Step S710, obtaining a data query request, where the data query request includes at least two data source query conditions that are in an AND relationship with each other; this step is the same as step S110.
Step S720, obtaining the number of data sources corresponding to each data source query condition in the time series database based on the index information of the time series database; this step is the same as the corresponding part of step S120.
Step S730, when the minimum value of the data source number is greater than the preset threshold, searching the data sources meeting the query conditions of the data sources in the time sequence database, respectively, to obtain a plurality of data source sets. The preset threshold is a parameter set according to experience, actual requirements, and the like, and for example, a variable minseries id card index related to the preset threshold may be set to measure whether the number of data sources corresponding to the data source query condition is large. When the minimum value in the number of the data sources is greater than the preset threshold value, it is indicated that each data source query condition corresponds to more data sources, and thus, the obtained data source set is larger by searching through any one of the query conditions, so that a mode of respectively searching the data source sets corresponding to each data source query condition and then obtaining the target data source by taking the intersection can be adopted.
Step S740, taking the intersection of all the data source sets found to obtain a data source subset, and using the data sources in the data source subset as target data sources. As the follow-up to step S730, each data source query condition is treated as an equivalent condition (i.e., master and slave query conditions are not distinguished): after the data source sets are found separately, they are intersected to obtain the data source subset, whose data sources are the target data sources corresponding to the time series data to be found. Steps S730 and S740 are referred to as the equivalent condition query mode.
Step S750, when it is determined that the minimum value of the numbers of data sources is smaller than the preset threshold, determining that the data source query condition with the smallest number of data sources is the master query condition and the remaining data source query conditions are the slave query conditions; this is the same as the determination of the master and slave query conditions in step S120.
Step S760, searching the time series database for data sources meeting the master query condition to obtain a data source set, and searching the data source set for a target data source meeting the slave query conditions; this step is the same as step S130. That is, steps S750 and S760 correspond to the non-equivalent condition query mode (i.e., the mode that distinguishes master and slave query conditions) of steps S120 and S130.
Step S770, finding the corresponding time series data according to the target data source, which is the same as step S140, and the final way of finding the time series data is the same no matter whether the target data source is obtained by an equivalent condition query way or a non-equivalent condition query way.
With the approach of fig. 7, the query process for time series data is divided into two modes, the equivalent condition query mode and the non-equivalent condition query mode, and which mode is adopted can be selected according to the number of data sources, which improves the flexibility of data query.
It should be added that, for the case that the minimum value of the number of data sources is equal to the preset threshold, it may be regarded as the special case of step S730, and the processing of steps S730 and S740 is executed, or it may be regarded as the special case of step S750, and the processing of steps S750 and S760 is executed, which is not limited by the present invention.
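The choice between the two query modes of fig. 7 can be sketched as a small decision function. The function name, the mode labels, the flag `tie_to_equivalent` and the example counts and threshold are all hypothetical; the flag only reflects that the case of equality may be assigned to either branch.

```python
def choose_query_mode(counts, threshold, tie_to_equivalent=True):
    """Return 'equivalent' (S730/S740) or 'master-slave' (S750/S760).

    counts: number of data sources per data source query condition.
    threshold: the preset threshold, set from experience or actual requirements.
    """
    smallest = min(counts.values())
    if smallest > threshold:
        return "equivalent"
    if smallest < threshold:
        return "master-slave"
    # Equality may be treated as a special case of either branch.
    return "equivalent" if tie_to_equivalent else "master-slave"


print(choose_query_mode({"Manufacture": 10_000_000, "SensorId": 2}, threshold=1000))
print(choose_query_mode({"cond1": 50_000, "cond2": 40_000}, threshold=1000))
```

With the Sensor example the smallest count is 2, so the master-slave mode is chosen; when every condition matches many data sources, the sketch falls back to intersecting the sets.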
The technical effect of the present exemplary embodiment is explained below by a specific example. In InfluxDB, data is queried using the native query method of the related art, and the results are as follows:
>select count(rt)from m0_xxx where env='online'and user='AAA'and time>'2019-02-02T10:00:00Z'and time<'2019-02-02T10:20:00Z'
name:m0_xxx
time count
---- -----
1549101600000000001 40
name:Query Execution
rows time
---- ----
1 39.57ms
The method in the present exemplary embodiment is then used to query the same data, and the results are as follows:
>select count(rt)from m0_xxx where env='online'and user='AAA'and time>'2019-02-02T10:00:00Z'and time<'2019-02-02T10:20:00Z'
name:m0_xxx
time count
---- -----
1549101600000000001 40
name:Query Execution
rows time
---- ----
1 1.58389ms
It can be seen that, after the data query flow is optimized by the exemplary embodiment, the time for querying the same data is reduced from 39.57 ms to about 1.58 ms, roughly a 25-fold reduction, and the query efficiency is greatly improved.
Exemplary devices
Having described the method of the exemplary embodiment of the present invention, the apparatus of the exemplary embodiment of the present invention will next be described with reference to fig. 8.
As shown in fig. 8, the apparatus 800 for querying time series data may include: a request obtaining module 810, configured to obtain a data query request, where the data query request includes at least two data source query conditions that are in an AND relationship with each other; a master-slave determining module 820, configured to obtain, based on the index information of the time series database, the number of data sources corresponding to each data source query condition in the time series database, and to determine that the data source query condition with the smallest number of data sources is the master query condition and the remaining data source query conditions are slave query conditions; a data source searching module 830, configured to search the time series database for data sources meeting the master query condition to obtain a data source set, and to search the data source set for a target data source meeting the slave query conditions; and a time series data searching module 840, configured to search for corresponding time series data according to the target data source.
In an optional implementation manner, the master-slave determining module 820 may be configured to determine, when the minimum value of the numbers of data sources is smaller than a preset threshold, that the data source query condition with the smallest number of data sources is the master query condition and the remaining data source query conditions are the slave query conditions; the data source searching module 830 may be further configured to, when the master-slave determining module 820 determines that the minimum value of the numbers of data sources is greater than the preset threshold, search the time series database for the data sources meeting each data source query condition respectively to obtain a plurality of data source sets, and take the intersection of all the data source sets found to obtain a data source subset; and the time series data searching module 840 may be further configured to search for corresponding time series data according to the data sources in the data source subset.
In an alternative embodiment, the data source query condition may include a target dimension value, where different data source query conditions correspond to different dimensions.
In an alternative embodiment, the time series database has a preconfigured index file; the master-slave determination module 820 may include: the storage area searching unit is used for respectively searching the storage areas where the target dimension values in the data source query conditions are located in the index file and acquiring the number of the data sources corresponding to the target dimension values from the storage areas; and the data source counting unit is used for counting the number of the data sources corresponding to the target dimension values in the data source query conditions to obtain the number of the data sources corresponding to the data source query conditions.
In an alternative embodiment, the data source searching module 830 may include: a master query unit, configured to search the time series database for the data sources corresponding to the target dimension value in the master query condition to obtain a data source set; and a slave query unit, configured to screen the data sources in the data source set according to the target dimension values in the slave query conditions to obtain the target data source.
In an optional embodiment, the data query request may further include a time period to be queried; the time series data searching module 840 may also be configured to search time series data corresponding to the target data source and located in the time period to be queried.
In an optional implementation manner, the data query request may further include a data index corresponding to the time series data to be queried; the time series data searching module 840 may also be configured to search, under the data index, time series data corresponding to the target data source and located in the time period to be queried.
In addition, other specific details of the embodiments of the present invention have been described in detail in the embodiments of the present invention of the above method, and are not described herein again.
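The division of the apparatus 800 into modules 810 to 840 can be illustrated as a single class with one method per module. This is only a sketch under assumptions: the class name, method names and in-memory data structures are hypothetical, and the length of an in-memory set stands in for the series.n value that the index file would supply.

```python
class TimeSeriesQueryApparatus:
    """Hypothetical in-memory counterpart of apparatus 800."""

    def __init__(self, tag_index, series_tags, points):
        self.tag_index = tag_index      # (Tag Key, Tag Value) -> set of series IDs
        self.series_tags = series_tags  # series ID -> {Tag Key: Tag Value}
        self.points = points            # series ID -> list of (timestamp, value)

    def acquire_request(self, conditions):          # module 810
        if len(conditions) < 2:
            raise ValueError("at least two data source query conditions are required")
        return list(conditions)

    def determine_master_slave(self, conditions):   # module 820
        # len() of the set stands in for reading series.n from the index file.
        ordered = sorted(conditions, key=lambda c: len(self.tag_index.get(c, ())))
        return ordered[0], ordered[1:]

    def find_data_sources(self, master, slaves):    # module 830
        candidates = self.tag_index.get(master, set())
        return {
            sid for sid in candidates
            if all(self.series_tags[sid].get(k) == v for k, v in slaves)
        }

    def find_time_series(self, source_ids):         # module 840
        return {sid: self.points.get(sid, []) for sid in sorted(source_ids)}

    def query(self, conditions):
        conds = self.acquire_request(conditions)
        master, slaves = self.determine_master_slave(conds)
        return self.find_time_series(self.find_data_sources(master, slaves))


apparatus = TimeSeriesQueryApparatus(
    tag_index={("Manufacture", "A"): {1, 2, 4}, ("SensorId", "0002"): {2, 3}},
    series_tags={
        1: {"Manufacture": "A", "SensorId": "0001"},
        2: {"Manufacture": "A", "SensorId": "0002"},
        3: {"Manufacture": "B", "SensorId": "0002"},
        4: {"Manufacture": "A", "SensorId": "0003"},
    },
    points={2: [("00:00:00", 1), ("00:00:01", 3), ("00:00:02", 2)]},
)
found = apparatus.query([("Manufacture", "A"), ("SensorId", "0002")])
print(found)
```

As the description notes, this division into modules is exemplary; the same functions could equally be merged into fewer units or split into more.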
Exemplary storage Medium
A storage medium of an exemplary embodiment of the present invention is explained with reference to fig. 9.
As shown in fig. 9, a program product 900 for implementing the above method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
Exemplary electronic device
An electronic device of an exemplary embodiment of the present invention is explained with reference to fig. 10.
The electronic device 1000 shown in fig. 10 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 10, the electronic device 1000 is embodied in the form of a general purpose computing device. The components of the electronic device 1000 may include, but are not limited to: at least one processing unit 1010, at least one storage unit 1020, a bus 1030 that couples various system components including the storage unit 1020 and the processing unit 1010, and a display unit 1040.
The storage unit 1020 stores program code that may be executed by the processing unit 1010 to cause the processing unit 1010 to perform the steps according to various exemplary embodiments of the present invention described in the "exemplary methods" section above in this specification. For example, the processing unit 1010 may perform the method steps shown in fig. 1, 5, 6, or 7, and so on.
The storage unit 1020 may include volatile memory units such as a random access memory unit (RAM) 1021 and/or a cache memory unit 1022, and may further include a read only memory unit (ROM) 1023.
Storage unit 1020 may also include a program/utility 1024 having a set (at least one) of program modules 1025, such program modules 1025 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 1030 may include a data bus, an address bus, and a control bus.
The electronic device 1000 may also communicate with one or more external devices 1100 (e.g., keyboard, pointing device, bluetooth device, etc.), and such communication may take place through an input/output (I/O) interface 1050. The electronic device 1000 also includes a display unit 1040 connected to the input/output (I/O) interface 1050 for displaying. Also, the electronic device 1000 may communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN) and/or a public network, such as the Internet) via the network adapter 1060. As shown, the network adapter 1060 communicates with the other modules of the electronic device 1000 over the bus 1030. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 1000, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
It should be noted that although several modules or sub-modules of the apparatus are mentioned in the above detailed description, such division is merely exemplary and not mandatory. Indeed, the features and functions of two or more of the units/modules described above may be embodied in one unit/module according to embodiments of the invention. Conversely, the features and functions of one unit/module described above may be further divided so as to be embodied by a plurality of units/modules.
Moreover, while the operations of the method of the invention are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
While the spirit and principles of the invention have been described with reference to several particular embodiments, it is to be understood that the invention is not limited to the disclosed embodiments, and the division into aspects is for convenience of description only and does not mean that the features in these aspects cannot be combined to advantage. The invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (10)

1. A method for querying time series data is characterized by comprising the following steps:
acquiring a data query request, wherein the data query request comprises at least two data source query conditions, and the at least two data source query conditions are in an AND relationship with each other;
acquiring the number of data sources corresponding to each data source query condition in a time sequence database based on index information of the time sequence database, and determining that the data source query condition with the smallest number of data sources is a master query condition and the remaining data source query conditions are slave query conditions;
searching a data source meeting the master query condition in the time sequence database to obtain a data source set, and searching a target data source meeting the slave query conditions in the data source set;
and searching corresponding time sequence data according to the target data source.
2. The method of claim 1, wherein after obtaining the number of data sources corresponding to each data source query condition in the time-series database, the method further comprises:
when it is judged that the minimum value of the numbers of data sources is larger than a preset threshold, searching the time sequence database for the data sources meeting each data source query condition respectively to obtain a plurality of data source sets;
taking the intersection of all the data source sets found to obtain a data source subset;
searching corresponding time sequence data according to the data sources in the data source subset;
and when it is judged that the minimum value of the numbers of data sources is smaller than the preset threshold, executing the step of determining that the data source query condition with the smallest number of data sources is a master query condition and the remaining data source query conditions are slave query conditions.
3. The method of claim 1, wherein each data source query condition comprises a target dimension value, and different data source query conditions correspond to different dimensions.
4. The method of claim 3, wherein the time series database has a preconfigured index file;
the obtaining of the number of data sources corresponding to each data source query condition in the time sequence database based on the index information of the time sequence database includes:
respectively searching a storage area where a target dimension value in each data source query condition is located in the index file, and acquiring the number of data sources corresponding to each target dimension value from the storage area;
and counting the number of data sources corresponding to the target dimension value in each data source query condition to obtain the number of data sources corresponding to each data source query condition.
5. The method according to claim 4, wherein the searching the time sequence database for the data source meeting the master query condition to obtain a data source set, and the searching the data source set for the target data source meeting the slave query conditions comprises:
searching a data source corresponding to the target dimension value in the master query condition in the time sequence database to obtain the data source set;
and screening the data sources in the data source set according to the target dimension values in the slave query conditions to obtain the target data source.
6. The method of claim 1, wherein the data query request further comprises a time period to be queried;
the searching for the corresponding time sequence data according to the target data source comprises:
and searching time sequence data which corresponds to the target data source and is in the time period to be inquired.
7. The method according to claim 6, wherein the data query request further includes a data index corresponding to the time series data to be queried;
the searching for the time sequence data corresponding to the target data source and in the time period to be queried includes:
and searching time sequence data which corresponds to the target data source and is in the time period to be inquired under the data index.
8. An apparatus for querying time series data, comprising:
the request acquisition module is used for acquiring a data query request, wherein the data query request comprises at least two data source query conditions, and the at least two data source query conditions are in an AND relationship with each other;
the master-slave determining module is used for acquiring the number of data sources corresponding to each data source query condition in the time sequence database based on the index information of the time sequence database, determining the data source query condition with the smallest number of data sources as a master query condition, and determining the remaining data source query conditions as slave query conditions;
the data source searching module is used for searching a data source meeting the master query condition in the time sequence database to obtain a data source set, and searching a target data source meeting the slave query conditions in the data source set;
and the time sequence data searching module is used for searching corresponding time sequence data according to the target data source.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of any one of claims 1 to 7.
10. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the method of any of claims 1 to 7 via execution of the executable instructions.
CN201910912865.8A 2019-09-25 2019-09-25 Time sequence data query method and device, storage medium and electronic equipment Pending CN110619000A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910912865.8A CN110619000A (en) 2019-09-25 2019-09-25 Time sequence data query method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN110619000A true CN110619000A (en) 2019-12-27

Family

ID=68924643

Country Status (1)

Country Link
CN (1) CN110619000A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112541009A (en) * 2020-12-23 2021-03-23 平安普惠企业管理有限公司 Data query method and device, electronic equipment and storage medium
CN112800061A (en) * 2021-01-29 2021-05-14 北京锐安科技有限公司 Data storage method, device, server and storage medium
CN113127722A (en) * 2019-12-31 2021-07-16 新奥数能科技有限公司 Data query method and device, readable medium and electronic equipment
CN113312313A (en) * 2021-01-29 2021-08-27 淘宝(中国)软件有限公司 Data query method, nonvolatile storage medium and electronic device
CN113535770A (en) * 2020-04-22 2021-10-22 杭州海康威视数字技术股份有限公司 Data query method and device
CN113535781A (en) * 2021-07-21 2021-10-22 北京锐安科技有限公司 Data query method, device, equipment and storage medium of time sequence library

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104111955A (en) * 2013-04-22 2014-10-22 ***股份有限公司 Combined inquiring method oriented to Hbase database
CN106446242A (en) * 2016-10-12 2017-02-22 太原理工大学 Efficient multi-keyword matchingoptimal pathquery method
US20170116246A1 (en) * 2015-10-21 2017-04-27 International Business Machines Corporation Index management
CN106933893A (en) * 2015-12-31 2017-07-07 北京国双科技有限公司 The querying method and device of multi-dimensional data
CN109213921A (en) * 2017-06-29 2019-01-15 广州涌智信息科技有限公司 A kind of searching method and device of merchandise news

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104111955A (en) * 2013-04-22 2014-10-22 ***股份有限公司 Combined inquiring method oriented to Hbase database
US20170116246A1 (en) * 2015-10-21 2017-04-27 International Business Machines Corporation Index management
CN106933893A (en) * 2015-12-31 2017-07-07 北京国双科技有限公司 The querying method and device of multi-dimensional data
CN106446242A (en) * 2016-10-12 2017-02-22 太原理工大学 Efficient multi-keyword matching optimal path query method
CN109213921A (en) * 2017-06-29 2019-01-15 广州涌智信息科技有限公司 A kind of searching method and device of merchandise news

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
大雄: "《https://www.cnblogs.com/daxiongblog/p/4350583.html》", 19 March 2015 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113127722A (en) * 2019-12-31 2021-07-16 新奥数能科技有限公司 Data query method and device, readable medium and electronic equipment
CN113535770A (en) * 2020-04-22 2021-10-22 杭州海康威视数字技术股份有限公司 Data query method and device
CN112541009A (en) * 2020-12-23 2021-03-23 平安普惠企业管理有限公司 Data query method and device, electronic equipment and storage medium
CN112541009B (en) * 2020-12-23 2023-10-13 湖北华中电力科技开发有限责任公司 Data query method, device, electronic equipment and storage medium
CN112800061A (en) * 2021-01-29 2021-05-14 北京锐安科技有限公司 Data storage method, device, server and storage medium
CN113312313A (en) * 2021-01-29 2021-08-27 淘宝(中国)软件有限公司 Data query method, nonvolatile storage medium and electronic device
CN113312313B (en) * 2021-01-29 2023-09-29 淘宝(中国)软件有限公司 Data query method, nonvolatile storage medium and electronic device
CN112800061B (en) * 2021-01-29 2024-05-10 北京锐安科技有限公司 Data storage method, device, server and storage medium
CN113535781A (en) * 2021-07-21 2021-10-22 北京锐安科技有限公司 Data query method, device, equipment and storage medium of time sequence library
CN113535781B (en) * 2021-07-21 2024-05-10 北京锐安科技有限公司 Data query method, device and equipment of time sequence library and storage medium

Similar Documents

Publication Publication Date Title
CN110619000A (en) Time sequence data query method and device, storage medium and electronic equipment
US8862566B2 (en) Systems and methods for intelligent parallel searching
CN107357902B (en) Data table classification system and method based on association rule
US9626081B2 (en) System for classification code selection
CN107729376B (en) Insurance data auditing method and device, computer equipment and storage medium
CN104769586A (en) Profiling data with location information
US10055452B2 (en) Most likely classification code
CN111125116B (en) Method and system for positioning code field in service table and corresponding code table
US20150269138A1 (en) Publication Scope Visualization and Analysis
CN112527783A (en) Data quality probing system based on Hadoop
CN103646049A (en) Method and system for automatically generating data report
US20180121526A1 (en) Method, apparatus, and computer-readable medium for non-structured data profiling
JP5506527B2 (en) Synonymous column detection device and synonymous column detection method
CN110580253B (en) Time sequence data set loading method and device, storage medium and electronic equipment
CN113760891A (en) Data table generation method, device, equipment and storage medium
CN110399396B (en) Efficient data processing
CN111949845A (en) Method, apparatus, computer device and storage medium for processing mapping information
JP6771503B2 (en) Data management system and related data recommendation method
CN116303571A (en) Data query method, device, equipment and storage medium
US20220019597A1 (en) Data management device and data management method
CN113496365A (en) Method, device, equipment and medium for determining warehouse merging scheme
CN106649880B (en) Power statistics management system and method
JPWO2009008129A1 (en) Development document data management apparatus, development document data management system, development document data management method, program thereof, and storage medium
JP2004192657A (en) Information retrieval system, and recording medium recording information retrieval method and program for information retrieval
WO2016013099A1 (en) Feature data management system and feature data management method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination