CN113297135A - Data processing method and device - Google Patents


Info

Publication number
CN113297135A
Authority
CN
China
Prior art keywords
data
stored
partition
target object
index table
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110182739.9A
Other languages
Chinese (zh)
Inventor
吴兴博
胡建洪
张友东
杨成虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN202110182739.9A
Publication of CN113297135A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/13File access structures, e.g. distributed indices
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/14Details of searching files based on file metadata
    • G06F16/144Query formulation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/17Details of further file system functions
    • G06F16/172Caching, prefetching or hoarding of files
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/18File system types
    • G06F16/1805Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815Journaling file systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the present specification provides a data processing method and an apparatus, wherein the data processing method includes: receiving a data storage request aiming at data to be stored of a target object, wherein the data storage request carries attribute information of the data to be stored and an object tag of the target object; determining a creation timestamp and a latest timestamp of the data to be stored based on attribute information of the data to be stored, determining a data partitioning strategy based on the creation timestamp and the latest timestamp, and creating a data partition according to the data partitioning strategy; determining data to be stored of the target object in each data partition, and determining index data corresponding to the data to be stored; and constructing a first index table based on the data partitions corresponding to the index data, the target objects corresponding to the data partitions and the object tags of the target objects, and performing data storage on the data to be stored of each data partition based on the first index table.

Description

Data processing method and device
Technical Field
The embodiment of the specification relates to the technical field of computers, in particular to a data processing method. One or more embodiments of the present specification also relate to a data processing apparatus, a computing device, and a computer-readable storage medium.
Background
As the internet and the internet of things are applied more and more widely, monitoring systems on the internet, internet of things devices and the like generate more and more time-based data, which is called time series data. Whatever the source of the time series data, new timelines are constantly created and old timelines are gradually eliminated as acquisition devices and systems are upgraded or acquisition objects disappear for various reasons. As the data volume keeps growing, time series data is stored inefficiently, which makes subsequent queries inconvenient and reduces query efficiency.
Disclosure of Invention
In view of this, the present specification provides a data processing method. One or more embodiments of the present specification also relate to a data processing apparatus, a computing device, and a computer-readable storage medium to address technical deficiencies in the prior art.
According to a first aspect of embodiments herein, there is provided a data processing method including:
receiving a data storage request aiming at data to be stored of a target object, wherein the data storage request carries attribute information of the data to be stored and an object tag of the target object;
determining a creation timestamp and a latest timestamp of the data to be stored based on attribute information of the data to be stored, determining a data partitioning strategy based on the creation timestamp and the latest timestamp, and creating a data partition according to the data partitioning strategy;
determining data to be stored of the target object in each data partition, and determining index data corresponding to the data to be stored;
and constructing a first index table based on the data partitions corresponding to the index data, the target objects corresponding to the data partitions and the object tags of the target objects, and performing data storage on the data to be stored of each data partition based on the first index table.
According to a second aspect of embodiments herein, there is provided a data processing apparatus comprising:
a receiving module configured to receive a data storage request for data to be stored of a target object, wherein the data storage request carries attribute information of the data to be stored and an object tag of the target object;
a creating module configured to determine a creation timestamp and a latest timestamp of the data to be stored based on attribute information of the data to be stored, determine a data partitioning strategy based on the creation timestamp and the latest timestamp, and create a data partition according to the data partitioning strategy;
a determining module configured to determine the data to be stored of the target object in each data partition and determine index data corresponding to the data to be stored;
a storage module configured to construct a first index table based on the data partitions corresponding to the index data, the target objects corresponding to the data partitions and the object tags of the target objects, and perform data storage on the data to be stored of each data partition based on the first index table.
According to a third aspect of embodiments herein, there is provided a computing device comprising:
a memory and a processor;
the memory is for storing computer-executable instructions, and the processor is for implementing the steps of the data processing method when executing the computer-executable instructions.
According to a fourth aspect of embodiments herein, there is provided a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of any one of the data processing methods.
In one embodiment of the present specification, a creation timestamp and a latest timestamp of the data to be stored are determined according to attribute information of the data to be stored, a data partitioning strategy is then determined, and data partitions are created. A first index table containing the target objects and object tags is constructed based on the data partitions, which adds a time attribute to the index so that data meeting a query condition can subsequently be screened along the time dimension, and the data to be stored is stored based on the first index table.
Drawings
FIG. 1 is a schematic time-series diagram of time-series data of a data processing method according to an embodiment of the present disclosure;
FIG. 2 is a flow chart of a data processing method provided by an embodiment of the present specification;
FIG. 3 is a schematic diagram of data partitioning for a data processing method according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a data merge store of a data processing method according to an embodiment of the present disclosure;
FIG. 5 is a flowchart of a data processing method applied to query data according to an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present specification;
FIG. 7 is a block diagram of a computing device according to an embodiment of the present disclosure.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present description. This description may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein, as those skilled in the art will be able to make and use the present disclosure without departing from the spirit and scope of the present disclosure.
The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It will be understood that, although the terms first, second, etc. may be used herein in one or more embodiments to describe various information, these information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first can also be referred to as a second and, similarly, a second can also be referred to as a first without departing from the scope of one or more embodiments of the present description. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context.
First, the noun terms to which one or more embodiments of the present specification relate are explained.
Time series database (TSDB): a data management system that provides efficient access to, and statistical analysis of, time series data.
Time Series Data (Time Series Data): a series of metric monitoring data generated continuously at a stable frequency. For example, when monitoring the air quality of a city, collecting the sulfur dioxide concentration every second generates a series of data.
Metric (Metric): the indicator item to be monitored, such as wind force or temperature. A metric specifies which indicator is monitored, but not which object it is monitored for.
Tag (Tag): indicates the specific object that the monitored metric is aimed at; a tag identifies a data subcategory under the specified metric.
Timestamp (Timestamp): time point of data (metric value) generation.
Time line (Timeline): equivalent to the concept of time series.
Time partition Index (Time Partitioned Index): an index that is partitioned by time slice and records the life cycle of each timeline.
Write-ahead log (WAL): a way of efficiently recording data.
Bitmap: a data structure that represents a dense set over a finite domain, in which each element appears at most once and no other data is associated with the element.
Bloom filter: a binary vector data structure, which is space and time efficient, is used to detect whether an element is a member of a set.
A Tag (Tag) is composed of a tag key (Tag key) and a corresponding tag value (Tag value); for example, "city (Tag key) = Hangzhou (Tag value)" is a tag. More tag examples: machine room = A, IP = 172.220.110.1.
It should be noted that two tags are the same only when both the tag key and the tag value are the same; tags with the same tag key but different tag values are different tags. For example, when the specified metric is air temperature and the tag is city = Hangzhou, the monitored data is the air temperature in Hangzhou.
Tag key (TagKey, Tagk): the object type for which the metric (Metric) is monitored, such as country, province, city, machine room, or IP; the corresponding tag value locates a specific object under that object type.
Tag value (TagValue, Tagv): the value corresponding to the tag key (TagKey). For example, when the tag key (TagKey) is "country", the tag value (TagValue) may be designated as "china".
Value (Value): the value of a metric, for example level 15 (wind force) or 20 ℃ (temperature).
Data points (Data Point): each metric value collected at a particular time interval (successive timestamps) for a certain metric of the monitored object (defined by the metric and the tags) is a data point. "One metric + N tags (N >= 1) + one timestamp + one value" defines a data point.
Time Series (Time Series): the description of a certain indicator of a certain monitored object (defined by the metric and the tags). "One metric + N tag key-value pairs (N >= 1)" defines a time series; additional data values generated in a time series do not increase the number of time series.
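The definitions of a data point and a time series can be illustrated with a minimal sketch. The class and function names, the tag keys ("floor", "room", "device") and the second timestamp are assumptions made for illustration only; they do not come from the specification.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class DataPoint:
    """One metric + N tags (N >= 1) + one timestamp + one value."""
    metric: str
    tags: Tuple[Tuple[str, str], ...]  # sorted (tag key, tag value) pairs
    timestamp: int                     # seconds since the epoch
    value: float


def make_point(metric: str, tags: Dict[str, str], timestamp: int, value: float) -> DataPoint:
    if not tags:
        raise ValueError("a data point needs at least one tag")
    return DataPoint(metric, tuple(sorted(tags.items())), timestamp, value)


def series_key(point: DataPoint) -> Tuple[str, Tuple[Tuple[str, str], ...]]:
    """One metric + N tag key-value pairs identifies a time series."""
    return (point.metric, point.tags)


# Hypothetical tag keys; the second timestamp is invented for the example.
tags = {"floor": "33", "room": "3302", "device": "7649501"}
p1 = make_point("temperature", tags, 1492158910, 26.0)
p2 = make_point("temperature", tags, 1492158920, 25.8)

# Two data points, but still a single time series.
series = {series_key(p1), series_key(p2)}
```

Because the series key ignores the timestamp and the value, writing more values to the same metric and tag set adds data points without adding time series.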
Referring to FIG. 1, FIG. 1 shows a schematic time-series diagram of time series data of a data processing method provided by an embodiment of the present specification. FIG. 1 contains 5 data points, one per timestamp. Taking temperature as the metric in FIG. 1, the tags are: floor 33, conference room number 3302, and equipment identification 7649501. In timestamp order, the temperature values are 26 degrees, 25.8 degrees, 26.1 degrees, 26.3 degrees and 26.5 degrees; for example, at timestamp 1492158910 the temperature value is 26.
In practical applications, a general-purpose search engine can provide tag retrieval but has no time dimension, whereas tag retrieval is a basic function of a time series database. A typical time series processing flow is based on a core component, the TSI, which gives the time series database the ability to index timelines. The flow mainly includes timeline management and cascade analysis, for example querying tags and metric names by prefix matching, and querying timeline metadata, including complete tag and metric information, according to a screening condition.
The data query can query a timeline ID set according to the screening condition of the user, further query time sequence data corresponding to the timeline, and query the cascade relation of Tags, query metadata distribution and the like according to the condition.
Time series data mainly arises in scenarios with continuously updated data, such as the internet of things. During data acquisition, new timelines are constantly created and old timelines are gradually eliminated, because acquisition devices and systems are upgraded and acquisition objects appear and disappear for various reasons.
In the present specification, a data processing method is provided, and the present specification relates to a data processing apparatus, a computing device, and a computer-readable storage medium, which are described in detail one by one in the following embodiments.
Fig. 2 is a flowchart illustrating a data processing method according to an embodiment of the present specification, which specifically includes the following steps.
Step 202: receiving a data storage request aiming at data to be stored of a target object, wherein the data storage request carries attribute information of the data to be stored and an object tag of the target object.
The data storage request may be understood as a storage request for data to be stored of the target object, for example, if the data to be stored is temperature data, the data storage request may be a storage request for storing the acquired temperature data.
The attribute information of the data to be stored can be understood as the metric of the data to be stored and the recording time of the data to be stored; the object tag of the target object can be understood as the tag of the target object to which the data to be stored relates. For example, if the data to be stored is 5 degrees, the metric in its attribute information is temperature and its recording time is 10:01 a.m.
Specifically, the server receives a data storage request for data to be stored of a target object, where the data storage request carries attribute information of the data to be stored and an object tag of the target object. For example, the server receives a data storage request for meteorological data of a conference room, where the request carries the metric information of the meteorological data (temperature and humidity) and the identification information of the conference room, such as layer 1 conference room 1.
Step 204: determining a creation timestamp and a latest timestamp of the data to be stored based on the attribute information of the data to be stored, determining a data partitioning strategy based on the creation timestamp and the latest timestamp, and creating a data partition according to the data partitioning strategy.
The data partitioning strategy may be understood as a strategy for dividing the timeline life cycle at certain time intervals, and a data partition may be understood as a time slice obtained by dividing the timeline life cycle at those intervals.
Specifically, after the attribute information of the data to be stored is received, the creation timestamp and the latest timestamp of the data to be stored can be determined based on the attribute information, the corresponding data partitioning strategy is determined from the creation timestamp and the latest timestamp of the written data, and the data partitions are created according to that strategy.
In practical applications, when data is written and stored, a new timeline marks a creation timestamp and a latest access timestamp in memory, and the in-memory latest access timestamp is updated each time the timeline is written. Subtracting the creation timestamp from the latest access timestamp gives the total time over which the target object has been written; a time partitioning strategy is determined according to this total write time, and data partitions are created according to the strategy.
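The in-memory bookkeeping described above might look like the following sketch. The names TimelineClock, record_write and choose_slice_seconds are hypothetical, and the threshold inside choose_slice_seconds is an invented placeholder: the specification only says that the strategy is chosen from the total write time.

```python
from dataclasses import dataclass
from typing import Dict

DAY = 86400  # seconds


@dataclass
class TimelineClock:
    created_at: int  # creation timestamp, set on the first write
    latest_at: int   # latest access timestamp, updated on every write

    def total_write_time(self) -> int:
        # latest access timestamp minus creation timestamp
        return self.latest_at - self.created_at


clocks: Dict[str, TimelineClock] = {}


def record_write(timeline_id: str, ts: int) -> TimelineClock:
    """Mark the creation timestamp on the first write, then keep the latest one current."""
    clock = clocks.get(timeline_id)
    if clock is None:
        clock = clocks[timeline_id] = TimelineClock(created_at=ts, latest_at=ts)
    else:
        clock.latest_at = max(clock.latest_at, ts)
    return clock


def choose_slice_seconds(total_seconds: int) -> int:
    """Hypothetical policy: ten-day slices once a month of data exists, else daily slices."""
    return 10 * DAY if total_seconds >= 30 * DAY else DAY


record_write("room-1", 0)
clock = record_write("room-1", 30 * DAY)
slice_width = choose_slice_seconds(clock.total_write_time())
```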
For example, the server records the meteorological data of a conference room in building 1. When the meteorological data is first written, the creation timestamp is marked as 00:00 on January 1, 2020, and the latest timestamp is updated with each write of time series data. Once one month of time series data has been recorded, the month is sliced according to actual requirements. If the data partitioning strategy takes ten days as one slice, the first time slice of the conference room's meteorological data runs from 00:00 on January 1, 2020 to 24:00 on January 10, 2020, the second from 00:00 on January 11, 2020 to 24:00 on January 20, 2020, and the third from 00:00 on January 21, 2020 to 24:00 on January 30, 2020.
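The ten-day slicing in this example can be reproduced with a short sketch. The function name make_time_slices and the half-open interval convention are assumptions; 24:00 on January 30 is represented here as 00:00 on January 31.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def make_time_slices(created: datetime, latest: datetime,
                     width: timedelta) -> List[Tuple[datetime, datetime]]:
    """Cut the life cycle [created, latest) into half-open slices of at most `width`."""
    slices = []
    start = created
    while start < latest:
        end = min(start + width, latest)
        slices.append((start, end))
        start = end
    return slices


created = datetime(2020, 1, 1)   # creation timestamp
latest = datetime(2020, 1, 31)   # i.e. 24:00 on January 30
slices = make_time_slices(created, latest, timedelta(days=10))

for number, (start, end) in enumerate(slices, start=1):
    print(f"time slice {number}: {start:%Y-%m-%d} to {end:%Y-%m-%d} (exclusive)")
```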
Further, the determining the creation timestamp and the latest timestamp of the data to be stored based on the attribute information of the data to be stored includes:
and acquiring a creation timestamp and a latest timestamp of the data to be stored based on the attribute information of the data to be stored, and writing the creation timestamp and the latest timestamp into a log file for storage.
Specifically, after the attribute information of the data to be stored for the target object is received, the creation timestamp and the latest write timestamp of the data to be stored are recorded according to that attribute information, and the marked creation timestamp and latest timestamp are written into a log file for storage.
In practical applications, the timestamps at which the data to be stored is written can be recorded by write-ahead logging, so that the write records of the data to be stored can be looked up later and the reliability of data writing is guaranteed.
In the data processing method provided by the embodiment of the specification, writing the creation timestamp and the latest write timestamp of the data to be stored into the log file not only guarantees the safety of the data but also allows the data partitioning strategy to be determined later from the timestamp data for data storage.
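As an illustration of logging the two timestamps, the sketch below appends one JSON record per write to a log file and replays the file to recover the latest state. The record layout, the file name and the function names are assumptions; the specification does not describe the log format.

```python
import json
import os
import tempfile
from typing import Dict, Tuple


def append_wal(path: str, timeline_id: str, created_at: int, latest_at: int) -> None:
    """Append one record and force it to disk before the write is acknowledged."""
    record = {"timeline": timeline_id, "created": created_at, "latest": latest_at}
    with open(path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
        log.flush()
        os.fsync(log.fileno())


def replay_wal(path: str) -> Dict[str, Tuple[int, int]]:
    """Rebuild the in-memory timestamps; later records override earlier ones."""
    state: Dict[str, Tuple[int, int]] = {}
    with open(path, encoding="utf-8") as log:
        for line in log:
            record = json.loads(line)
            state[record["timeline"]] = (record["created"], record["latest"])
    return state


wal_path = os.path.join(tempfile.mkdtemp(), "timeline.wal")  # hypothetical file name
append_wal(wal_path, "room-1", 100, 100)
append_wal(wal_path, "room-1", 100, 250)
recovered = replay_wal(wal_path)
```

Replaying the log yields the creation timestamp and the most recent timestamp for each timeline, which is the input the partitioning step needs.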
Referring to fig. 3, fig. 3 is a schematic diagram illustrating data partitioning of a data processing method according to an embodiment of the present specification.
In FIG. 3, the horizontal axis represents the timeline life cycle and the vertical axis represents a plurality of timelines. The timeline life cycle is divided into four time slices, namely time slice 1, time slice 2, time slice 3 and time slice 4, and there are six timelines, namely timeline 1 through timeline 6. It should be noted that one timeline may represent one target object, such as a conference room. The timelines in FIG. 3 record time series data over different lengths of time, that is, their creation timestamps and latest timestamps differ. For each time slice, the timelines that fall within it can be found, which indicates that the time slice holds stored data of those timelines; for example, time slice 1 has time series data corresponding to timeline 1, timeline 2 and timeline 3, time slice 2 has time series data corresponding to timeline 2, timeline 3, timeline 4 and timeline 5, and so on.
In practical applications, time slices are cut at a preset time interval according to the creation timestamp and the latest access timestamp of the stored time series data, which suits application scenarios in which massive numbers of timelines are continuously created and eliminated. Continuing the example above, each timeline is a conference room, so when the meteorological data of the conference rooms is recorded, the meteorological data of conference room 1, conference room 2 and conference room 3 is recorded in time slice 1.
In the data processing method provided in the embodiment of the present specification, in a scenario where massive numbers of timelines are continuously created and eliminated, the index data is partitioned along the time dimension, so that it can subsequently be filtered by time to screen out the index data that meets the query condition. The data to be stored can then be queried through the index data more quickly, which improves query efficiency.
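Deciding which timelines fall into which time slice amounts to an interval-overlap test. In the sketch below the life cycle numbers are invented so that the result matches the slice membership described for FIG. 3; the function name and the half-open slice convention are also assumptions.

```python
from typing import Dict, List, Tuple


def timelines_in_slice(lifecycles: Dict[str, Tuple[int, int]],
                       slice_start: int, slice_end: int) -> List[str]:
    """Return the timelines whose [created, latest] span overlaps [slice_start, slice_end)."""
    return sorted(
        timeline
        for timeline, (created, latest) in lifecycles.items()
        if created < slice_end and latest >= slice_start
    )


# Hypothetical (creation timestamp, latest timestamp) pairs for six timelines.
lifecycles = {
    "timeline 1": (0, 8),
    "timeline 2": (2, 15),
    "timeline 3": (5, 19),
    "timeline 4": (10, 25),
    "timeline 5": (12, 35),
    "timeline 6": (31, 39),
}

slice_1 = timelines_in_slice(lifecycles, 0, 10)
slice_2 = timelines_in_slice(lifecycles, 10, 20)
```

A query restricted to a time range only needs to consult the slices that overlap that range, which is what makes filtering along the time dimension cheap.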
Step 206: and determining the data to be stored of the target object in each data partition, and determining the index data corresponding to the data to be stored.
The index data may be understood as data with a data tag in the data to be stored.
Specifically, when data writing is performed, the data to be stored for the target object in each data partition is determined, and the index data corresponding to the data to be stored is determined according to the data to be stored of the target object.
In practical applications, index data is stored to facilitate subsequent lookup of the stored data. The data processing method provided by this embodiment therefore slices the index data along the time dimension when storing data, which adds a time attribute to the index, supports filtering by time, and makes it easier to look up the stored data through the index data later.
Continuing the above example, the meteorological data of conference room 1, conference room 2 and conference room 3 is determined in time slice 1 (data partition 1). The index data of the meteorological data may include the tag of the target object, the metric information of the meteorological data, the storage location of the meteorological data, and the like, which is not limited in this specification.
Step 208: and constructing a first index table based on the data partitions corresponding to the index data, the target objects corresponding to the data partitions and the object tags of the target objects, and performing data storage on the data to be stored of each data partition based on the first index table.
The first index table may be understood as an inverted index table with data partitions, target objects, and object tags of the target objects.
Specifically, after the index data of the data to be stored is determined, the corresponding data partition is determined based on the index data, the target object corresponding to the time slice and the object tag corresponding to the target object can be determined in the data partition, a first index table is further constructed, and the data to be stored of each data partition is stored based on the constructed first index table.
Continuing the above example, time slice 1 (data partition 1) includes the meteorological data of conference room 1, conference room 2 and conference room 3. The constructed first index table therefore includes the target objects, the data partition, and the object tags of the target objects, where the target objects are conference room 1, conference room 2 and conference room 3, the data partition is time slice 1, and the object tags are layer 1 conference room 1, layer 1 conference room 2 and layer 2 conference room 3.
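One way to picture the first index table is as an inverted index held per data partition, mapping each tag to the target objects that carry it. The class below is a sketch under that assumption; its name, its methods and the "floor" tag key are hypothetical and are not taken from the specification.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Tag = Tuple[str, str]  # (tag key, tag value)


class FirstIndexTable:
    """Inverted index: data partition -> tag -> set of target objects."""

    def __init__(self) -> None:
        self._postings: Dict[int, Dict[Tag, Set[str]]] = defaultdict(lambda: defaultdict(set))

    def add(self, partition: int, target: str, tags: Dict[str, str]) -> None:
        for tag in tags.items():
            self._postings[partition][tag].add(target)

    def lookup(self, partitions: Iterable[int], tag: Tag) -> Set[str]:
        """Find target objects with `tag`, consulting only the given partitions."""
        hits: Set[str] = set()
        for partition in partitions:
            if partition in self._postings:
                hits |= self._postings[partition].get(tag, set())
        return hits


index = FirstIndexTable()
index.add(1, "conference room 1", {"floor": "1"})
index.add(1, "conference room 2", {"floor": "1"})
index.add(1, "conference room 3", {"floor": "2"})

floor_1_rooms = index.lookup([1], ("floor", "1"))
```

Because the partition is part of the lookup, a query for a time range that does not touch time slice 1 never reads its postings.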
After the first index table is created, a second index table is also needed to be created to further obtain the storage offset of the data to be stored, and the data to be stored is stored based on the storage offset; specifically, the data storage of the data to be stored in each data partition based on the first index table includes:
constructing a second index table based on a target object corresponding to the data partition in the first index table, an object tag of the target object and attribute information of data to be stored of the target object;
acquiring a storage offset bit of data to be stored of each data partition based on the first index table and the second index table;
and storing the data to be stored according to the storage offset bit.
The second index table may be understood as a forward index table with the target objects corresponding to the data partitions in the first index table, the target object tags, and the attribute information of the data to be stored.
Specifically, when the data of each timeline is written, each data partition can determine the timeline falling at the time of the data partition, determine the target object corresponding to the timeline, the object tag of the target object, and the attribute information of the data to be stored of the target object, construct a second index table, determine the storage offset of the data to be stored of each target object based on the first index table and the second index table, and store each data to be stored according to the determined storage offset.
It should be noted that the records of the second index table are one of the inputs for generating the storage offset. The storage offset can be determined in various ways, such as a combination approach, an auto-increment identifier approach, or a file block mapping approach. When the combination approach is used, the storage offset is generated by splicing and combining the first index table and the second index table.
Continuing the above example, each record of the second index table in time slice 1 includes a target object, a tag, and the attribute information of the data to be stored. The tags are layer 1, layer 1 and layer 2; the target objects are conference room 1, conference room 2 and conference room 3; and the attribute information is temperature, humidity, and temperature and humidity. The rows of the second index table are therefore: layer 1 - conference room 1 - temperature; layer 1 - conference room 2 - humidity; layer 2 - conference room 3 - temperature and humidity. It should be noted that each timeline in the table has a storage offset at which its data to be stored is stored.
In the data processing method provided in the embodiment of the present specification, the second index table is constructed based on the information of the first index table, so as to obtain the storage offset of the data to be stored in each data partition, and thus the data to be stored is stored quickly.
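The sketch below shows one possible reading of the combination approach: the partition identifier from the first index table is spliced with the row number from the second (forward) index table to form a storage key. The key format, class name and function names are assumptions; the specification does not fix how the two tables are combined.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SecondIndexTable:
    """Forward index for one data partition: row -> (tag, target object, attributes)."""
    partition: int
    rows: List[Tuple[str, str, str]] = field(default_factory=list)

    def add(self, tag: str, target: str, attributes: str) -> int:
        self.rows.append((tag, target, attributes))
        return len(self.rows) - 1  # row number of the new timeline


def storage_offset(partition: int, row: int) -> str:
    """Hypothetical combination: splice the partition id and the forward-index row number."""
    return f"{partition:04d}-{row:08d}"


table = SecondIndexTable(partition=1)
row_1 = table.add("layer 1", "conference room 1", "temperature")
row_2 = table.add("layer 1", "conference room 2", "humidity")
row_3 = table.add("layer 2", "conference room 3", "temperature and humidity")

# Store each timeline's data under its offset (a dict stands in for the data file).
store: Dict[str, List[float]] = {}
store[storage_offset(table.partition, row_1)] = [26.0, 25.8]
```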
Further, after the storing the data to be stored according to the storage offset bit, the method further includes:
merging the first index table according to each data partition to obtain a merged inverted index file, and storing the merged inverted index file; and
merging the second index table according to each data partition to obtain a merged forward index file, and storing the merged forward index file.
Specifically, after slicing along the time dimension, the index data of the time slices can be merged into one storage file for storage. The first index table is merged across the data partitions to obtain a merged inverted index file, which is then stored; likewise, the created second index table is merged across the data partitions to obtain a merged forward index file, which is then stored.
It should be noted that when index data files are merged, the time attribute of the merged file is updated and the old files are removed from the file manager to save storage space. The query logic after merging is the same as before merging; the only difference is that, based on the time window, timelines created within small time ranges are merged into a larger time range.
According to the data processing method provided by the embodiment of the specification, after the plurality of index fragments are merged and stored, the index data can be queried through querying the merged index file subsequently, so that the storage space is saved, and the long-period historical query efficiency is improved.
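The merge step can be sketched as follows. This is a minimal illustration, assuming each index fragment carries a half-open time window and a posting map from tag to timeline names; `IndexFragment`, `merge_fragments` and `compact` are hypothetical names, and the file manager is modeled as a plain list.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class IndexFragment:
    start: int  # inclusive start of the time window
    end: int    # exclusive end of the time window
    postings: Dict[str, Set[str]] = field(default_factory=dict)  # tag -> timelines


def merge_fragments(fragments: List[IndexFragment]) -> IndexFragment:
    """Merge small-window fragments into one fragment covering all of them."""
    if not fragments:
        raise ValueError("nothing to merge")
    # The time attribute of the merged fragment spans every input window.
    merged = IndexFragment(
        start=min(f.start for f in fragments),
        end=max(f.end for f in fragments),
    )
    for fragment in fragments:
        for tag, timelines in fragment.postings.items():
            merged.postings.setdefault(tag, set()).update(timelines)
    return merged


def compact(file_manager: List[IndexFragment],
            to_merge: List[IndexFragment]) -> IndexFragment:
    """Replace the merged fragments in the file manager with the merged result."""
    merged = merge_fragments(to_merge)
    # Drop the old fragments (matched by identity) and register the new one.
    file_manager[:] = [f for f in file_manager
                       if not any(f is old for old in to_merge)]
    file_manager.append(merged)
    return merged


slice_1 = IndexFragment(0, 10, {"layer 1": {"conference room 1"}})
slice_2 = IndexFragment(10, 20, {"layer 1": {"conference room 2"},
                                 "layer 2": {"conference room 3"}})
manager = [slice_1, slice_2]
merged = compact(manager, [slice_1, slice_2])
```

A lookup against `merged.postings` works the same way as a lookup against either original fragment, which mirrors the statement that the query logic is unchanged by merging.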
Referring to fig. 4, fig. 4 is a schematic diagram illustrating a data merge storage of a data processing method according to an embodiment of the present disclosure.
In fig. 4, data is first recorded in the log. After the data is written, the index data of the data to be stored is stored in the timeline index table, and the data to be stored itself is stored in the timeline table. It should be noted that the timeline index table does not store real data, only index data, whereas the timeline table stores the full data; the timeline table therefore occupies much more storage space than the timeline index table. Each time slice corresponds to one timeline index file, and the merge manager merges and stores multiple timeline index files. The timeline index file format uses structures such as a hash index to build an inverted index that supports efficient point queries and range queries. Timeline files are merged and managed in the same way; they use data structures such as Bitmap and BloomFilter to achieve high-performance queries, and statistics collected on the timelines enable adaptive query optimization.
In the data processing method provided in the embodiment of the present specification, the index files of the same data partition are merged by the merging module, so that subsequent storage and search can be performed quickly.
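The description of fig. 4 mentions BloomFilter as one structure a timeline file may use. The sketch below shows a generic Bloom filter, not the format used by the specification: the bit-array size, the number of hash functions, and the use of SHA-256 with a seed prefix are all assumptions chosen for brevity.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        # Derive num_hashes bit positions by hashing the key with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means the key was definitely never added.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


timeline_filter = BloomFilter()
timeline_filter.add("layer 1-conference room 1-temperature")
```

A query can skip a timeline file entirely when `might_contain` returns `False` for the requested timeline, at the cost of occasionally opening a file that turns out not to hold it.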
In addition, the continuous updating and writing of the time sequence data can also be realized by continuously creating a data partition and storing the data to be stored in a new data partition; specifically, the data storage of the data to be stored in each data partition based on the first index table includes:
creating a new data partition based on the latest timestamp of the data to be stored and the data partition policy;
and updating a first index table corresponding to the new data partition according to the new data partition, and storing the data to be stored of the new data partition based on the first index table.
Specifically, while data is being written continuously, new data partitions may be created as needed according to the latest timestamp of the data to be stored and the data partition policy. The first index table is updated with the index data of the target object's data to be stored in the new data partition, and the data to be stored in the new data partition is then stored based on the updated first index table.
In practical application, time series data is written and stored continuously. When the timestamp of newly written data does not fall within any existing data partition, a new data partition can be created according to the data partition policy, the index data of the data to be stored is recorded in the first index table of the new data partition, and the data to be stored is stored based on the newly recorded first index table.
In the embodiment of the description, the newly written data to be stored is subjected to data partitioning through the created data partition, and the data to be stored is stored, so that the data to be stored is conveniently and quickly inquired through the index data in the new data partition in the following process.
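Creating a partition on demand can be sketched as follows, assuming a monthly partition policy in which a partition is identified by its (year, month) pair. The names `partition_key` and `write_point`, and the dictionary-based first index table, are hypothetical.

```python
from datetime import datetime
from typing import Dict, Tuple

PartitionKey = Tuple[int, int]  # (year, month); assumed monthly partition policy


def partition_key(ts: datetime) -> PartitionKey:
    """Return the partition that a timestamp belongs to under the monthly policy."""
    return (ts.year, ts.month)


def write_point(first_index: Dict[PartitionKey, Dict[str, str]],
                ts: datetime, target: str, tag: str) -> PartitionKey:
    """Record the index entry for one data point, creating the partition if needed."""
    key = partition_key(ts)
    # setdefault creates an empty partition entry when no partition covers ts.
    partition = first_index.setdefault(key, {})
    partition[target] = tag
    return key


first_index: Dict[PartitionKey, Dict[str, str]] = {}
write_point(first_index, datetime(2020, 1, 10), "conference room 1", "layer 1")
write_point(first_index, datetime(2020, 1, 20), "conference room 2", "layer 1")
write_point(first_index, datetime(2020, 2, 3), "conference room 1", "layer 1")
```

The third write falls outside the January partition, so a February partition is created before its index entry is recorded.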
As time series data continues to be written, a data partition may become far older than the current time, in which case the data to be stored in the earliest created data partitions can be deleted; specifically, after the data to be stored of each data partition is stored based on the first index table, the method further includes:
and deleting the data to be stored of the data partition based on a preset requirement.
The preset requirement can be understood as a user requirement that varies with the actual application. For example, when the timeline period is long, the time period of the initial data partition may lie one year before the current time. In practice, data in partitions with such a long history is less timely and is rarely used, so the data to be stored in these rarely used data partitions can be deleted.
According to the data processing method provided by the embodiment of the specification, the data to be stored of the data partition is deleted according to the preset requirement, so that the data with poor timeliness is deleted, and the storage space is saved.
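One possible form of the preset requirement is a retention period. The sketch below assumes partitions are keyed by their end time and that a partition is deleted once its end time is older than the retention cutoff; `drop_expired` and the one-year retention value are illustrative assumptions.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def drop_expired(partitions: Dict[datetime, List[float]],
                 now: datetime,
                 retention: timedelta) -> List[datetime]:
    """Delete every partition whose end time is at or before now - retention."""
    cutoff = now - retention
    expired = sorted(end for end in partitions if end <= cutoff)
    for end in expired:
        del partitions[end]
    return expired


# Partitions keyed by end time; the stored lists stand in for partition data.
partitions: Dict[datetime, List[float]] = {
    datetime(2020, 2, 1): [1.0],
    datetime(2021, 2, 1): [2.0],
}
removed = drop_expired(partitions,
                       now=datetime(2021, 2, 10),
                       retention=timedelta(days=365))
```

Because whole partitions are dropped, the corresponding index entries disappear together with the data, and no per-timeline cleanup is needed.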
Further, in a scenario where timelines are frequently created and aged out, the index easily becomes bloated, and invalid timelines cause query amplification that seriously reduces data query efficiency. To address this, the data processing method provided in the embodiment of the present specification further includes:
receiving a data query request, wherein the data query request carries attribute information of data to be queried;
determining a data partition of the data to be queried in the first index table based on the attribute information of the data to be queried, and determining a target object corresponding to the data partition and an object tag of the target object in the first index table based on the data partition;
and performing data query based on the target object and the object label of the target object.
Specifically, after a data query request is received, the data query request is analyzed to obtain attribute information of data to be queried, a data partition of the data to be queried is determined in a first index table based on the attribute information of the data to be queried, a target object corresponding to the data partition and an object tag corresponding to the target object are determined in the first index table according to the data partition, and then the data to be queried is queried according to the target object and the object tag of the target object.
Continuing the above example, a request to query the meteorological data at time t is received. The data partition into which time t falls is determined among the partitions in the first index table; here time t is within the range of data partition 1, so the index data for the meteorological data at time t is stored in the file of data partition 1. The entries belonging to data partition 1 are then located in the first index table, the target objects in data partition 1 are determined to be conference room 1, conference room 2 and conference room 3, and their object tags are conference room 1-layer 1, conference room 2-layer 1 and conference room 3-layer 2, respectively. The meteorological data at time t is then queried according to the determined target objects and their object tags.
In the data processing method provided in the embodiment of the present specification, the corresponding data partition is determined in the first index table according to the attribute information in the data query request, and then the corresponding target object and the object identifier of the target object are determined by the first index table, so that the storage location of the data to be queried in the query data can be quickly located by the index table, the query data is queried, and the query efficiency is improved.
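The first stage of the query can be sketched as a lookup of the partition containing the query time, followed by reading the target objects and tags recorded for that partition. In this sketch, timestamps are plain integers, partitions are contiguous half-open ranges, and `BOUNDS`, `FIRST_INDEX`, `find_partition` and `candidates` are hypothetical names.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

# Partition i covers the half-open range [BOUNDS[i], BOUNDS[i + 1]).
BOUNDS: List[int] = [0, 100, 200, 300]

# Hypothetical first index table: partition number -> {target object: object tag}.
FIRST_INDEX: Dict[int, Dict[str, str]] = {
    0: {
        "conference room 1": "layer 1",
        "conference room 2": "layer 1",
        "conference room 3": "layer 2",
    },
    1: {"conference room 1": "layer 1"},
}


def find_partition(t: int) -> Optional[int]:
    """Return the number of the partition whose range contains t, if any."""
    i = bisect_right(BOUNDS, t) - 1
    if i < 0 or i >= len(BOUNDS) - 1:
        return None
    return i


def candidates(t: int) -> List[Tuple[str, str]]:
    """Return (target object, object tag) pairs indexed for the partition of t."""
    partition = find_partition(t)
    if partition is None:
        return []
    return sorted(FIRST_INDEX.get(partition, {}).items())
```

Only the timelines registered in the matching partition are returned, so timelines that existed solely in other time ranges never enter the later stages of the query.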
After the corresponding target object and its object identifier are determined, the second index table is used to determine whether the query request meets the data query condition; specifically, the querying data based on the target object and the object tag of the target object includes:
judging whether attribute information matched with the attribute information of the data to be inquired exists in the second index table or not based on the target object and the object tag of the target object;
if yes, determining the storage offset of the data to be queried based on the target object, and querying the data to be queried based on the storage offset.
Specifically, after a target object corresponding to the query data request and an object tag of the target object are acquired, whether attribute information matched with the attribute information of the data to be queried exists in the second index table is judged, and if the attribute information exists, a storage offset of the data to be queried is determined based on the target object, so that data query is performed.
In practical application, a data query request may be invalid. For example, a user submits a request to query the humidity data of layer 1-conference room 1 at time t. If it is determined in the second index table that layer 1-conference room 1 has no stored humidity data, the request is an invalid data query request: the query fails and no data can be returned to the user. If the user instead queries the temperature data of layer 1-conference room 1 at time t, and it is determined in the second index table that layer 1-conference room 1 has stored temperature data, the request is a valid data query request, and the storage offset of the data to be queried can be obtained based on the target object determined in the second index table, so that the data query can be performed.
In the data processing method provided in the embodiment of the present specification, whether the written data query request is reasonable is determined in the second index table, and then the timeline is screened according to the time range, so that the expired timeline can be effectively excluded, further the screening of the index data is realized, and the data query efficiency is improved.
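The validity check against the second index table amounts to a keyed lookup that either yields a storage offset or nothing. The sketch below assumes the second index table is a dictionary keyed by (object tag, target object, attribute); `SECOND_INDEX`, `resolve_offset` and the offset values are hypothetical.

```python
from typing import Dict, Optional, Tuple

Timeline = Tuple[str, str, str]  # (object tag, target object, attribute)

# Hypothetical second index table for one data partition.
SECOND_INDEX: Dict[Timeline, int] = {
    ("layer 1", "conference room 1", "temperature"): 0,
    ("layer 1", "conference room 2", "humidity"): 1,
}


def resolve_offset(tag: str, target: str, attribute: str) -> Optional[int]:
    """Return the storage offset, or None when the request is invalid."""
    return SECOND_INDEX.get((tag, target, attribute))


# Humidity was never stored for layer 1-conference room 1: invalid request.
invalid = resolve_offset("layer 1", "conference room 1", "humidity")
# Temperature was stored for layer 1-conference room 1: valid request.
valid = resolve_offset("layer 1", "conference room 1", "temperature")
```

Rejecting the invalid request at this stage avoids touching the much larger table that holds the full data.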
To sum up, in the embodiments of the present specification, the creation timestamp and the latest timestamp of the data to be stored are determined from its attribute information, a data partitioning policy is then determined and data partitions are created, and a first index table containing target objects and object tags is constructed based on the data partitions. The index thereby gains a time attribute, which allows later queries to filter data by the time dimension, and the data to be stored is stored based on the first index table.
Referring to fig. 5, fig. 5 shows an example of applying a data processing method provided in an embodiment of the present specification to query data for detailed description, and fig. 5 is a flowchart of a processing procedure of applying a data processing method provided in an embodiment of the present specification to query data.
It should be noted that a start timestamp and a latest timestamp of the data to be stored are determined based on the attribute information of the data to be stored, and a data partitioning policy is then determined. In the data partitioning policy of this embodiment, one month may be used as the time period of one data partition; for example, the determined data partitions are January 1, 2020 to February 1, 2020, February 1, 2020 to March 1, 2020, and March 1, 2020 to April 1, 2020. The meteorological data of the conference rooms in each data partition and the corresponding index data are then determined. On this basis, the steps of querying the meteorological data of a conference room are as follows:
Step 502: A query request is received to query the temperature data of the conference rooms on January 10, 2020.
Step 504: According to the time in the query request, it is determined that the time of the temperature data to be queried falls in data partition 1, and that conference room 1, conference room 2 and conference room 3 have meteorological data for January 10, 2020.
Step 506: it is determined in the first index table that the object identifier corresponding to conference room 1 is layer 1-conference room 1, the object identifier corresponding to conference room 2 is layer 1-conference room 2, and the object identifier corresponding to conference room 3 is layer 2-conference room 1.
Step 508: According to the obtained target objects and target object identifiers, it is determined in the second index table that layer 1-conference room 1 has a humidity tag and stores humidity data, layer 1-conference room 2 has a temperature tag and stores temperature data, and layer 2-conference room 1 has a humidity tag and stores humidity data.
Step 510: By matching the attribute information (temperature) in the data query request, it is determined that only the data of layer 1-conference room 2 meets the query condition.
Step 512: The storage offset of the data to be queried for layer 1-conference room 2 is obtained from the second index table, and the temperature data of layer 1-conference room 2 stored for January 10, 2020 is looked up based on the storage offset.
In the data processing method provided in the embodiment of the present specification, an index scheme created by time partitioning is used, the first index table is used for screening, a query condition is determined in the second index table, and then an actual data storage location is accurately searched for in a data table storing full data to perform data query, so that data query efficiency is improved.
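Steps 502 to 512 can be traced end to end with the sketch below. The table contents are assumptions chosen so that only layer 1-conference room 2 holds temperature data, as in step 510; the storage offsets and the reading 21.5 are made-up placeholders, and `PARTITIONS`, `FIRST_INDEX`, `SECOND_INDEX`, `DATA` and `query` are hypothetical names.

```python
from datetime import date
from typing import Dict, List, Tuple

# Monthly data partitions as (start, end) with the end date exclusive.
PARTITIONS: List[Tuple[date, date]] = [
    (date(2020, 1, 1), date(2020, 2, 1)),
    (date(2020, 2, 1), date(2020, 3, 1)),
    (date(2020, 3, 1), date(2020, 4, 1)),
]

# First index table: partition number -> {target object: object identifier}.
FIRST_INDEX: Dict[int, Dict[str, str]] = {
    1: {
        "conference room 1": "layer 1-conference room 1",
        "conference room 2": "layer 1-conference room 2",
        "conference room 3": "layer 2-conference room 1",
    },
}

# Second index table: (partition, identifier) -> {attribute: storage offset}.
SECOND_INDEX: Dict[Tuple[int, str], Dict[str, int]] = {
    (1, "layer 1-conference room 1"): {"humidity": 0},
    (1, "layer 1-conference room 2"): {"temperature": 1},
    (1, "layer 2-conference room 1"): {"humidity": 2},
}

# Data table holding the full data: offset -> {day: reading} (placeholder value).
DATA: Dict[int, Dict[date, float]] = {1: {date(2020, 1, 10): 21.5}}


def query(day: date, attribute: str) -> Dict[str, float]:
    # Step 504: find the partition (numbered from 1) that contains the day.
    part = next((n for n, (start, end) in enumerate(PARTITIONS, start=1)
                 if start <= day < end), None)
    if part is None:
        return {}
    results: Dict[str, float] = {}
    # Step 506: read target objects and identifiers from the first index table.
    for identifier in FIRST_INDEX.get(part, {}).values():
        # Steps 508-510: keep only timelines that store the requested attribute.
        offset = SECOND_INDEX.get((part, identifier), {}).get(attribute)
        if offset is None:
            continue
        # Step 512: read the value from the data table at the storage offset.
        value = DATA.get(offset, {}).get(day)
        if value is not None:
            results[identifier] = value
    return results


result = query(date(2020, 1, 10), "temperature")
```

The data table is consulted only for the single timeline that passes both index stages, which is the source of the efficiency gain described above.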
Corresponding to the above method embodiment, the present specification further provides an embodiment of a data processing apparatus, and fig. 6 shows a schematic structural diagram of a data processing apparatus provided in an embodiment of the present specification. As shown in fig. 6, the apparatus includes:
a receiving module 602, configured to receive a data storage request for data to be stored of a target object, where the data storage request carries attribute information of the data to be stored and an object tag of the target object;
a creating module 604 configured to determine a creating timestamp and a latest timestamp of data to be stored based on attribute information of the data to be stored, determine a data partitioning policy based on the creating timestamp and the latest timestamp, and create a data partition according to the data partitioning policy;
a determining module 606 configured to determine to-be-stored data of the target object in each data partition, and determine index data corresponding to the to-be-stored data;
a storage module 608 configured to construct a first index table based on the data partition corresponding to the index data, the target object corresponding to the data partition, and the object tag of the target object, and perform data storage on the data to be stored of each data partition based on the first index table.
Optionally, the storage module 608 is further configured to:
constructing a second index table based on a target object corresponding to the data partition in the first index table, an object tag of the target object and attribute information of data to be stored of the target object;
acquiring a storage offset bit of data to be stored of each data partition based on the second index table;
and storing the data to be stored according to the storage offset bit.
Optionally, the data processing apparatus further includes:
receiving a data query request, wherein the data query request carries attribute information of data to be queried;
determining a data partition of the data to be queried in the first index table based on the attribute information of the data to be queried, and determining a target object corresponding to the data partition and an object tag of the target object in the first index table based on the data partition;
and performing data query based on the target object and the object label of the target object.
Optionally, the data processing apparatus further includes:
judging whether attribute information matched with the attribute information of the data to be inquired exists in the second index table or not based on the target object and the object tag of the target object;
if yes, determining the storage offset of the data to be queried based on the target object, and querying the data to be queried based on the storage offset.
Optionally, the data processing apparatus further includes:
merging the first index table according to each data partition to obtain a merged inverted index file, and storing the merged inverted index file; and
merging the second index table according to each data partition to obtain a merged forward index file, and storing the merged forward index file.
Optionally, the creating module 604 is further configured to:
and acquiring a creation timestamp and a latest timestamp of the data to be stored based on the attribute information of the data to be stored, and writing the creation timestamp and the latest timestamp into a log file for storage.
Optionally, the storage module 608 is further configured to:
creating a new data partition based on the latest timestamp of the data to be stored and the data partition policy;
and updating a first index table corresponding to the new data partition according to the new data partition, and storing the data to be stored of the new data partition based on the first index table.
Optionally, the data processing apparatus further includes:
and deleting the data to be stored of the data partition based on a preset requirement.
The data processing apparatus provided in the embodiment of the present specification determines the creation timestamp and the latest timestamp of the data to be stored from its attribute information, then determines a data partitioning policy and creates data partitions, and constructs a first index table containing target objects and object tags based on the data partitions. The index thereby gains a time attribute, which allows later queries to filter data by the time dimension. The data to be stored is stored based on the first index table, so that, in a scenario where massive numbers of timelines are continuously created and expire, the index data is partitioned by the time dimension. This avoids loss of index data and improves the security of data storage; by providing an effective index policy, it also improves the storage efficiency of the data to be stored and can subsequently improve data query efficiency.
The above is a schematic configuration of a data processing apparatus of the present embodiment. It should be noted that the technical solution of the data processing apparatus and the technical solution of the data processing method belong to the same concept, and details that are not described in detail in the technical solution of the data processing apparatus can be referred to the description of the technical solution of the data processing method.
FIG. 7 illustrates a block diagram of a computing device 700 provided in accordance with one embodiment of the present description. The components of the computing device 700 include, but are not limited to, memory 710 and a processor 720. Processor 720 is coupled to memory 710 via bus 730, and database 750 is used to store data.
Computing device 700 also includes access device 740, access device 740 enabling computing device 700 to communicate via one or more networks 760. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the internet. Access device 740 may include one or more of any type of network interface, e.g., a Network Interface Card (NIC), wired or wireless, such as an IEEE802.11 Wireless Local Area Network (WLAN) wireless interface, a worldwide interoperability for microwave access (Wi-MAX) interface, an ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In one embodiment of the present description, the above-described components of computing device 700, as well as other components not shown in FIG. 7, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 7 is for purposes of example only and is not limiting as to the scope of the present description. Those skilled in the art may add or replace other components as desired.
Computing device 700 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smartphone), wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 700 may also be a mobile or stationary server.
The processor 720 is configured to implement the steps of the data processing method when executing the computer-executable instructions.
The above is an illustrative scheme of a computing device of the present embodiment. It should be noted that the technical solution of the computing device and the technical solution of the data processing method belong to the same concept, and details that are not described in detail in the technical solution of the computing device can be referred to the description of the technical solution of the data processing method.
An embodiment of the present specification further provides a computer readable storage medium storing computer instructions, which when executed by a processor, are used for implementing the steps of the data processing method as described above.
The above is an illustrative scheme of a computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solution of the data processing method, and details that are not described in detail in the technical solution of the storage medium can be referred to the description of the technical solution of the data processing method.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The computer instructions comprise computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunication signals.
It should be noted that, for the sake of simplicity, the foregoing method embodiments are described as a series of acts, but those skilled in the art should understand that the present embodiment is not limited by the described acts, because some steps may be performed in other sequences or simultaneously according to the present embodiment. Further, those skilled in the art should also appreciate that the embodiments described in this specification are preferred embodiments and that acts and modules referred to are not necessarily required for an embodiment of the specification.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The preferred embodiments of the present specification disclosed above are intended only to aid in the description of the specification. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the embodiments and the practical application, to thereby enable others skilled in the art to best understand and utilize the embodiments. The specification is limited only by the claims and their full scope and equivalents.

Claims (11)

1. A method of data processing, comprising:
receiving a data storage request aiming at data to be stored of a target object, wherein the data storage request carries attribute information of the data to be stored and an object tag of the target object;
determining a creation timestamp and a latest timestamp of the data to be stored based on attribute information of the data to be stored, determining a data partitioning strategy based on the creation timestamp and the latest timestamp, and creating a data partition according to the data partitioning strategy;
determining data to be stored of the target object in each data partition, and determining index data corresponding to the data to be stored;
and constructing a first index table based on the data partitions corresponding to the index data, the target objects corresponding to the data partitions and the object tags of the target objects, and performing data storage on the data to be stored of each data partition based on the first index table.
2. The data processing method according to claim 1, wherein the data storage of the data to be stored of each data partition based on the first index table comprises:
constructing a second index table based on a target object corresponding to the data partition in the first index table, an object tag of the target object and attribute information of data to be stored of the target object;
acquiring a storage offset bit of data to be stored of each data partition based on the first index table and the second index table;
and storing the data to be stored according to the storage offset bit.
3. The data processing method according to claim 1 or 2, further comprising:
receiving a data query request, wherein the data query request carries attribute information of data to be queried;
determining a data partition of the data to be queried in the first index table based on the attribute information of the data to be queried, and determining a target object corresponding to the data partition and an object tag of the target object in the first index table based on the data partition;
and performing data query based on the target object and the object label of the target object.
4. The data processing method of claim 3, wherein the querying data based on the target object and the object tag of the target object comprises:
judging whether attribute information matched with the attribute information of the data to be inquired exists in the second index table or not based on the target object and the object tag of the target object;
if yes, determining the storage offset of the data to be queried based on the target object, and querying the data to be queried based on the storage offset.
5. The data processing method according to claim 1, further comprising, after storing the data to be stored according to the storage offset bit:
merging the first index table according to each data partition to obtain a merged inverted index file, and storing the merged inverted index file; and
merging the second index table according to each data partition to obtain a merged forward index file, and storing the merged forward index file.
6. The data storage method of claim 1 or 5, the determining a creation timestamp and a latest timestamp of the data to be stored based on attribute information of the data to be stored, comprising:
and acquiring a creation timestamp and a latest timestamp of the data to be stored based on the attribute information of the data to be stored, and writing the creation timestamp and the latest timestamp into a log file for storage.
7. The data storage method of claim 6, wherein the data to be stored of each data partition based on the first index table is subjected to data storage, and the data storage method comprises the following steps:
creating a new data partition based on the latest timestamp of the data to be stored and the data partition policy;
and updating a first index table corresponding to the new data partition according to the new data partition, and storing the data to be stored of the new data partition based on the first index table.
8. The data storage method according to claim 2, further comprising, after performing data storage on the data to be stored of each data partition based on the first index table:
and deleting the data to be stored of the data partition based on a preset requirement.
9. A data storage device comprising:
a receiving module configured to receive a data storage request for data to be stored of a target object, wherein the data storage request carries attribute information of the data to be stored and an object tag of the target object;
the creating module is configured to determine a creating time stamp and a latest time stamp of the data to be stored based on attribute information of the data to be stored, determine a data partitioning strategy based on the creating time stamp and the latest time stamp, and create a data partition according to the data partitioning strategy;
the determining module is configured to determine to-be-stored data of the target object in each data partition and determine index data corresponding to the to-be-stored data;
the storage module is configured to construct a first index table based on the data partitions corresponding to the index data, the target objects corresponding to the data partitions and the object tags of the target objects, and perform data storage on the data to be stored of each data partition based on the first index table.
10. A computing device, comprising:
a memory and a processor;
the memory is configured to store computer-executable instructions, and the processor is configured to implement the steps of the data storage method of any one of claims 1 to 8 when executing the computer-executable instructions.
11. A computer readable storage medium storing computer instructions which, when executed by a processor, carry out the steps of the data storage method of any one of claims 1 to 8.
CN202110182739.9A 2021-02-10 2021-02-10 Data processing method and device Pending CN113297135A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110182739.9A CN113297135A (en) 2021-02-10 2021-02-10 Data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110182739.9A CN113297135A (en) 2021-02-10 2021-02-10 Data processing method and device

Publications (1)

Publication Number Publication Date
CN113297135A true CN113297135A (en) 2021-08-24

Family

ID=77318978

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110182739.9A Pending CN113297135A (en) 2021-02-10 2021-02-10 Data processing method and device

Country Status (1)

Country Link
CN (1) CN113297135A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113434557A (en) * 2021-08-26 2021-09-24 苏州浪潮智能科技有限公司 Method, device, equipment and storage medium for querying range of label data
CN113434557B (en) * 2021-08-26 2021-12-17 苏州浪潮智能科技有限公司 Method, device, equipment and storage medium for querying range of label data
CN114661722A (en) * 2022-03-23 2022-06-24 天津同阳科技发展有限公司 Data storage method, data indexing method and device
CN115129664A (en) * 2022-09-01 2022-09-30 湖南兴天电子科技股份有限公司 Data recording device, data file management method and apparatus
CN116304390A (en) * 2023-04-13 2023-06-23 北京基调网络股份有限公司 Time sequence data processing method and device, storage medium and electronic equipment
CN116304390B (en) * 2023-04-13 2024-02-13 北京基调网络股份有限公司 Time sequence data processing method and device, storage medium and electronic equipment
CN117555968A (en) * 2024-01-12 2024-02-13 浙江智臾科技有限公司 Data processing method, device, equipment and storage medium
CN117555968B (en) * 2024-01-12 2024-04-19 浙江智臾科技有限公司 Data processing method, device, equipment and storage medium
CN117596176A (en) * 2024-01-17 2024-02-23 苏州元脑智能科技有限公司 Network state measuring method, device, equipment and storage medium
CN117596176B (en) * 2024-01-17 2024-04-19 苏州元脑智能科技有限公司 Network state measuring method, device, equipment and storage medium
CN117908803A (en) * 2024-03-19 2024-04-19 深圳市双银科技有限公司 Data storage method and system based on big data

Similar Documents

Publication Publication Date Title
CN113297135A (en) Data processing method and device
CN109165215B (en) Method and device for constructing space-time index in cloud environment and electronic equipment
US8560531B2 (en) Search tool that utilizes scientific metadata matched against user-entered parameters
CN106682077B (en) Mass time sequence data storage implementation method based on Hadoop technology
CN102779138B (en) The hard disk access method of real time data
CN111339103B (en) Data exchange method and system based on full-quantity fragmentation and incremental log analysis
CN107451233B (en) Method for storing spatiotemporal trajectory data file with priority of time attribute in auxiliary storage device
CN109117440B (en) Metadata information acquisition method, system and computer readable storage medium
CN113297269A (en) Data query method and device
CN111125171A (en) Monitoring data access method, device, equipment and readable storage medium
CN110888880A (en) Proximity analysis method, device, equipment and medium based on spatial index
CN113656397A (en) Index construction and query method and device for time series data
CN111694860A (en) Safety detection time sequence data real-time abnormity discovery method and electronic device
Chao et al. Efficient trajectory contact query processing
CN110110234B (en) Big data real-time searching system and method
CN113111098B (en) Method and device for detecting query of time sequence data and time sequence database system
CN113761059A (en) Data processing method and device
US20160078071A1 (en) Large scale offline retrieval of machine operational information
CN111723092A (en) Data processing method and device
CN108647243B (en) Industrial big data storage method based on time series
CN110851450A (en) Accompanying vehicle instant discovery method based on incremental calculation
CN116186116A (en) Asset problem analysis method based on equal protection assessment
CN115858471A (en) Service data change recording method, device, computer equipment and medium
CN105468748B (en) Distributed storage position data method and system
CN114969083A (en) Real-time data analysis method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40059177; Country of ref document: HK)