CN110727406B - Data storage scheduling method and device - Google Patents

Info

Publication number
CN110727406B
Authority
CN
China
Prior art keywords
data
storage
preset
stored
scheduling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910965867.3A
Other languages
Chinese (zh)
Other versions
CN110727406A (en)
Inventor
董维
张磊
黄如
向洁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen ZNV Technology Co Ltd
Nanjing ZNV Software Co Ltd
Original Assignee
Shenzhen ZNV Technology Co Ltd
Nanjing ZNV Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen ZNV Technology Co Ltd and Nanjing ZNV Software Co Ltd
Priority to CN201910965867.3A
Publication of CN110727406A
Application granted
Publication of CN110727406B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629 Configuration or reconfiguration of storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0683 Plurality of storage devices
    • G06F3/0685 Hybrid storage combining heterogeneous device types, e.g. hierarchical storage, hybrid arrays

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A data storage scheduling method and device. The method comprises: first, monitoring attributes of stored data in a storage medium in real time, wherein the attributes of the stored data comprise storage time and data type; redetermining the data type of the stored data according to preset data classification logic together with the storage time and data type of the stored data; then determining a scheduling policy for the stored data according to preset storage scheduling logic and the redetermined data type, wherein the scheduling policy comprises the storage medium corresponding to the stored data; and finally performing storage scheduling on the stored data according to the scheduling policy. By monitoring changes in data attributes in real time and scheduling storage accordingly, the mobility of the data is improved: highly time-sensitive data is kept in a high-performance, high-speed storage medium, while data with low time sensitivity can be stored in a storage medium with lower access speed, which resolves the conflict among large storage capacity, high speed and low cost in system storage.

Description

Data storage scheduling method and device
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a data storage scheduling method and apparatus.
Background
Data storage refers to temporary files created while a data stream is processed, or to information that needs to be looked up during processing. Data is recorded in a certain format on a storage device of a computer. Architecturally, a computer's storage devices can be divided into internal memory and external memory. The internal memory (i.e. main memory) is directly connected to the CPU and sits at the top layer of data storage. Its access speed must match that of the CPU; it typically consists of semiconductor memory chips, and its capacity is usually limited because of its high cost. External memory is usually used to store large amounts of data, and it can be divided into several layers. The first layer, connected to the internal memory, is online storage, such as hard disk drives and disk arrays. The next layer is backup storage (or near-line storage), which consists of devices with slower access than hard disks, such as optical disc drives, optical disc libraries and tape libraries. The bottom layer is offline storage, consisting of tape drives, tape libraries and the like; its access speed is relatively slow, but its capacity is almost unlimited because the storage media can be kept offline and swapped. For ordinary personal computer users, storage media such as hard disks, floppy disks and optical discs are sufficient, but for commercial users and some network systems, tape drives, tape libraries and optical disc libraries are indispensable data storage and backup devices. Rapidly developing storage networks now also provide more convenient ways of storing data.
In a monitoring system there are numerous monitored equipment objects, each with many monitoring indicators, and all indicators must be collected and analyzed on a regular schedule. Processing this monitoring data involves data storage. When the data is stored on ordinary hard disks, read and write speed under highly concurrent access is limited by the disk. Replacing traditional mechanical hard disks with solid-state disks or memory improves read and write efficiency, but at massive data scale, high-performance storage media bring huge hardware costs. At present there is no scheme for transferring data between storage media, so data with low timeliness occupies high-performance storage media and drives up the cost of reading and storing data; system data storage scheduling therefore faces a conflict among large storage capacity, high speed and low cost. Taking a data center as an example, the larger the data center, the more types and numbers of objects need to be monitored in real time, and designing an efficient storage scheduling mechanism for massive data is an important technical direction and a difficult problem in the operation and maintenance field.
Disclosure of Invention
The invention mainly solves the technical problem of the conflict among large storage capacity, high speed and low cost in system storage.
According to a first aspect, in one embodiment, there is provided a data storage scheduling method, including:
monitoring, in real time, the attributes of stored data in a storage medium, wherein the attributes of the stored data comprise storage time and data type;
redetermining the data type of the stored data according to preset data classification logic and the attributes of the stored data;
determining a scheduling policy for the stored data according to preset storage scheduling logic and the redetermined data type, wherein the scheduling policy comprises the storage medium corresponding to the stored data;
and performing storage scheduling on the stored data according to the scheduling policy.
In one possible implementation, redetermining the data type of the stored data according to the preset data classification logic and the attributes of the stored data includes:
acquiring the data type corresponding to the stored data according to the storage device;
acquiring the preset data classification logic corresponding to the data type;
and redetermining the data type of the stored data according to the preset data classification logic and the storage time of the stored data.
In one possible implementation, acquiring the data type of the stored data includes:
calculating a data heat value according to the behavior time of the stored data;
when the data heat value is detected to be within a preset first time range, classifying the data as hot data and storing the hot data in the corresponding storage medium;
when the data heat value is detected to be within a preset second time range, classifying the data as warm data and storing the warm data in the corresponding storage medium;
when the data heat value is detected to be within a preset third time range, classifying the data as cold data and storing the cold data in the corresponding storage medium; wherein the time spans of the preset first time range, the preset second time range and the preset third time range increase in that order.
In one possible implementation, acquiring the preset data classification logic corresponding to the data type includes:
when the data type is hot data, the corresponding preset data classification logic comprises a preset fourth time range;
when the data type is warm data, the corresponding preset data classification logic comprises a preset fifth time range;
when the data type is cold data, the corresponding preset data classification logic comprises a preset sixth time range; wherein the first time range, the second time range and the third time range differ in time dimension.
In one possible implementation, redetermining the data type of the stored data according to the preset data classification logic and the attributes of the stored data includes:
when the storage time of the hot data is in the fourth time range, determining that the hot data is warm data or data to be deleted;
when the storage time of the warm data is in the fifth time range, determining that the warm data is cold data or data to be deleted;
and when the storage time of the cold data is in the sixth time range, determining that the cold data is data to be deleted or data to be compressed and archived.
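The redetermination rules above can be sketched as a small lookup. This is a minimal illustration, not the patented implementation: the threshold values, the state names `to_delete` and `to_archive`, the `keep` flag, and the reading of "in the time range" as "storage time has reached a threshold" are all assumptions made for the example.

```python
from datetime import timedelta

# Hypothetical thresholds standing in for the preset fourth, fifth and
# sixth time ranges; the source does not fix these values here.
RECLASSIFY_AFTER = {
    "hot": timedelta(days=1),
    "warm": timedelta(days=14),
    "cold": timedelta(days=5 * 365),
}

# For each current type: (type when the data is kept, type when it is dropped).
NEXT_TYPE = {
    "hot": ("warm", "to_delete"),
    "warm": ("cold", "to_delete"),
    "cold": ("to_archive", "to_delete"),
}


def redetermine_type(data_type: str, storage_time: timedelta, keep: bool = True) -> str:
    """Return the redetermined type, or the current type if nothing changes.

    `keep` is an assumed per-dataset switch that selects between the two
    outcomes the text allows for each type.
    """
    if storage_time < RECLASSIFY_AFTER[data_type]:
        return data_type
    kept, dropped = NEXT_TYPE[data_type]
    return kept if keep else dropped
```

With these assumed thresholds, hot data stored for two days would be redetermined as warm data, while hot data stored for two hours keeps its type.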
In one possible implementation, the attributes of the stored data include a data service type;
the preset fourth time range, the preset fifth time range and the preset sixth time range are divided according to the data service type.
In one possible implementation, outputting the scheduling policy of the stored data according to the preset storage scheduling logic and the type of the stored data includes:
when the hot data is redetermined as warm data, outputting a scheduling policy of moving the hot data to the warm data storage medium according to the preset storage scheduling logic;
when the hot data is redetermined as data to be deleted, outputting a scheduling policy of deleting the hot data according to the preset storage scheduling logic;
when the warm data is redetermined as cold data, outputting a scheduling policy of moving the warm data to the cold data storage medium according to the preset storage scheduling logic;
when the warm data is redetermined as data to be deleted, outputting a scheduling policy of deleting the warm data according to the preset storage scheduling logic;
when the cold data is redetermined as data to be deleted, outputting a scheduling policy of deleting the cold data according to the preset storage scheduling logic;
and when the cold data is redetermined as data to be compressed and archived, outputting a scheduling policy of compressing and archiving the cold data according to the preset storage scheduling logic.
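The six cases above amount to a table keyed by the old and the redetermined data type. The following sketch shows one way to express that table; the `SchedulingPolicy` class, the action names and the medium names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SchedulingPolicy:
    action: str                   # "move", "delete" or "archive" (assumed names)
    target_medium: Optional[str]  # destination for "move", otherwise None


# (current type, redetermined type) -> policy, one row per case in the text.
POLICY_TABLE = {
    ("hot", "warm"): SchedulingPolicy("move", "warm_medium"),
    ("hot", "to_delete"): SchedulingPolicy("delete", None),
    ("warm", "cold"): SchedulingPolicy("move", "cold_medium"),
    ("warm", "to_delete"): SchedulingPolicy("delete", None),
    ("cold", "to_delete"): SchedulingPolicy("delete", None),
    ("cold", "to_archive"): SchedulingPolicy("archive", None),
}


def scheduling_policy(old_type: str, new_type: str) -> Optional[SchedulingPolicy]:
    """Return the policy for a type transition, or None if none is defined."""
    return POLICY_TABLE.get((old_type, new_type))
```

Keeping the mapping in data rather than in branching code makes it easy to add or change transitions per deployment, which fits the "preset" nature of the scheduling logic.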
In one possible implementation, performing storage scheduling on the stored data according to the scheduling policy includes:
responding to a data storage request and receiving the transfer storage policy in the scheduling policy;
transferring and storing the data to be stored according to the transfer storage policy, so that hot data is stored in a top storage medium, warm data is stored in a middle storage medium and/or cold data is stored in a bottom storage medium; wherein the top storage medium, the middle storage medium and the bottom storage medium are different from each other, and their data access speeds decrease in that order.
In one possible implementation, performing storage scheduling on the stored data according to the scheduling policy further includes:
performing data verification before and/or after transferring and storing the data, so as to ensure the integrity of the data.
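As an illustration of verification before and after a transfer, the sketch below compares a SHA-256 digest taken at the source with one taken at the destination. The source does not specify the verification method; the checksum choice, the function names and the use of plain dictionaries as stand-ins for storage media are assumptions.

```python
import hashlib
from typing import Dict


def sha256_of(payload: bytes) -> str:
    """Hex digest used as the integrity fingerprint (assumed method)."""
    return hashlib.sha256(payload).hexdigest()


def transfer_with_verification(key: str, source: Dict[str, bytes], target: Dict[str, bytes]) -> None:
    """Move one item from source to target, verifying it before deleting the original."""
    payload = source[key]
    digest_before = sha256_of(payload)          # verification before transfer
    target[key] = payload
    digest_after = sha256_of(target[key])       # verification after transfer
    if digest_before != digest_after:
        del target[key]                         # roll back the incomplete copy
        raise IOError(f"integrity check failed for {key!r}")
    del source[key]                             # remove the original only on success
```

Deleting the original only after the digests match means a failed transfer leaves the data where it was.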
According to a second aspect, in one embodiment there is provided a data storage scheduling apparatus, comprising:
a monitoring module, configured to monitor the attributes of stored data in a storage medium in real time, wherein the attributes of the stored data comprise storage time and data type;
a type determining module, configured to redetermine the data type of the stored data according to preset data classification logic and the attributes of the stored data;
a result output module, configured to determine a scheduling policy for the stored data according to preset storage scheduling logic and the redetermined data type, wherein the scheduling policy comprises the storage medium corresponding to the stored data;
and a processing module, configured to perform storage scheduling on the stored data according to the scheduling policy.
According to the data storage scheduling method and device, the attributes of the stored data in the storage medium are first monitored in real time, wherein the attributes of the stored data comprise storage time and data type; the data type of the stored data is redetermined according to preset data classification logic together with the storage time and data type of the stored data; the scheduling policy of the stored data is then determined according to the preset storage scheduling logic and the redetermined data type, wherein the scheduling policy comprises the storage medium corresponding to the stored data; and finally storage scheduling is performed on the stored data according to the scheduling policy. By monitoring changes in data attributes in real time and scheduling storage according to the storage time and data type of the data, the mobility of the data is improved: highly time-sensitive data is kept in a high-performance, high-speed storage medium, while data with low time sensitivity can be stored in a storage medium with lower access speed, which resolves the conflict among large storage capacity, high speed and low cost in system storage.
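The four steps summarized above (monitor, redetermine, decide, execute) can be wired together as a single pass over the monitored records. This is only a structural sketch: the record layout and the three injected callables are hypothetical stand-ins for the type determining, result output and processing modules.

```python
from typing import Callable, Dict, Iterable

Record = Dict[str, object]


def run_scheduling_pass(
    records: Iterable[Record],
    reclassify: Callable[[Record], str],
    choose_policy: Callable[[str, str], object],
    execute: Callable[[Record, object], None],
) -> int:
    """Run one monitor/reclassify/decide/execute pass; return how many records moved."""
    scheduled = 0
    for record in records:                          # monitored attributes
        old_type = record["data_type"]
        new_type = reclassify(record)               # redetermine the data type
        if new_type == old_type:
            continue                                # nothing to schedule
        policy = choose_policy(old_type, new_type)  # determine the scheduling policy
        execute(record, policy)                     # perform the storage scheduling
        record["data_type"] = new_type
        scheduled += 1
    return scheduled
```

Passing the three stages in as functions keeps the classification and scheduling logic replaceable, which mirrors the "preset" logic described in the text.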
Drawings
FIG. 1 is a schematic flow chart of a data storage scheduling method according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a data type determining method according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of a method for determining a storage medium according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating another data storage scheduling method according to an embodiment of the present invention;
FIG. 5 is a flowchart of a method for obtaining preset data classification logic according to an embodiment of the present invention;
FIG. 6 is a flowchart of a method for determining a data type according to an embodiment of the present invention;
FIG. 7 is a flowchart of a method for determining a scheduling policy for storing data according to an embodiment of the present invention;
FIG. 8 is a schematic flow chart of a storage scheduling method according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a data storage scheduling device according to an embodiment of the present invention.
Detailed Description
The application will be described in further detail below with reference to the drawings by means of specific embodiments, in which like elements in different embodiments are given associated, similar reference numbers. In the following embodiments, numerous specific details are set forth in order to provide a better understanding of the present application. However, one skilled in the art will readily recognize that some of the features may be omitted, or replaced by other elements, materials or methods, in different situations. In some instances, certain operations related to the present application are not shown or described in the specification, in order to avoid obscuring the core of the application with excessive description; a detailed description of these operations is unnecessary for persons skilled in the art, who can fully understand them from the description in the specification and from general technical knowledge in the field.
Furthermore, the described features, operations, or characteristics of the description may be combined in any suitable manner in various embodiments. Also, various steps or acts in the method descriptions may be interchanged or modified in a manner apparent to those of ordinary skill in the art. Thus, the various orders in the description and drawings are for clarity of description of only certain embodiments, and are not meant to be required orders unless otherwise indicated.
The ordinal terms applied to components herein, e.g. "first" and "second", are used merely to distinguish the described objects and do not carry any sequential or technical meaning. The term "coupled" as used herein includes both direct and indirect coupling, unless otherwise indicated.
In the embodiment of the invention, the inventor provides a data storage scheduling scheme that addresses the conflict among large storage capacity, high speed and low cost in existing system storage. First, the data is monitored and managed in real time through its attributes, and the data is scheduled according to the monitoring result. This improves the mobility of the data, prevents outdated data from occupying high-performance storage media, and safeguards both data access speed and storage cost. The data is then stored in different storage media according to its data type, so as to balance large storage capacity, high speed and low cost in system storage.
Example 1
Referring to fig. 1, a data storage scheduling method provided in an embodiment of the invention includes steps S10 to S40, and is specifically described below.
Step S10: monitoring the attribute of the stored data in the storage medium in real time; wherein, the attribute of the stored data comprises storage time and data type.
In the embodiment of the present invention, in step S10, the stored data in the storage medium is monitored to obtain its state, including the data type of the stored data in the current storage medium, the length of time the data has been stored in the storage medium, and the service type of the stored data. The subsequent steps are performed according to the obtained state.
It should be noted that the storage media include a top storage medium, a middle storage medium and a bottom storage medium, each with corresponding storage software middleware. The top storage medium includes a top-layer storage device and top-layer storage software middleware. The top-layer storage device is a high-speed data product and may include memory and the like; the top-layer storage software middleware includes Redis, Elasticsearch, InfluxDB and the like. The data stored in the top storage medium is large in volume and highly time-sensitive. The middle storage medium includes a middle-layer storage device and middle-layer storage software middleware. The middle-layer storage device may be memory plus SSD, a high-speed hard disk or another high-speed storage device; the middle-layer storage software middleware includes time-series databases such as InfluxDB, Elasticsearch and the like. The bottom storage medium includes a bottom-layer storage device and bottom-layer storage software middleware. The bottom-layer storage device includes an ordinary hard disk, or a traditional mechanical hard disk plus HDFS, and the like; the bottom-layer storage software middleware includes middleware based on Hadoop big data storage technology. Data access speed decreases from the top storage medium to the middle storage medium to the bottom storage medium. Data stored in the bottom storage medium, such as historical monitoring data, has low real-time requirements and is mainly used for data analysis and statistical queries over historical data. However, because the bottom-layer storage device is managed in cooperation with the bottom-layer storage software middleware, the bottom storage medium also offers multiple backups, high reliability, distributed storage and large storage throughput.
In an embodiment of the application, hot data is stored in the top storage medium, warm data is stored in the middle storage medium and/or in the top storage medium, and cold data is stored in the bottom storage medium. Stored data is defined as hot data when it is first placed in the top storage medium. However, under the inventive concept of the present application, data stored in the top storage medium does not remain hot data forever: its data type is redetermined according to the preset data classification logic, and it may be redefined as warm data, cold data or data to be deleted. The data types of warm data and cold data may be redetermined in the same way, which is not particularly limited in the present application.
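The three storage layers and their default assignment to data types can be captured in a small configuration structure. The device and middleware names below come from the description; the class, field and function names are hypothetical, and the mapping shows only the default placement (the text also allows warm data in the top layer).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class StorageTier:
    name: str
    devices: Tuple[str, ...]
    middleware: Tuple[str, ...]
    speed_rank: int  # 0 is the fastest layer


TIERS = (
    StorageTier("top", ("memory",), ("Redis", "Elasticsearch", "InfluxDB"), 0),
    StorageTier("middle", ("memory+SSD", "high-speed hard disk"), ("InfluxDB", "Elasticsearch"), 1),
    StorageTier("bottom", ("ordinary hard disk", "mechanical hard disk+HDFS"), ("Hadoop",), 2),
)

# Default placement only; a deployment may also keep some warm data in "top".
DEFAULT_TIER_FOR_TYPE = {"hot": "top", "warm": "middle", "cold": "bottom"}


def tier_for(data_type: str) -> StorageTier:
    """Look up the default storage tier for a data type."""
    name = DEFAULT_TIER_FOR_TYPE[data_type]
    return next(tier for tier in TIERS if tier.name == name)
```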
In the embodiment of the invention, the hot data in the top storage medium has at least two sources: data taken out of the relational database and put into the in-memory database, and data reported in real time. For data taken out of the relational database, the latest data can be fetched from the relational database periodically to refresh the data in the in-memory database. For data reported in real time, depending on the service scenario, only the most recently reported monitoring data is kept, and data older than one day is deleted directly. The warm data in the middle storage medium includes at least the data reported in real time, and warm data can also be stored in the top storage medium.
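The one-day retention rule for real-time data in the hot store can be illustrated with a simple pruning function. The in-memory dictionary here is a stand-in for the in-memory database, and the function name and record layout are assumptions made for the example.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def prune_hot_store(
    hot_store: Dict[str, Tuple[datetime, float]],
    now: datetime,
    max_age: timedelta = timedelta(days=1),
) -> List[str]:
    """Delete entries reported more than max_age ago; return the deleted keys.

    hot_store maps a measurement key to (reported_at, value).
    """
    expired = [key for key, (reported_at, _) in hot_store.items() if now - reported_at > max_age]
    for key in expired:
        del hot_store[key]
    return expired
```

Collecting the expired keys first avoids modifying the dictionary while iterating over it.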
In the embodiment of the invention, the data stored in the bottom storage medium is mainly used for data analysis, statistics and similar work. The data volume grows continuously over time, so more storage media are needed, along with the corresponding manpower and material resources for maintenance. Service requirements and storage cost therefore need to be considered together, and the following operations are executed periodically:
1. Compress and archive very old data at regular intervals. For example, data older than 5 years is periodically taken out, compressed and archived, and the original data is then deleted, which saves storage space without losing data.
2. Analyze the service data, keep only the fields that are currently used or may be used, and delete fields that have little meaning or are rarely used, thereby reducing the data storage space.
In the above embodiments, considering the limited data volume of small and medium-sized projects, warm data and cold data may share the same storage medium; however, to improve system response efficiency, the data needs to be split across databases and tables according to data time and the size of the database or table. The present invention is not particularly limited thereto.
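The two periodic maintenance operations described above for the bottom layer, compressing and archiving very old data and dropping rarely used fields, could look roughly like this. The 5-year cutoff is the example from the text; the gzip plus JSON archive format, the row layout and the function name are assumptions.

```python
import gzip
import json
from datetime import datetime, timedelta


def archive_old_rows(rows, now, max_age=timedelta(days=5 * 365), keep_fields=None):
    """Split rows into (recent rows, gzip-compressed JSON archive of old rows).

    Each row is a dict with a datetime under "stored_at". If keep_fields is
    given, only those fields (plus "stored_at") survive in the archive.
    """
    old, recent = [], []
    for row in rows:
        (old if now - row["stored_at"] > max_age else recent).append(row)
    payload = []
    for row in old:
        kept = {k: v for k, v in row.items() if keep_fields is None or k in keep_fields}
        kept["stored_at"] = row["stored_at"].isoformat()  # make the timestamp JSON-safe
        payload.append(kept)
    archive = gzip.compress(json.dumps(payload).encode("utf-8"))
    return recent, archive
```

The caller would write `archive` to long-term storage and replace the table contents with `recent`, so the original rows are deleted only after the archive exists.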
Step S20: redetermining the data type of the stored data according to preset data classification logic and the attributes of the stored data.
In the embodiment of the present invention, referring to fig. 2, step S20 includes step S21, step S22 and step S23, which are specifically described below.
Step S21: acquiring the data type corresponding to the stored data according to the storage device.
In the embodiment of the present invention, referring to fig. 3, the storage medium in which each piece of stored data is stored is determined according to steps S201 to S204, which are described in detail below.
Step S201: calculating a data heat value according to the behavior time of the stored data.
The behavior time of the stored data is measured from the time the stored data was operated on. For example, when content is added to, deleted from or modified in the stored data, or the stored data is accessed, the time elapsed from that operation to the current moment is the corresponding behavior time. Illustratively, for a data center, it is the time elapsed from the collection of the reported monitoring or alarm data to the current moment.
Step S202: when the data heat value is detected to be within a preset first time range, classifying the data as hot data and storing the hot data in the corresponding storage medium.
Step S203: when the data heat value is detected to be within a preset second time range, classifying the data as warm data and storing the warm data in the corresponding storage medium.
Step S204: when the data heat value is detected to be within a preset third time range, classifying the data as cold data and storing the cold data in the corresponding storage medium, wherein the time spans of the preset first time range, the preset second time range and the preset third time range increase in that order.
Referring to fig. 4, real-time data collected at the current moment can be classified into a hot data cluster or a warm data cluster according to its behavior time. When the heat value calculated from the behavior time of the real-time data is within the first time range, the real-time data is inserted into the hot data cluster, which corresponds to the top storage medium; that is, the data is classified as hot data and stored in the corresponding storage medium. When the calculated heat value is within the second time range, the real-time data is inserted into the warm data cluster, which corresponds to the middle storage medium; that is, the data is classified as warm data and stored in the corresponding storage medium. When the calculated heat value is within the third time range, the real-time data is inserted into the cold data cluster, which corresponds to the bottom storage medium; that is, the data is classified as cold data and stored in the corresponding storage medium. The invention is not particularly limited in this respect.
In the embodiment of the present invention, the attributes of the stored data further include a data service type, and the preset first time range, the preset second time range and the preset third time range may be divided according to the data service type. Taking device performance data reported in real time as an example, data with a behavior time within 24 hours is determined to be hot data, data with a behavior time of more than 1 day and up to 14 days is determined to be warm data, and data with a behavior time of more than 14 days is determined to be cold data. The time boundaries can be configured flexibly for different services. When the currently collected real-time data is AI prediction data, its preset first, second and third time ranges may differ from those of the monitoring data.
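Using the example boundaries given above for device performance data (24 hours and 14 days), the initial classification by behavior time can be sketched as follows. The service-type key, the table layout and the function name are hypothetical; other service types would add their own rows with their own boundaries.

```python
from datetime import datetime, timedelta

# (upper bound for hot, upper bound for warm) per data service type.
# Only the device performance example from the text is filled in.
HEAT_RANGES = {
    "device_performance": (timedelta(hours=24), timedelta(days=14)),
}


def classify_by_behavior_time(last_operated_at: datetime, now: datetime,
                              service_type: str = "device_performance") -> str:
    """Classify data as hot, warm or cold from the time since its last operation."""
    hot_limit, warm_limit = HEAT_RANGES[service_type]
    behavior_time = now - last_operated_at
    if behavior_time <= hot_limit:
        return "hot"
    if behavior_time <= warm_limit:
        return "warm"
    return "cold"
```

Because the three ranges are contiguous and increasing, two boundaries per service type are enough to describe them.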
It should be noted that the execution order of steps S202 to S204 is not limited: when the current data falls within the first time range, it is classified as hot data and stored in the corresponding storage medium; when it falls within the second time range, it is classified as warm data and stored in the corresponding storage medium; and when it falls within the third time range, it is classified as cold data and stored in the corresponding storage medium.
Step S22: acquiring the preset data classification logic corresponding to the data type.
In the embodiment of the present invention, referring to fig. 5, step S22 includes steps S221 to S223, which are specifically described below.
Step S221: when the data type is hot data, the corresponding preset data classification logic comprises a preset fourth time range.
Step S222: when the data type is warm data, the corresponding preset data classification logic comprises a preset fifth time range.
Step S223: when the data type is cold data, the corresponding preset data classification logic comprises a preset sixth time range; wherein the first time range, the second time range, and the third time range are different in time dimension.
In one possible implementation, the attributes of the stored data further include a data service type, and the preset fourth time range, the preset fifth time range, and the preset sixth time range are divided according to the data service type. That is, the preset fourth, fifth, and/or sixth time ranges may differ for different data service types. For example, when the data is warm data, the fifth time range corresponding to monitoring data may be 14 days, while the fifth time range corresponding to AI prediction data may be 7 days; the present invention does not specifically limit this.
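The lookup in step S22 can be pictured as a table keyed by service type and data type. The sketch below is an assumption-laden illustration: only the 14-day and 7-day values for warm data come from the example above, and `CLASSIFICATION_LOGIC`, `get_classification_logic`, and every other value are hypothetical.

```python
from datetime import timedelta

# Hypothetical classification table keyed by (service type, data type).
# Each value is the storage time after which the data is re-evaluated.
# Only the two "warm" entries (14 days and 7 days) follow the example in the
# text; all other values are assumptions for illustration.
CLASSIFICATION_LOGIC = {
    ("monitoring", "hot"): timedelta(days=1),
    ("monitoring", "warm"): timedelta(days=14),
    ("monitoring", "cold"): timedelta(days=3 * 365),
    ("ai_prediction", "hot"): timedelta(days=1),
    ("ai_prediction", "warm"): timedelta(days=7),
    ("ai_prediction", "cold"): timedelta(days=365),
}


def get_classification_logic(service_type, data_type):
    """Return the re-evaluation limit for this service type and data type."""
    return CLASSIFICATION_LOGIC[(service_type, data_type)]
```

With such a table, data of the same data type but different service types naturally receives different limits.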
In this embodiment of the present invention, the attributes of the stored data include a data service type. The stored data has dimensions such as a service attribute and a time attribute, and, depending on the actual service, different service data may have corresponding data in the three time dimensions of cold, warm, and hot. For the monitoring service, the reported telemetry data (in power and environment monitoring, the measurement point data of a monitored object reported in real time) is spread over time and therefore exists as hot data, warm data, and cold data. Data reported in real time needs to be stored in the storage medium corresponding to warm data, and part of the warm data may also be placed in the storage medium corresponding to hot data for convenient processing. After classification according to the preset data classification logic, warm data may be dumped to cold data for long-term storage; after a further classification according to the preset data classification logic, the stored data in the storage medium corresponding to cold data may be compressed and archived. For the configuration management module of the monitoring system, only part of the data may be used frequently, and that part may be placed in the storage medium corresponding to hot data. For the AI prediction module, the results of model calculation may be written to the storage media corresponding to hot data and warm data at the same time; after a period of time, the hot data is deleted and the warm data is dumped to the storage medium corresponding to cold data.
Step S23: redetermine the data type of the stored data according to the preset data classification logic and the storage time of the stored data.
In this embodiment of the present invention, referring to fig. 6, step S23 includes steps S231 to S233, which are described below.
Step S231: when the storage time of the hot data is within the fourth time range, determine that the hot data is warm data or data to be deleted.
Step S232: when the storage time of the warm data is within the fifth time range, determine that the warm data is cold data or data to be deleted.
Step S233: when the storage time of the cold data is within the sixth time range, determine that the cold data is data to be deleted or data to be compressed and archived.
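Steps S231 to S233 amount to a small set of allowed transitions. A minimal sketch follows, assuming that "within the time range" means the storage time has reached a lower limit; that reading, the `outcome` parameter, and the names `NEXT_TYPES` and `redetermine_type` are assumptions made for illustration.

```python
from datetime import timedelta

# Allowed outcomes for each current data type, following steps S231 to S233.
NEXT_TYPES = {
    "hot": ("warm", "to_delete"),
    "warm": ("cold", "to_delete"),
    "cold": ("to_delete", "to_archive"),
}


def redetermine_type(data_type, storage_time, limit, outcome):
    """Return `outcome` once storage_time reaches limit, else the old type.

    `outcome` selects which allowed transition applies; how that choice is
    made (for example per service type) is an assumption of this sketch.
    """
    if outcome not in NEXT_TYPES[data_type]:
        raise ValueError(f"{data_type} data cannot become {outcome}")
    return outcome if storage_time >= limit else data_type
```

Rejecting transitions that are not listed (such as hot directly to cold) keeps the sketch aligned with the three steps as written.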
In this embodiment of the present invention, scheduling is performed according to how long the data has been stored in each storage medium, and may also be performed according to the storage capacity of the storage medium. Referring to fig. 4, for the stored data in the warm data cluster, a timed task may be set so that stored data whose storage time is within the fifth time range is periodically determined to be cold data. Alternatively, the timed task may periodically reclaim capacity in the warm data storage medium: when the used storage capacity of the warm data storage medium is greater than a preset threshold, stored data in the warm data storage medium may be determined to be cold data, and, taking the storage time into account, the warm data with the longest storage time may be determined to be cold data. The type that warm data takes on when its storage time is within the fifth time range may also depend on the data service type. For example, monitoring data whose data type is warm data is determined to be cold data when its storage time is within the fifth time range, whereas AI prediction data whose data type is warm data is determined to be data to be deleted when its storage time is within the fifth time range.
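The capacity-based variant can be sketched as below. This is a hedged illustration rather than the patented procedure: the function name `select_for_demotion`, the tuple layout of `records`, and the default threshold of 0.8 are assumptions chosen for the example.

```python
def select_for_demotion(records, capacity, threshold=0.8):
    """Pick the longest-stored warm records until usage drops to the threshold.

    `records` is a list of (record_id, size, storage_days) tuples and
    `capacity` is the total size of the warm storage medium.
    """
    used = sum(size for _, size, _ in records)
    demoted = []
    # Longest-stored first: older warm data is demoted before newer data.
    for record_id, size, _ in sorted(records, key=lambda r: r[2], reverse=True):
        if used <= capacity * threshold:
            break
        demoted.append(record_id)
        used -= size
    return demoted
```

A timed task could call this function periodically and reclassify the returned records as cold data.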
Step S30: determine a scheduling policy for the stored data according to preset storage scheduling logic and the redetermined data type; the scheduling policy includes the storage medium corresponding to the stored data.
In this embodiment of the present invention, referring to fig. 7, step S30 includes steps S31 to S36, which are described below.
Step S31: when hot data is redetermined to be warm data, output, according to the preset storage scheduling logic, a scheduling policy that moves the hot data to the warm data storage medium.
Step S32: when hot data is redetermined to be data to be deleted, output, according to the preset storage scheduling logic, a scheduling policy that deletes the hot data.
Step S33: when warm data is redetermined to be cold data, output, according to the preset storage scheduling logic, a scheduling policy that moves the warm data to the cold data storage medium.
Step S34: when warm data is redetermined to be data to be deleted, output, according to the preset storage scheduling logic, a scheduling policy that deletes the warm data.
Step S35: when cold data is redetermined to be data to be deleted, output, according to the preset storage scheduling logic, a scheduling policy that deletes the cold data.
Step S36: when cold data is redetermined to be data to be compressed and archived, output, according to the preset storage scheduling logic, a scheduling policy that compresses and archives the cold data.
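Steps S31 to S36 can be read as a lookup from a type transition to a policy. The sketch below assumes that reading; the policy labels, the `"keep"` fallback, and the names `SCHEDULING_LOGIC` and `scheduling_policy` are hypothetical.

```python
# Mapping from (current type, redetermined type) to a scheduling policy,
# with one entry per step S31 to S36. The policy strings are hypothetical.
SCHEDULING_LOGIC = {
    ("hot", "warm"): "move_to_warm_medium",        # S31
    ("hot", "to_delete"): "delete",                # S32
    ("warm", "cold"): "move_to_cold_medium",       # S33
    ("warm", "to_delete"): "delete",               # S34
    ("cold", "to_delete"): "delete",               # S35
    ("cold", "to_archive"): "compress_and_archive",  # S36
}


def scheduling_policy(current_type, new_type):
    """Return the policy for a transition, or "keep" if the type is unchanged."""
    if current_type == new_type:
        return "keep"
    return SCHEDULING_LOGIC[(current_type, new_type)]
```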
Step S40: perform storage scheduling on the stored data according to the scheduling policy.
Referring to fig. 8, in one possible implementation, step S40 includes steps S41 to S42, which are described below.
Step S41: in response to a data storage request, receive the transfer storage policy in the scheduling policy.
Step S42: transfer and store the data to be stored according to the transfer storage policy, so that hot data is stored in a top-layer storage medium, warm data is stored in a middle-layer storage medium, and/or cold data is stored in a bottom-layer storage medium; the top-layer, middle-layer, and bottom-layer storage media differ from one another, and their data access speeds decrease in that order.
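The transfer in step S42 can be illustrated with three in-memory dictionaries standing in for the storage tiers. This is a toy sketch: real storage media, the tier names used as keys, and the function `transfer` are assumptions for illustration only.

```python
# Tier that should hold each data type, from fastest to slowest access.
TIER_FOR_TYPE = {"hot": "top", "warm": "middle", "cold": "bottom"}


def transfer(tiers, key, new_type):
    """Move `key` to the tier matching `new_type` and return that tier's name.

    `tiers` maps a tier name to a dict of stored items; the dicts stand in
    for real storage media.
    """
    target = TIER_FOR_TYPE[new_type]
    for name, store in tiers.items():
        if key in store and name != target:
            # Remove from the old tier and place in the target tier.
            tiers[target][key] = store.pop(key)
            break
    return target
```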
In one possible implementation, the method further includes performing data verification before and/or after the data is transferred and stored, so as to ensure the integrity of the data. To ensure the reliability of the data during a migration operation, the service and the data itself need to be tested, automatically or manually, before and after the operation (for example, to confirm that the data is complete and reliable), supplemented by file integrity checks such as verifying the MD5 checksum of a file.
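The MD5 check mentioned above could look roughly like this for file-based data. It is a sketch under the assumption that the data being migrated is a single file; `md5_of` and `transfer_with_check` are hypothetical helper names, while `hashlib.md5` and `shutil.copyfile` are standard library calls.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of(path):
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_with_check(source, target):
    """Copy source to target, verify the checksum, then remove the source."""
    source, target = Path(source), Path(target)
    before = md5_of(source)          # checksum before the transfer
    shutil.copyfile(source, target)
    if md5_of(target) != before:     # checksum after the transfer
        target.unlink()
        raise OSError(f"checksum mismatch while moving {source}")
    source.unlink()                  # delete the original only after verification
```

Deleting the source only after the checksums match means a failed transfer leaves the original data in place.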
Referring to FIG. 4, for the hot data cluster, the data operations include inserting the latest data determined to be hot data and deleting "outdated" old data. For the warm data cluster, a single timed task can transfer data that has exceeded its storage time and/or data that exceeds the capacity threshold: data found to have been stored for more than 14 days is transferred to the cold data cluster, and when the used capacity of the middle-layer storage medium exceeds 80%, data in the middle-layer storage medium is checked and transferred to the cold data cluster. After data in the middle-layer storage medium has been transferred to cold data, the corresponding warm data can be deleted; likewise, after hot data in the top-layer storage medium has been transferred, the corresponding hot data is deleted. Alternatively, warm data stored for more than 14 days is deleted directly. When the storage time of cold data in the cold data cluster exceeds 3 years, that cold data is compressed and archived.
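The final compress-and-archive step for cold data might be sketched as follows. The 3-year limit follows the example above (approximated as 3 × 365 days); the use of gzip, the in-memory dictionaries, and the names `ARCHIVE_AFTER` and `archive_expired` are assumptions for illustration.

```python
import gzip
from datetime import timedelta

# "3 years" from the example, approximated as 3 * 365 days.
ARCHIVE_AFTER = timedelta(days=3 * 365)


def archive_expired(cold_store, archive, ages):
    """Compress every cold record whose storage time exceeds ARCHIVE_AFTER.

    `cold_store` and `archive` map keys to bytes; `ages` maps keys to the
    storage time of each record as a timedelta.
    """
    expired = [k for k, age in ages.items() if age > ARCHIVE_AFTER and k in cold_store]
    for key in expired:
        # Remove from the cold store and keep only the compressed copy.
        archive[key] = gzip.compress(cold_store.pop(key))
    return expired
```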
This embodiment has the following characteristics:
First, the attributes of the stored data in the storage medium are monitored in real time, where the attributes include the storage time and the data type. The data type of the stored data is then redetermined according to the preset data classification logic together with the storage time and data type of the stored data. Next, a scheduling policy for the stored data, which includes the storage medium corresponding to the stored data, is determined according to the preset storage scheduling logic and the redetermined data type. Finally, storage scheduling is performed on the stored data according to the scheduling policy. By monitoring attribute changes of the data in real time and scheduling its storage according to its storage time and data type, the mobility of the data is improved: data with high real-time requirements is guaranteed to be stored in a high-performance, high-speed storage medium, while data with low real-time requirements can be stored in a storage medium with a lower access speed. This resolves the conflict among large capacity, high speed, and low cost in system storage.
Example two
Referring to fig. 9, a data storage scheduling apparatus includes:
A monitoring module 21, configured to monitor the attributes of the stored data in the storage medium in real time, where the attributes of the stored data include a storage time and a data type.
A type determining module 22, configured to redetermine the data type of the stored data according to preset data classification logic and the attributes of the stored data.
A result output module 23, configured to determine a scheduling policy for the stored data according to preset storage scheduling logic and the redetermined data type, where the scheduling policy includes the storage medium corresponding to the stored data.
A processing module 24, configured to perform storage scheduling on the stored data according to the scheduling policy.
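The four modules can be pictured as four pluggable callables wired into one pass. This is a structural sketch only: the `Record` and `StorageScheduler` classes, their field names, and the `run_once` method are hypothetical and are not described in the patent.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Record:
    key: str
    data_type: str
    storage_days: int


@dataclass
class StorageScheduler:
    monitor: Callable[[], Iterable[Record]]      # monitoring module 21
    determine_type: Callable[[Record], str]      # type determining module 22
    decide_policy: Callable[[str, str], str]     # result output module 23
    apply_policy: Callable[[Record, str], None]  # processing module 24

    def run_once(self):
        """Run one monitoring pass and return the (key, policy) pairs applied."""
        applied = []
        for record in self.monitor():
            new_type = self.determine_type(record)
            policy = self.decide_policy(record.data_type, new_type)
            self.apply_policy(record, policy)
            applied.append((record.key, policy))
        return applied
```

Passing the modules in as callables keeps the classification logic and the scheduling logic independently configurable, which matches the "preset" wording used for both.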
The embodiment of the invention has the following characteristics:
First, the attributes of the stored data in the storage medium are monitored in real time, where the attributes include the storage time and the data type. The data type of the stored data is then redetermined according to the preset data classification logic together with the storage time and data type of the stored data. Next, a scheduling policy for the stored data, which includes the storage medium corresponding to the stored data, is determined according to the preset storage scheduling logic and the redetermined data type. Finally, storage scheduling is performed on the stored data according to the scheduling policy. By monitoring attribute changes of the data in real time and scheduling its storage according to its storage time and data type, the mobility of the data is improved: data with high real-time requirements is guaranteed to be stored in a high-performance, high-speed storage medium, while data with low real-time requirements can be stored in a storage medium with a lower access speed. This resolves the conflict among large capacity, high speed, and low cost in system storage.
Those skilled in the art will appreciate that all or part of the functions of the various methods in the above embodiments may be implemented by hardware, or may be implemented by a computer program. When all or part of the functions in the above embodiments are implemented by means of a computer program, the program may be stored in a computer readable storage device, and the storage device may include: read-only memory, random access memory, magnetic disk, optical disk, hard disk, etc., and the program is executed by a computer to realize the above-mentioned functions. For example, the program is stored in the memory of the device, and when the program in the memory is executed by the processor, all or part of the functions described above can be realized. In addition, when all or part of the functions in the above embodiments are implemented by means of a computer program, the program may be stored in a storage device such as a server, another computer, a magnetic disk, an optical disk, a flash disk, or a removable hard disk, and the functions in all or part of the above embodiments may be implemented by downloading or copying the program into a memory of a local device or updating a version of a system of the local device, and executing the program in the memory by a processor.
The foregoing description of the invention has been presented for purposes of illustration and description, and is not intended to be limiting. Several simple deductions, modifications or substitutions may also be made by a person skilled in the art to which the invention pertains, based on the idea of the invention.
The embodiments of the present invention have been described above with reference to the drawings, but the present invention is not limited to the specific embodiments described, which are illustrative rather than restrictive. Guided by the present invention, those of ordinary skill in the art may make many further modifications without departing from the spirit of the present invention and the scope protected by the claims. Any equivalent structure or equivalent process transformation made using the description and drawings of the present invention, or any direct or indirect application in other related technical fields, likewise falls within the scope of protection of the present invention.

Claims (9)

1. A data storage scheduling method, comprising:
Monitoring the attribute of the stored data in the storage medium in real time; the attribute of the storage data comprises storage time, data type and data service type, wherein the data service type at least comprises monitoring data and AI prediction data; the data type is determined according to a data heat value of the stored data, the data heat value is determined according to a behavior time of the stored data, the data type comprises hot data, warm data and cold data, the data heat value of the hot data is in a preset first time range, the data heat value of the warm data is in a preset second time range, the data heat value of the cold data is in a preset third time range, and the stored data of different data types are stored in different storage devices; dividing preset data classification logic according to the data service type, wherein the preset data classification logic comprises a preset fourth time range, a preset fifth time range and a preset sixth time range, the preset fourth time range corresponds to the storage time of the hot data, the preset fifth time range corresponds to the storage time of the warm data, and the preset sixth time range corresponds to the storage time of the cold data; for the stored data of the same data type but different data service types, the corresponding preset data classification logic is different;
acquiring the corresponding preset data classification logic according to the data type and the data service type of the stored data;
redetermining the data type of the stored data according to the preset data classification logic and the storage time of the stored data;
determining a scheduling strategy of the stored data according to a preset storage scheduling logic and the redetermined data type; the scheduling strategy comprises a storage medium corresponding to storage data;
and carrying out storage scheduling on the storage data according to the scheduling policy.
2. The method of claim 1, wherein the time dimensions of the preset first time range, the preset second time range, and the preset third time range are progressively increased.
3. The method of claim 2, wherein the preset fourth time range, the preset fifth time range, and the preset sixth time range are different for the stored data of different data types.
4. The method of claim 3, wherein the redetermining the data type of the stored data according to the preset data classification logic and the storage time of the stored data comprises:
when the storage time of the hot data is within the preset fourth time range, determining that the hot data is warm data or data to be deleted;
when the storage time of the warm data is within the preset fifth time range, determining that the warm data is cold data or data to be deleted;
and when the storage time of the cold data is within the preset sixth time range, determining that the cold data is data to be deleted or data to be compressed and archived.
5. The method of claim 3 or 4, further comprising: dividing the preset first time range, the preset second time range and the preset third time range according to the data service type.
6. The method of claim 4, wherein outputting the scheduling policy of the stored data according to the preset storage scheduling logic and the type of the stored data comprises:
when the hot data is redetermined to be warm data, outputting, according to the preset storage scheduling logic, a scheduling policy of moving the hot data to a warm data storage medium;
when the hot data is redetermined to be data to be deleted, outputting, according to the preset storage scheduling logic, a scheduling policy of deleting the hot data;
when the warm data is redetermined to be cold data, outputting, according to the preset storage scheduling logic, a scheduling policy of moving the warm data to a cold data storage medium;
when the warm data is redetermined to be data to be deleted, outputting, according to the preset storage scheduling logic, a scheduling policy of deleting the warm data;
when the cold data is redetermined to be data to be deleted, outputting, according to the preset storage scheduling logic, a scheduling policy of deleting the cold data;
and when the cold data is redetermined to be data to be compressed and archived, outputting, according to the preset storage scheduling logic, a scheduling policy of compressing and archiving the cold data.
7. The method of claim 6, wherein the performing storage scheduling on the stored data according to the scheduling policy comprises:
in response to a data storage request, receiving a transfer storage policy in the scheduling policy;
transferring and storing the data to be stored according to the transfer storage policy, so that hot data is stored in a top-layer storage medium, warm data is stored in a middle-layer storage medium, and/or cold data is stored in a bottom-layer storage medium; wherein the top-layer storage medium, the middle-layer storage medium, and the bottom-layer storage medium are different from each other, and their data access speeds decrease in that order.
8. The method of claim 7, wherein the performing storage scheduling on the stored data according to the scheduling policy further comprises:
performing data verification before and/or after transferring and storing the data, so as to ensure the integrity of the data.
9. A data storage scheduling apparatus, comprising:
The monitoring module is used for monitoring the attribute of the stored data in the storage medium in real time; the attribute of the storage data comprises storage time, data type and data service type, wherein the data service type at least comprises monitoring data and AI prediction data; the data type is determined according to a data heat value of the stored data, the data heat value is determined according to a behavior time of the stored data, the data type comprises hot data, warm data and cold data, the data heat value of the hot data is in a preset first time range, the data heat value of the warm data is in a preset second time range, the data heat value of the cold data is in a preset third time range, and the stored data of different data types are stored in different storage devices; dividing preset data classification logic according to the data service type, wherein the preset data classification logic comprises a preset fourth time range, a preset fifth time range and a preset sixth time range, the preset fourth time range corresponds to the storage time of the hot data, the preset fifth time range corresponds to the storage time of the warm data, and the preset sixth time range corresponds to the storage time of the cold data; for the stored data of the same data type but different data service types, the corresponding preset data classification logic is different;
the type determining module is used for acquiring corresponding preset data classification logic according to the data type and the data service type of the stored data, and redetermining the data type of the stored data according to the preset data classification logic and the storage time of the stored data;
The result output module is used for determining a scheduling strategy of the stored data according to a preset storage scheduling logic and the redetermined data type; the scheduling strategy comprises a storage medium corresponding to storage data;
the processing module is used for performing storage scheduling on the stored data according to the scheduling policy.
CN201910965867.3A 2019-10-10 2019-10-10 Data storage scheduling method and device Active CN110727406B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910965867.3A CN110727406B (en) 2019-10-10 2019-10-10 Data storage scheduling method and device


Publications (2)

Publication Number Publication Date
CN110727406A CN110727406A (en) 2020-01-24
CN110727406B true CN110727406B (en) 2024-05-17

Family

ID=69220977

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910965867.3A Active CN110727406B (en) 2019-10-10 2019-10-10 Data storage scheduling method and device

Country Status (1)

Country Link
CN (1) CN110727406B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114528326A (en) * 2020-10-30 2022-05-24 华为技术有限公司 Data management method and device
CN112711386B (en) * 2021-01-18 2021-07-16 深圳市龙信信息技术有限公司 Storage capacity detection method and device of storage device and readable storage medium
CN112732726B (en) * 2021-04-02 2022-04-29 云和恩墨(北京)信息技术有限公司 Data processing method and device, processor and computer storage medium
CN113900597A (en) * 2021-11-30 2022-01-07 深圳市安信达存储技术有限公司 Data storage method, system, equipment and storage medium
CN114201119B (en) * 2022-02-17 2022-05-13 天津市天河计算机技术有限公司 Hierarchical storage system and method for super computer operation data
CN115134239A (en) * 2022-08-31 2022-09-30 广州市千钧网络科技有限公司 Client configuration method, system, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106528002A (en) * 2016-12-06 2017-03-22 郑州云海信息技术有限公司 Time-based storage scheduling method
CN107193500A (en) * 2017-05-26 2017-09-22 郑州云海信息技术有限公司 A kind of distributed file system Bedding storage method and system
CN108563730A (en) * 2018-04-04 2018-09-21 北京蓝杞数据科技有限公司天津分公司 A kind of cold and hot data automatic switching method, device, electronic equipment and storage medium
KR20190061426A (en) * 2017-11-28 2019-06-05 성균관대학교산학협력단 Flash memory system and control method thereof
CN109919193A (en) * 2019-01-31 2019-06-21 中国科学院上海光学精密机械研究所 A kind of intelligent stage division, system and the terminal of big data
CN110134723A (en) * 2019-05-22 2019-08-16 网易(杭州)网络有限公司 A kind of method and database of storing data

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2506164A (en) * 2012-09-24 2014-03-26 Ibm Increased database performance via migration of data to faster storage
US10572863B2 (en) * 2015-01-30 2020-02-25 Splunk Inc. Systems and methods for managing allocation of machine data storage
KR20160111583A (en) * 2015-03-16 2016-09-27 삼성전자주식회사 Memory system including host and a plurality of storage device and data migration method thereof

Also Published As

Publication number Publication date
CN110727406A (en) 2020-01-24

Similar Documents

Publication Publication Date Title
CN110727406B (en) Data storage scheduling method and device
CN104040481B (en) Method and system for merging, storing and retrieving incremental backup data
US7574435B2 (en) Hierarchical storage management of metadata
US11151030B1 (en) Method for prediction of the duration of garbage collection for backup storage systems
US9185188B1 (en) Method and system for determining optimal time period for data movement from source storage to target storage
US10061834B1 (en) Incremental out-of-place updates for datasets in data stores
WO2011159322A1 (en) Data deduplication
US20170123711A1 (en) Deduplicating data for a data storage system using similarity determinations
US10976942B2 (en) Versioning a configuration of data storage equipment
US20100174878A1 (en) Systems and Methods for Monitoring Archive Storage Condition and Preventing the Loss of Archived Data
CN104462389A (en) Method for implementing distributed file systems on basis of hierarchical storage
CN103605585A (en) Intelligent backup method based on data discovery
US11422721B2 (en) Data storage scheme switching in a distributed data storage system
US11223528B2 (en) Management of cloud-based shared content using predictive cost modeling
CN109947730A (en) Metadata restoration methods, device, distributed file system and readable storage medium storing program for executing
US10095738B1 (en) Dynamic assignment of logical partitions according to query predicate evaluations
CN113553325A (en) Synchronization method and system for aggregation objects in object storage system
CN112650453A (en) Method and system for storing and inquiring traffic data
US20210097026A1 (en) System and method for managing data using an enumerator
CN114153395B (en) Object storage data life cycle management method, device and equipment
EP3550451A1 (en) Data storage and maintenance method and device, and computer storage medium
CN115437997A (en) Intelligent identification optimization system for data life cycle
US10540329B2 (en) Dynamic data protection and distribution responsive to external information sources
US9898485B2 (en) Dynamic context-based data protection and distribution
Iwata et al. A simulation result of replicating data with another layout for reducing media exchange of cold storage

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant