CN112379842A - Method and device for predicting cold and hot properties of data - Google Patents


Info

Publication number
CN112379842A
Authority
CN
China
Prior art keywords
data
cold
hot
prediction model
characteristic information
Legal status
Pending
Application number
CN202011295845.XA
Other languages
Chinese (zh)
Inventor
黄朝松
Current Assignee
Shenzhen Anjili New Technology Co ltd
Original Assignee
Shenzhen Anjili New Technology Co ltd
Application filed by Shenzhen Anjili New Technology Co ltd
Priority to CN202011295845.XA
Publication of CN112379842A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614 Improving the reliability of storage systems
    • G06F3/0616 Improving the reliability of storage systems in relation to life time, e.g. increasing Mean Time Between Failures [MTBF]
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0644 Management of space entities, e.g. partitions, extents, pools
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/0679 Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application relates to the technical field of data storage and provides a method and a device for predicting the cold and hot properties of data in an SSD. The method includes: determining characteristic information of each of a plurality of first data stored in a Solid State Disk (SSD) and the cold and hot property corresponding to the characteristic information of each first data; determining parameters of a first prediction model according to the characteristic information of each first data, the corresponding cold and hot properties, and a first function; and predicting the cold and hot property of second data according to the first prediction model and characteristic information of the second data. In this way, the cold and hot properties of data written to the SSD by other devices can be predicted, and hot and cold data can be stored in different areas.

Description

Method and device for predicting cold and hot properties of data
Technical Field
The application belongs to the technical field of data storage, and particularly relates to a method and a device for predicting cold and hot properties of data in the technical field of storage.
Background
Data storage devices, such as Solid State Disks (SSDs), often read and write data during their daily use.
To prolong the service life of an SSD, data with different cold and hot properties can be placed in partitions with different properties: data that is accessed more often is called hot data and is stored in a high-performance partition, while data that is accessed less often is called cold data and is stored in a low-performance partition. Such differentiated storage can prolong the service life of the SSD. Therefore, when new data is written into the SSD, predicting the cold and hot property of that data becomes an urgent problem.
Disclosure of Invention
The embodiment of the application provides a method and a device for predicting cold and hot properties of data, which can predict the cold and hot properties of data written into an SSD and store the data in different areas.
In a first aspect, an embodiment of the present application provides a method for predicting cold and hot properties of data, which is applied to an SSD. In one possible implementation, the method includes: determining characteristic information of each of a plurality of first data stored in the SSD and the cold and hot property corresponding to the characteristic information of each first data; determining parameters of a first prediction model according to the characteristic information of each first data, the corresponding cold and hot properties, and a first function; and predicting the cold and hot property of second data according to the first prediction model and the characteristic information of the second data.
In some possible implementations, predicting the cold-hot property of the second data according to the first prediction model and the feature information of the second data includes:
performing a logistic regression (S-shaped) transformation on the first prediction model to obtain a second prediction model, where the output value of the second prediction model lies in the range [M, N];
and determining the cold and hot properties of the second data according to the characteristic information of the second data and the second prediction model.
In some possible implementations, the first prediction model is $h_\theta(x) = \theta^T x = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$, and the second prediction model is
$h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$,
where $x_1, x_2, \dots, x_n$ are the input characteristic values indicated by the characteristic information of the second data; $\theta_0, \theta_1, \dots, \theta_n$ are the parameters of the first prediction model; n is the number of characteristic values; $\theta = [\theta_0, \theta_1, \dots, \theta_n]$ is the matrix formed by these parameters and $\theta^T$ is its transpose; $x = [x_1, x_2, \dots, x_n]$ is the feature vector formed by $x_1, x_2, \dots, x_n$ (a constant $x_0 = 1$ is implicitly prepended so that $\theta^T x$ includes the constant term $\theta_0$); and M = 0, N = 1.
In some possible implementations, determining the cold and hot property of the second data according to the characteristic information of the second data and the second prediction model includes:
inputting the feature vector of the second data into the second prediction model to obtain an output value;
if the output value is greater than a first preset value, determining that the second data is hot data; otherwise, determining that the second data is cold data.
In some possible implementations, the first preset value is 0.5.
In some possible implementations, the first function is J(θ), and the parameters of the first prediction model are those that achieve min J(θ) [formula images not reproduced],
where min J(θ) denotes the minimum value of J(θ), m is the number of first data input to the first prediction model, $x^{(i)}$ is the feature vector of the i-th of the m first data, and $y^{(i)}$ is the value corresponding to the cold and hot property of the i-th of the m first data.
In some possible implementations, marking information of each first data is determined according to the cold and hot property corresponding to the characteristic information of each of the at least one first data, and the marking information of each first data is used to mark the cold and hot property of that first data;
if the second data carries marking information, that marking information marks the historical cold and hot property of the second data;
the marking information of the second data is compared with the marking information of at least some of the first data to determine the cold and hot property of the second data, where the second data is different from each first data.
In some possible implementations, the true cold and hot property of the data predicted by the first prediction model is determined,
and the first prediction model is updated according to the true cold and hot property of that data and the cold and hot property predicted for it by the first prediction model.
In some possible implementations, the characteristic information is a write time of a data page and/or a physical address PDA of the data page and/or a logical address of the data page.
In a second aspect, an embodiment of the present application provides an apparatus for predicting a cold and hot property of data, including a determining module and a predicting module, configured to perform the method in any one of the possible implementation manners of the first aspect.
In a third aspect, an embodiment of the present application provides an SSD, which includes a controller and a memory. The controller is configured to read and execute the computer program stored in the memory to perform the method in the first aspect and any possible implementation thereof.
In a fourth aspect, the present application provides a computer-readable storage medium on which a computer program (also referred to as instructions or codes) for implementing the method in the first aspect is stored.
In a fifth aspect, embodiments of the present application provide a computer program product comprising a computer program (also referred to as instructions or code), which when executed by a computer, causes the computer to implement the method of the first aspect.
Compared with the prior art, the embodiments of the application have the following advantage: the first prediction model is established from the characteristic information and the cold and hot properties of the first data stored in the SSD, and the characteristic information of the second data can be input into the first prediction model to predict the cold and hot property of the second data. The SSD can therefore store data to be written directly in the partition corresponding to its cold and hot property, achieving differentiated storage of data with different cold and hot properties and effectively prolonging the service life of the SSD.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and a person skilled in the art may derive other drawings from them without creative effort.
FIG. 1 (a) is a schematic block diagram of an SSD according to an embodiment of the present application;
FIG. 1 (b) is a schematic diagram of the cold and hot partitions of an SSD according to an embodiment of the present application;
FIG. 2 is an exemplary flowchart of a method for predicting cold and hot properties of data according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of a method for updating a first prediction model according to an embodiment of the present application;
FIG. 4 is a flowchart of a method for predicting cold and hot properties of data according to another embodiment of the present application;
FIG. 5 is a schematic diagram of an apparatus for predicting cold and hot properties of data according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "upon", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, as "upon determining", "in response to determining", "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
Fig. 1 is a schematic diagram of an SSD 100 according to an embodiment of the present application. As shown in Fig. 1 (a), the SSD 100 includes a controller 101 and a memory 102. The controller 101 is configured to control the memory 102 to store data, and the memory 102 is configured to store data. Some of the data stored in the memory 102 is read and written frequently and some infrequently; in general, data with a high read/write frequency is called hot data, and data with a low read/write frequency is called cold data. To increase the lifetime of the SSD 100, hot data and cold data are typically stored in separate partitions: hot data in a high-performance partition and cold data in a low-performance partition.
For example, the memory 102 may be a disk array (RAID) composed of a plurality of hard disks.
Alternatively, the SSD may use flash memory or dynamic random access memory (DRAM) as its storage medium. This is not a limitation of the present application.
As shown in fig. 1 (b), the memory 102 includes a cold data area 1021 for storing cold data and a hot data area 1022 for storing hot data.
Alternatively, the memory 102 may have different data area division manners, for example, the memory 102 includes a cold data area 1021, a hot data area 1022, and a warm data area for storing warm data. The invention is not limited in this regard.
It should be appreciated that data stored in the cold data area 1021 is erased relatively few times, while data stored in the hot data area 1022 is erased many times.
Alternatively, data may be classified by cold and hot property into cold data and hot data, or into cold data, warm data and hot data. This is not a limitation of the present application.
Fig. 2 is a flowchart of a method for predicting cold and hot properties of data according to an embodiment of the present application. The method may be executed by the SSD or by the controller 101 in the SSD; execution by the SSD is used as the example below. Referring to Fig. 2, the method for predicting cold and hot properties of data in this embodiment may include the following steps:
S201, the SSD determines the characteristic information of each first data in a plurality of first data stored in the SSD and the cold and hot property corresponding to the characteristic information of each first data.
Optionally, the characteristic information indicates at least one of: the time at which the data page was written into the SSD, the physical address (PDA) of the data page, and the logical address of the data page. One or more first data may form one data page, and the plurality of first data in S201 may correspond to one data page or to different data pages.
For example, if two first data belong to the same data page, their write time into the SSD, the physical address PDA of the data page, and the logical address of the data page are also the same. Beyond the write time, the physical address PDA, and the logical address of the data page, the characteristic information of the two first data may indicate other characteristics that differ between them, such as the type of the data itself, for example video data, database data, or text data.
It should be understood that if the characteristics indicated by the characteristic information include logical addresses of data pages, the logical addresses may be translated to physical addresses by the SSD.
Specifically, in some possible implementations, a plurality of first data has already been stored in the memory 102 of the SSD but has not been divided into the corresponding hot and cold data areas. The cold and hot property of each first data may be obtained by counting the number of times each stored first data is read and/or written, or by counting the number of times the data page containing each first data is read or written: first data with a high read/write frequency is determined to be hot data, and first data with a low read/write frequency is determined to be cold data.
For example, if first data is read more than a preset number of times within a preset time period, it is hot data; if it is read fewer than the preset number of times within the preset time period, it is cold data.
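As a minimal sketch of the counting rule above, assuming each first data carries a read count for the preset time period and that hot and cold are encoded as 1 and 0 (the example values A and B used later in the text), the labeling step that builds the training set could look like this. `FirstData`, `label_first_data` and `preset_count` are hypothetical names, not part of the application.

```python
from __future__ import annotations

from dataclasses import dataclass

HOT = 1   # value A for hot data (example value from the text)
COLD = 0  # value B for cold data (example value from the text)


@dataclass
class FirstData:
    features: list[float]  # characteristic values x1..xn (hypothetical encoding)
    read_count: int        # reads observed within the preset time period


def label_first_data(samples: list[FirstData], preset_count: int) -> list[tuple[list[float], int]]:
    """Build (features, y) training pairs from read counts.

    More than `preset_count` reads in the window -> hot, otherwise cold.
    The text leaves the "exactly equal" case open; this sketch treats it as cold.
    """
    return [(s.features, HOT if s.read_count > preset_count else COLD) for s in samples]
```

For example, `label_first_data([FirstData([1.0, 2.0], 10), FirstData([3.0, 4.0], 1)], preset_count=5)` yields one hot and one cold sample.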
Through S201, the SSD determines the characteristic information and the cold and hot properties of the first data, which form a training set; training the first prediction model on this training set is described below.
Alternatively, the plurality of first data may be all of the data stored in the SSD or only part of it, and the user may choose according to actual needs. This is not a limitation of the present application.
S202, the SSD determines the parameters of the first prediction model according to the characteristic information of each first data, the cold and hot property corresponding to the characteristic information of each first data, and the first function.
Optionally, the first function is preset.
Alternatively, the first prediction model may take the form $h_\theta(x) = \theta^T x = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$.
Optionally, the first function is J(θ), whose minimum min J(θ) determines the parameters [formula images not reproduced],
where m is the number of first data input to the first prediction model; $x^{(i)}$ is the feature vector of the i-th of the m first data; $y^{(i)}$ is the true value of the output variable for the i-th of the m first data; $x_1, x_2, \dots, x_n$ are the characteristic values indicated by the characteristic information; $\theta_0, \theta_1, \dots, \theta_n$ are the parameters of the first prediction model; n is the number of characteristic values; $\theta = [\theta_0, \theta_1, \dots, \theta_n]$ is the matrix of first prediction model parameters; and $x = [x_1, x_2, \dots, x_n]$ is the feature vector of characteristic values.
Specifically, the parameters of the first prediction model, $\theta = [\theta_0, \theta_1, \dots, \theta_n]$, are unknown before S202, and the first function is used to determine them. Its physical meaning is the error between the values the first prediction model outputs when the known characteristic information of the plurality of first data is substituted into it and the values of the true cold and hot properties of those first data. The value of the unknown θ at which J(θ) reaches its minimum is taken as the parameters of the first prediction model, so solving the first function for θ determines the first prediction model.
Optionally, the true hot property of the first data may be assigned a value A and the true cold property a value B, and the value of $y^{(i)}$ in the first function is the value assigned to the true cold and hot property of the i-th first data. A and B are real numbers, for example A = 1 and B = 0.
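The text does not reproduce J(θ) itself. Assuming the common least-squares form $J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2$, which matches the description of J(θ) as the error between model outputs and true values, S202 could be sketched with plain batch gradient descent. The cost form, learning rate, iteration count and function names are illustrative assumptions, not details from the application.

```python
def h_linear(theta, x):
    """First prediction model: theta0 + theta1*x1 + ... + thetan*xn."""
    return theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))


def cost(theta, data):
    """Assumed first function J(theta) = 1/(2m) * sum((h(x_i) - y_i)^2)."""
    m = len(data)
    return sum((h_linear(theta, x) - y) ** 2 for x, y in data) / (2 * m)


def fit(data, lr=0.1, iters=5000):
    """Approximately minimise J(theta) with batch gradient descent."""
    n = len(data[0][0])        # number of characteristic values
    m = len(data)              # number of first data (training samples)
    theta = [0.0] * (n + 1)    # theta0 (constant term) plus theta1..thetan
    for _ in range(iters):
        grad = [0.0] * (n + 1)
        for x, y in data:
            err = h_linear(theta, x) - y
            grad[0] += err     # x0 = 1 for the constant term
            for j, xj in enumerate(x, start=1):
                grad[j] += err * xj
        theta = [t - lr * g / m for t, g in zip(theta, grad)]
    return theta
```

On real data, raw write times and addresses differ by orders of magnitude, so features would likely need scaling before a fixed learning rate like this converges; the sketch assumes small, pre-scaled features.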
S203, the SSD predicts the cold and hot properties of the second data according to the first prediction model and the characteristic information of the second data.
Alternatively, the second data may be data written into the SSD by any other device.
For example, if the second data has three pieces of characteristic information, the first prediction model takes the form $h_\theta(x) = \theta^T x = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3$. The first characteristic value $x_1$ of the second data is the number of reads of the data page containing the second data, and the first characteristic parameter is $\theta_1$; the second characteristic value $x_2$ of the second data corresponds to the second characteristic parameter $\theta_2$; the third characteristic value $x_3$ is another characteristic value of the second data, and the third characteristic parameter is $\theta_3$. The SSD inputs the three pieces of characteristic information of the second data into the first prediction model to obtain an output value, from which it can determine the cold and hot property of the second data.
Alternatively, the true hot property of the first data may be assigned the value A and the true cold property the value B. If the output value is close to A, the second data is hot data; if it is close to B, the second data is cold data.
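For the direct use of the first prediction model in S203, a minimal sketch under the example assignment A = 1 and B = 0 computes $h_\theta(x)$ for a three-feature vector and picks whichever of A and B the raw output is closer to. The parameter values in the test are made up for illustration, and breaking ties toward cold is an assumption the text does not address.

```python
A, B = 1.0, 0.0  # example values assigned to hot (A) and cold (B)


def h_linear(theta, x):
    """h_theta(x) = theta0 + theta1*x1 + theta2*x2 + theta3*x3 (works for any n)."""
    return theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))


def classify_by_nearest(theta, x, a=A, b=B):
    """Return 'hot' if the raw output is strictly closer to A than to B, else 'cold'."""
    out = h_linear(theta, x)
    return "hot" if abs(out - a) < abs(out - b) else "cold"
```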
Alternatively, the characteristic information of the second data may be input directly into the first prediction model, without transformation, to obtain an output value; or the first prediction model may first undergo a logistic regression (S-shaped) transformation or another form of transformation. The following description takes as an example applying such a transformation to the first prediction model to obtain a second prediction model, which the SSD then uses to predict the second data.
In some possible implementations, an S-shaped transformation may be applied to the first prediction model, for example (but not limited to) by the function given in the original formula images [not reproduced], to obtain a second prediction model whose output value range is [M, N],
where G, M and N are real numbers.
Optionally, the logistic function used to transform the first prediction model is
$g(z) = \frac{1}{1 + e^{-z}}$.
The second prediction model obtained by transforming the first prediction model is
$h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$.
It should be appreciated that the output value range of the second prediction model $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$ is [0, 1].
If the second prediction model is $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$, the SSD predicts the second data according to the second prediction model as follows:
the feature vector x of the second data is input into the second prediction model $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$ to obtain an output value, where $\theta^T$ is the transpose of θ.
If the output value is greater than the first preset value Q, the second data is determined to be hot data.
Optionally, the SSD writes the second data determined to be hot data to the hot data area 1022.
If the output value is less than or equal to the first preset value Q, the second data is determined to be cold data.
Alternatively, the SSD writes the second data determined to be cold data to the cold data area 1021.
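The two-way decision above, followed by the write to the matching area, can be sketched as follows with Q = 0.5 as in the text. The two data areas are modeled as plain Python lists in a dict; `classify_two_way`, `place_second_data` and the dict keys are hypothetical stand-ins for the hot data area 1022 and the cold data area 1021.

```python
Q = 0.5  # first preset value (example value from the text)


def classify_two_way(output: float, q: float = Q) -> str:
    """Hot if the second model's output exceeds Q, otherwise cold."""
    return "hot" if output > q else "cold"


def place_second_data(data: bytes, output: float, areas: dict, q: float = Q) -> str:
    """Append data to the hot area (1022) or cold area (1021); return the label."""
    label = classify_two_way(output, q)
    areas[label].append(data)
    return label
```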
In some possible implementations, if the output value is greater than the first preset value Q, the second data is determined to be hot data; if the output value is equal to Q, the second data is determined to be warm data; and if the output value is less than Q, the second data is determined to be cold data.
Optionally, the first preset value Q is 0.5.
It should be noted that the first preset value of 0.5 is illustrative rather than limiting; the first preset value may be any real number greater than M and less than N in the output value range of the second prediction model, that is, between 0 and 1 when the range is [0, 1]. This is not a limitation of the present application.
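The three-way variant changes only the decision rule. A minimal sketch follows; `classify_three_way` is a hypothetical name. With a continuous output, an exact match with Q is rare, so an implementation might instead use a small band around Q for warm data, which would be a design choice beyond what the text specifies.

```python
def classify_three_way(output: float, q: float = 0.5) -> str:
    """Above Q -> hot, exactly Q -> warm, below Q -> cold."""
    if output > q:
        return "hot"
    if output == q:
        return "warm"
    return "cold"
```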
Alternatively, the first prediction model may not be updated at all, may be updated periodically, or may be updated when a trigger condition is met; updating the first prediction model when a trigger condition is met is described below.
Fig. 3 depicts a method for updating the first prediction model.
After the SSD has predicted the cold and hot properties of a plurality of second data in S203, it may update the first prediction model according to the prediction results for those second data.
S301, start.
S302, the SSD collects statistics on the true cold and hot properties of the plurality of second data predicted in S203.
S303, the SSD compares the true cold and hot properties of the plurality of second data with the cold and hot properties predicted for them in S203 and counts the number of prediction errors. If the number of prediction errors is greater than or equal to the error-count threshold λ, S305 is executed; otherwise, S304 is executed.
It is understood that the number threshold λ is a value preset by the SSD.
S304, the SSD checks whether the total number of cold and hot property predictions made for the plurality of second data in S203 is greater than or equal to the prediction-count threshold c. If so, S305 is executed; otherwise, S301 is executed.
As an alternative to S303 and S304, the SSD may compute a prediction success rate. If the cold and hot property predicted by the SSD for a given second data is the same as its counted true cold and hot property, the prediction for that second data succeeded; if they differ, the prediction failed. The SSD determines the success rate as the ratio of the number of successfully predicted second data to the total number of predicted second data; if the success rate is greater than a preset value, S301 is executed; otherwise, S305 is executed.
S305, the SSD updates the first predictive model.
Specifically, the SSD updates the first prediction model according to the true cold and hot properties of the plurality of second data counted in S302. If the predicted cold and hot property of a second data is the same as its true property, that second data can be used as a sample in the training set to update the first prediction model; otherwise, it is not used as a training sample.
It is to be understood that the update of the first prediction model may involve only S301 to S303 and S305, only S301, S302, S304 and S305, or all of S301 to S305.
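The trigger logic of S302 to S305 can be sketched as a small bookkeeping class, where λ and c map to `error_threshold` and `count_threshold` and `success_rate` covers the alternative to S303 and S304. `UpdateMonitor` and its methods are hypothetical names; retraining itself (for example, rerunning the parameter fit of S202 on the returned samples) is left out.

```python
from dataclasses import dataclass, field


@dataclass
class UpdateMonitor:
    error_threshold: int  # lambda: prediction-error count that forces an update
    count_threshold: int  # c: total prediction count that forces an update
    records: list = field(default_factory=list)  # (features, predicted, actual)

    def record(self, features, predicted, actual):
        """S302: store a prediction from S203 together with its true property."""
        self.records.append((features, predicted, actual))

    def errors(self):
        return sum(1 for _, p, a in self.records if p != a)

    def should_update(self):
        """S303/S304: update if errors >= lambda, else if predictions >= c."""
        if self.errors() >= self.error_threshold:
            return True
        return len(self.records) >= self.count_threshold

    def success_rate(self):
        """Alternative check: fraction of correct predictions (0.0 before any)."""
        if not self.records:
            return 0.0
        return 1 - self.errors() / len(self.records)

    def training_samples(self):
        """S305: keep only correctly predicted second data as new samples."""
        return [(f, a) for f, p, a in self.records if p == a]
```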
Fig. 4 provides another exemplary flowchart of a method for predicting hot and cold attributes of data, which describes a method 400 for predicting hot and cold attributes of second data by an SSD.
S401, start.
S402, the SSD receives a command to write the second data and determines whether the second data carries mark information. If it does not, S403 shown in Fig. 4 is executed; otherwise, S405 is executed.
Optionally, the tagging information is for tagging a cold-hot property of the second data.
S403, the SSD predicts the cold and hot property of the second data according to the first prediction model or the second prediction model. For details of this prediction, refer to the description of the embodiment of Fig. 2.
S404, after S403, the SSD stores the second data in the corresponding cold or hot data area according to its predicted cold and hot property: if the second data is predicted to be hot data, it is stored in the hot data area; if it is predicted to be cold data, it is stored in the cold data area.
S405, if the second data carries tag information, the SSD determines the cold/hot property of the second data according to that tag information.
It should be understood that the second data may be historical data of the SSD, in which case its tag information was set by the SSD. The SSD may likewise tag the cold/hot property of each first data. In other words, the SSD can determine the cold/hot property of the first data from the tag information of the first data, and the cold/hot property of the second data from the tag information of the second data.
Specifically, S405 includes: the SSD compares the tag information of the second data with the tag information of at least one first data. If the tag information of a first data is the same as that of the second data, the second data takes the property of that first data: if the first data is hot data, the SSD determines that the second data is also hot data; if the first data is cold data, the SSD determines that the second data is also cold data.
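A minimal sketch of the comparison in S405, assuming tag information can be represented as a hashable value (for example a tag ID) and that the SSD keeps a table mapping the tag of each first data to its cold/hot property. The table and function names are hypothetical; the text does not specify a data structure.

```python
from typing import Dict, Hashable, Iterable, Optional, Tuple


def build_tag_table(first_data: Iterable[Tuple[Hashable, bool]]) -> Dict[Hashable, bool]:
    """Map each first data's tag to its cold/hot property (True = hot).
    If two first data share a tag, the later entry wins."""
    return {tag: is_hot for tag, is_hot in first_data}


def property_from_tag(tag: Hashable, table: Dict[Hashable, bool]) -> Optional[bool]:
    """Return the property of the first data with the same tag as the second
    data, or None if no first data carries that tag."""
    return table.get(tag)


table = build_tag_table([("tag-A", True), ("tag-B", False)])
print(property_from_tag("tag-A", table))  # True  (hot, like the matching first data)
print(property_from_tag("tag-B", table))  # False (cold)
print(property_from_tag("tag-C", table))  # None  (no matching first data)
```

A dictionary lookup keeps the comparison constant-time per write, which matters if the table is consulted on every write command.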
S406, after S405, the SSD stores the second data in the data area matching its determined cold/hot property: in the hot data area if the second data is hot data, or in the cold data area if it is cold data.
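The branching of method 400 (S402 to S406) can be summarized in a short sketch. The `predict` callable stands in for the first or second prediction model used in S403, `tag_table` for the tag comparison in S405, and the two lists for the hot and cold data areas; all of these are hypothetical stand-ins. Falling back to the model when a tag is present but matches no first data is an additional assumption not stated in the text.

```python
from typing import Callable, Dict, Hashable, List, Optional, Sequence


def handle_write(
    data_id: str,
    features: Sequence[float],
    tag: Optional[Hashable],
    predict: Callable[[Sequence[float]], bool],
    tag_table: Dict[Hashable, bool],
    hot_area: List[str],
    cold_area: List[str],
) -> bool:
    """Place one written second data in the hot or cold area; return True if hot."""
    if tag is not None and tag in tag_table:
        # S405: reuse the property of the first data that carries the same tag.
        is_hot = tag_table[tag]
    else:
        # S403: no tag (or, as an assumption here, an unknown tag), so use the model.
        is_hot = predict(features)
    # S404 / S406: store the data in the area matching its property.
    (hot_area if is_hot else cold_area).append(data_id)
    return is_hot


def toy_model(x: Sequence[float]) -> bool:
    """Hypothetical stand-in for a trained prediction model."""
    return sum(x) > 1.0


hot: List[str] = []
cold: List[str] = []
handle_write("page-1", [0.9, 0.4], None, toy_model, {}, hot, cold)            # model: hot
handle_write("page-2", [0.9, 0.4], "t1", toy_model, {"t1": False}, hot, cold)  # tag: cold
handle_write("page-3", [0.1, 0.2], None, toy_model, {}, hot, cold)            # model: cold
print(hot, cold)  # ['page-1'] ['page-2', 'page-3']
```

In this sketch a matching tag takes precedence over the model, as "page-2" shows: the model alone would have classified it as hot.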
It should be understood that the SSD may skip S402 and proceed directly to S404 regardless of whether the second data carries tag information; in other words, it is not necessary to determine whether the second data carries tag information.
It should be understood that other variations obtained by simple substitutions of the solutions proposed in the embodiments of the present application will be obvious to those skilled in the art; the embodiments do not limit the present application.
As shown in Fig. 5, an embodiment of the present application further provides an apparatus 500 for predicting cold/hot properties of data, including:
a determining module 501, configured to determine feature information of each of a plurality of first data stored in a solid state disk (SSD), and the cold/hot property corresponding to the feature information of each first data; and
a prediction module 502, configured to predict the cold/hot property of second data according to the first prediction model and the feature information of the second data.
Optionally, the determining module 501 and the prediction module 502 in the apparatus 500 may run in the controller 101 shown in Fig. 1.
The present application also provides a computer-readable storage medium storing a computer program (also referred to as instructions or code) for implementing the method in the above embodiments.
Embodiments of the present application also provide a computer program product including a computer program (also referred to as instructions or code); when the computer program is executed by a computer, the computer implements the method in the above embodiments.
It should be noted that the apparatus and the method for predicting cold/hot properties of data in the above embodiments are based on the same inventive concept. Therefore, the specific function of each functional module in the SSD corresponds to a method step described above and is not described again here.
The above are only preferred embodiments of the present application and do not limit the scope of the claims of the present application. All equivalent process changes made by using the contents of the specification and the drawings of the present application, or applied directly or indirectly to other related technical fields, are included in the scope of protection of the present application.

Claims (11)

1. A method for predicting data cold and hot properties, comprising:
determining characteristic information of each first data in a plurality of first data stored in a Solid State Disk (SSD) and a cold and hot attribute corresponding to the characteristic information of each first data;
determining parameters of a first prediction model according to the characteristic information of each first data, the cold and hot properties corresponding to the characteristic information of each first data and a first function;
and predicting the cold and hot properties of the second data according to the first prediction model and the characteristic information of the second data.
2. The method of claim 1, wherein said predicting cold and hot attributes of second data based on said first predictive model and characteristic information of said second data comprises:
performing linear regression on the first prediction model to obtain a second prediction model, wherein the output value of the second prediction model lies in the range [M, N];
and determining the cold and hot properties of the second data according to the characteristic information of the second data and the second prediction model.
3. The method of claim 2, wherein the first prediction model is $h_\theta(x) = \theta x = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$, and the second prediction model is
[Formula image FDA0002785280700000011]
where $x_1, x_2, \dots, x_n$ are the input feature values indicated by the characteristic information of the second data, $\theta_0, \theta_1, \theta_2, \dots, \theta_n$ are the parameters of the first prediction model, n is the number of feature values, $\theta = [\theta_0, \theta_1, \theta_2, \dots, \theta_n]$ is the parameter matrix formed by $\theta_0, \theta_1, \theta_2, \dots, \theta_n$, $\theta^T$ is the transpose of $\theta$, $x = [x_1, x_2, \dots, x_n]$ is the feature vector formed by $x_1, x_2, \dots, x_n$, M = 0, and N = 1.
4. The method of claim 3, wherein determining the cold-hot property of the second data based on the characteristic information of the second data and the second predictive model comprises:
inputting the characteristic information of the second data into the second prediction model to obtain an output value corresponding to the cold and hot attribute of the second data;
if the output value is greater than a first preset value, determining that the second data is hot data;
and if the output value is less than or equal to the first preset value, determining that the second data is cold data.
5. The method of claim 4, wherein the first preset value is 0.5.
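Claims 3 to 5 can be illustrated with the sketch below. The second prediction model appears only as a formula image in the original, so the sketch assumes it is the logistic (sigmoid) function $1/(1 + e^{-h_\theta(x)})$; that choice is consistent with the stated output range [0, 1] and threshold 0.5 but is not confirmed by the text. Treating $\theta_0$ as an intercept (so `theta` has one more entry than `x`) is also an assumption, and the parameter values are illustrative, not trained.

```python
import math
from typing import Sequence


def first_model(theta: Sequence[float], x: Sequence[float]) -> float:
    """h_theta(x) = theta0 + theta1*x1 + ... + thetan*xn."""
    if len(theta) != len(x) + 1:
        raise ValueError("theta needs one more entry than x (theta0 is the intercept)")
    return theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))


def sigmoid(z: float) -> float:
    """Numerically stable logistic function with output in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def second_model(theta: Sequence[float], x: Sequence[float]) -> float:
    """Assumed second prediction model: sigmoid of the first model's output."""
    return sigmoid(first_model(theta, x))


def is_hot(theta: Sequence[float], x: Sequence[float], first_preset: float = 0.5) -> bool:
    """Claim 4: hot if the output exceeds the first preset value, otherwise cold."""
    return second_model(theta, x) > first_preset


theta = [-1.0, 2.0]           # illustrative parameters, not trained values
print(is_hot(theta, [1.0]))   # True  (output about 0.73)
print(is_hot(theta, [0.5]))   # False (output exactly 0.5, so cold)
print(is_hot(theta, [0.25]))  # False (output about 0.38)
```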
6. The method of any one of claims 1 to 5, wherein the first function is
[Formula image FDA0002785280700000021]
where m is the number of first data, $x^{(i)}$ is the feature vector of the i-th of the m first data, and $y^{(i)}$ is the value corresponding to the cold and hot attribute of the i-th of the m first data.
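The first function of claim 6 is likewise given only as a formula image. Since it is defined over m samples with feature vectors $x^{(i)}$ and labels $y^{(i)}$, the sketch below assumes the cross-entropy cost commonly paired with a sigmoid output, $J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h(x^{(i)}) + (1 - y^{(i)})\log(1 - h(x^{(i)}))\right]$, and minimizes it with batch gradient descent. The cost, the 1/0 label encoding, the learning rate and the iteration count are all assumptions for illustration.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_prob(theta: Sequence[float], x: Sequence[float]) -> float:
    """sigmoid(theta0 + theta1*x1 + ... + thetan*xn)."""
    return sigmoid(theta[0] + sum(t * xi for t, xi in zip(theta[1:], x)))


def fit(xs: Sequence[Sequence[float]], ys: Sequence[int],
        lr: float = 0.5, iters: int = 2000) -> List[float]:
    """Minimize the assumed cross-entropy cost by batch gradient descent.

    xs: feature vectors x^(i) of the m first data
    ys: 1 for hot, 0 for cold (assumed encoding of y^(i))
    """
    if not xs:
        raise ValueError("need at least one first data")
    m, n = len(xs), len(xs[0])
    theta = [0.0] * (n + 1)
    for _ in range(iters):
        grad = [0.0] * (n + 1)
        for x, y in zip(xs, ys):
            err = predict_prob(theta, x) - y  # h(x^(i)) - y^(i)
            grad[0] += err
            for j, xj in enumerate(x, start=1):
                grad[j] += err * xj
        # Gradient of J is (1/m) * sum(err * x_j); step against it.
        theta = [t - lr * g / m for t, g in zip(theta, grad)]
    return theta


# Toy data with one feature: larger values are labelled hot.
xs = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
ys = [0, 0, 0, 1, 1, 1]
theta = fit(xs, ys)
print(predict_prob(theta, [0.85]) > 0.5, predict_prob(theta, [0.15]) > 0.5)
```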
7. The method of any of claims 1 to 5, further comprising:
determining the marking information of each first data according to the cold and hot attributes corresponding to the characteristic information of each first data in the at least one first data, wherein the marking information of each first data is used for marking the cold and hot attributes of each first data;
if the second data carries marking information, the marking information being used to mark the historical cold and hot attribute of the second data;
and comparing the marking information of the second data with the marking information of at least some of the first data to determine the cold and hot attribute of the second data, wherein the second data is different from each first data.
8. The method of any of claims 1 to 5, further comprising:
determining a true cold-hot property of the data predicted by the first prediction model;
updating the first prediction model according to the real cold and hot properties of the data predicted by the first prediction model and the cold and hot properties predicted by the first prediction model.
9. Method according to claim 8, characterized in that the characteristic information is the writing time of a data page and/or the physical address PDA of the data page and/or the logical address of the data page.
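Claim 9 names the write time, the physical address (PDA) and the logical address of a data page as possible characteristic information. One way to turn these into a feature vector $x = [x_1, x_2, x_3]$ is sketched below; the `PageInfo` record, the use of page age rather than raw write time, and the scaling to [0, 1] by a maximum age and the address-space sizes are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageInfo:
    """Hypothetical per-page metadata kept by the SSD controller."""
    write_time: float  # time the page was written, in seconds
    pda: int           # physical address of the page
    lba: int           # logical address of the page


def feature_vector(page: PageInfo, now: float, max_age: float,
                   num_pdas: int, num_lbas: int) -> List[float]:
    """Build x = [x1, x2, x3]: clipped page age, PDA and LBA, each scaled to [0, 1]."""
    age = min(max(now - page.write_time, 0.0), max_age) / max_age
    return [age, page.pda / num_pdas, page.lba / num_lbas]


page = PageInfo(write_time=900.0, pda=256, lba=1024)
print(feature_vector(page, now=1000.0, max_age=1000.0, num_pdas=1024, num_lbas=4096))
# [0.1, 0.25, 0.25]
```

Scaling the features to a common range keeps any single feature, such as a large raw address, from dominating the parameter updates during training.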
10. An apparatus for predicting cold and hot properties of data, comprising means for performing the method of any one of claims 1 to 9.
11. A solid state disk, SSD, characterized in that it comprises the apparatus according to claim 10.
CN202011295845.XA 2020-11-18 2020-11-18 Method and device for predicting cold and hot properties of data Pending CN112379842A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011295845.XA CN112379842A (en) 2020-11-18 2020-11-18 Method and device for predicting cold and hot properties of data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011295845.XA CN112379842A (en) 2020-11-18 2020-11-18 Method and device for predicting cold and hot properties of data

Publications (1)

Publication Number Publication Date
CN112379842A 2021-02-19

Family

ID=74584284

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011295845.XA Pending CN112379842A (en) 2020-11-18 2020-11-18 Method and device for predicting cold and hot properties of data

Country Status (1)

Country Link
CN (1) CN112379842A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023030227A1 (en) * 2021-08-31 2023-03-09 华为技术有限公司 Data processing method, apparatus and system

Citations (9)

Publication number Priority date Publication date Assignee Title
CN103703450A (en) * 2011-07-20 2014-04-02 华为技术有限公司 Method and apparatus for SSD storage access
CN105183386A (en) * 2015-09-14 2015-12-23 联想(北京)有限公司 Information processing method and electronic equipment
KR101686346B1 (en) * 2015-09-11 2016-12-29 성균관대학교산학협력단 Cold data eviction method using node congestion probability for hdfs based on hybrid ssd
CN107728952A (en) * 2017-10-31 2018-02-23 郑州云海信息技术有限公司 A kind of prediction type data migration method and system
CN109033298A (en) * 2018-07-14 2018-12-18 北方工业大学 Data distribution method under heterogeneous HDFS cluster
CN109189693A (en) * 2018-07-18 2019-01-11 深圳大普微电子科技有限公司 The method and SSD that a kind of pair of LBA information is predicted
CN109739646A (en) * 2018-12-28 2019-05-10 北京神州绿盟信息安全科技股份有限公司 A kind of data processing method and device
CN111090392A (en) * 2019-11-20 2020-05-01 深圳市得一微电子有限责任公司 Cold and hot data separation method based on feature codes
CN111124303A (en) * 2019-12-18 2020-05-08 北京易华录信息技术股份有限公司 Data storage method, device and system

Similar Documents

Publication Publication Date Title
CN113015965B (en) Performing hybrid wear-leveling operations based on a small write counter
CN111143243B (en) Cache prefetching method and system based on NVM hybrid memory
CN111324303B (en) SSD garbage recycling method, SSD garbage recycling device, computer equipment and storage medium
CN110673789B (en) Metadata storage management method, device, equipment and storage medium of solid state disk
CN107179880B (en) Storage device, control unit thereof and data moving method for storage device
CN112035061B (en) Solid state disk resource allocation method, device and storage medium
CN109992210B (en) Data storage method and device and electronic equipment
CN112181902B (en) Database storage method and device and electronic equipment
JP6167646B2 (en) Information processing apparatus, control circuit, control program, and control method
CN113625973B (en) Data writing method, device, electronic equipment and computer readable storage medium
US20220374169A1 (en) Machine Learning Assisted Quality of Service (QoS) for Solid State Drives
CN115963995A (en) Multi-mode low-energy-consumption distributed cloud storage system, electronic equipment and storage medium
WO2017045500A1 (en) Storage array management method and apparatus
CN112379842A (en) Method and device for predicting cold and hot properties of data
Shafaei et al. Write Amplification Reduction in Flash-Based SSDs Through Extent-Based Temperature Identification
US20200004636A1 (en) Data Storage System with Strategic Contention Avoidance
US20110107056A1 (en) Method for determining data correlation and a data processing method for a memory
CN112802529A (en) Detection method and device for military-grade Nand flash memory, electronic equipment and storage medium
KR20050076156A (en) Data recovery device and method thereof
CN114281251B (en) Data distribution and reprogramming optimization method for 3D TLC flash memory
CN115344198A (en) Data reading and writing method and system of magnetic disk, terminal device and storage medium
CN113467724B (en) CRC (cyclic redundancy check) code storage method, device, equipment and medium
US20210200477A1 (en) Storage device configured to support multi-streams and operation method thereof
CN108984117B (en) Data reading and writing method, medium and equipment
US9846653B2 (en) Performing write operations on main memory

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination