CN109947373B - Data processing method and device - Google Patents

Info

Publication number
CN109947373B
Authority
CN
China
Prior art keywords
data
type
local
frequency
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910246043.0A
Other languages
Chinese (zh)
Other versions
CN109947373A (en
Inventor
刘爱贵
陈彬彬
阮薛平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dadao Yunxing Technology Co ltd
Original Assignee
Beijing Dadao Yunxing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dadao Yunxing Technology Co ltd
Priority to CN201910246043.0A
Publication of CN109947373A
Application granted
Publication of CN109947373B
Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the application provides a data processing method and device. The data processing method comprises the following steps: determining first type data in the local data, wherein the first type data comprises cold data and/or accumulated data, and the accumulated data is data stored for more than a preset time; migrating the first type of data from the local storage to the cloud storage. According to the method and the device, the first type of data is uploaded to the cloud storage, so that the storage space of the local storage can be saved, the performance of the system can be improved, and the user experience is improved.

Description

Data processing method and device
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data processing method and apparatus.
Background
With the rapid development of social informatization, such as the internet and the mobile internet, human activity generates a large amount of data every day, and human society has entered the era of massive data.
However, the large amount of data may make the storage resources of the system relatively insufficient, which may cause a problem of system performance degradation.
Disclosure of Invention
An object of the embodiments of the present application is to provide a data processing method and apparatus, so as to solve the problem of system performance degradation caused by storage of a large amount of data in the prior art.
In view of this, in a first aspect, an embodiment of the present application provides a data processing method, where the method includes: determining first type data in the local data, wherein the first type data comprises cold data and/or accumulated data, and the accumulated data is data with the storage time exceeding a preset time length; migrating the first type of data from the local storage to the cloud storage.
Therefore, the first type of data is uploaded to the cloud storage, so that the storage space of the local storage can be saved, the performance of the system can be improved, and the user experience can be improved.
In one embodiment, determining a first type of data in the local data comprises: according to the metadata, the first type of data in the local data is determined, and the metadata records the attribute information of the data.
Therefore, the first type of data can be quickly located in the massive local data by means of the metadata.
In one embodiment, determining the first type of data in the local data according to the metadata includes: determining the read frequency or the write frequency of the local data according to the attribute information, wherein the read frequency is the number of times the local data is read within a preset time, and the write frequency is the number of times the local data is written within the preset time; determining the local data as cold data when the number of reads is less than a preset number of reads; or determining the local data as cold data when the number of writes is less than a preset number of writes.
Therefore, the metadata is used for quickly determining cold data from the local mass data.
In one embodiment, the first type of data is accumulated data, the attribute information includes creation time of the data, and determining the first type of data in the local data includes: determining the creation time of the local data according to the attribute information; and determining the local data as accumulated data under the condition that the interval time between the creation time and the current time is greater than a preset time period.
Thus, the migration of accumulated data may archive files created earlier, thereby conserving local storage space.
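The age check described above can be sketched as follows. This is a minimal illustration only: the helper name `is_accumulated` and the 90-day period are assumptions chosen for the example, and the application does not fix either.

```python
from datetime import datetime, timedelta

# Illustrative preset time period; the application does not specify a value.
PRESET_PERIOD = timedelta(days=90)


def is_accumulated(created_at: datetime, now: datetime,
                   preset_period: timedelta = PRESET_PERIOD) -> bool:
    """Return True when the interval between creation time and the current
    time is greater than the preset time period."""
    return (now - created_at) > preset_period
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the check deterministic when scanning many metadata records in one pass.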
In one embodiment, the attribute information includes a migration identifier, where the migration identifier is used to indicate whether the local data is migrated to the cloud storage, and the data processing method further includes: after the migration of the first type of data is completed, the migration identifier corresponding to the first type of data is set to be a first value, and the first value indicates that the first type of data has been migrated to the cloud storage.
Therefore, migrated data in the local data is recorded through the migration identifier, so that management of the migrated data is facilitated.
In one embodiment, the attribute information includes a size of the data and path information of the data in the cloud storage, and the data processing method further includes: after the migration of the first type data is completed, the first type data is deleted in the local storage, and the size and path information of the first type data are recorded in the metadata.
Therefore, the state in the local data is recorded by updating the migration identifier, thereby facilitating management of the local data.
In one embodiment, the data processing method further comprises: acquiring a data access request of first-class data; and downloading the first type data from the cloud storage to the local storage according to the path information of the first type data recorded by the metadata in the cloud storage.
Therefore, the data migration is transparent to the user: when a migration back is triggered by an access, the user only needs to wait for it to complete, so that transparent access is realized.
In one embodiment, the attribute information includes a migrate-back identifier, where the migrate-back identifier is used to indicate whether the first type of data has been migrated back to the local storage, and the data processing method further includes: after the first type of data is migrated back, setting the migrate-back identifier corresponding to the first type of data to a second value, wherein the second value indicates that the first type of data has been migrated back to the local storage.
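A minimal sketch of this access path is given below, using in-memory dictionaries as stand-ins for the local storage and the cloud storage. The class `TransparentStore`, the `archive/` naming scheme, and the field names are hypothetical and serve only to illustrate how the recorded path information could drive the download.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Metadata:
    size: int                         # size of the migrated data
    cloud_path: Optional[str] = None  # path information in the cloud storage


class TransparentStore:
    """In-memory stand-in for a local storage backed by a cloud storage."""

    def __init__(self) -> None:
        self.local: Dict[str, bytes] = {}
        self.cloud: Dict[str, bytes] = {}
        self.metadata: Dict[str, Metadata] = {}

    def migrate(self, name: str) -> None:
        """Upload the data, delete the local copy, and record size and path."""
        data = self.local.pop(name)
        cloud_path = f"archive/{name}"  # hypothetical naming scheme
        self.cloud[cloud_path] = data
        self.metadata[name] = Metadata(size=len(data), cloud_path=cloud_path)

    def read(self, name: str) -> bytes:
        """Serve an access request, downloading migrated data on demand."""
        if name not in self.local:
            meta = self.metadata[name]
            # Download according to the path recorded in the metadata.
            self.local[name] = self.cloud[meta.cloud_path]
        return self.local[name]
```

The caller of `read` does not need to know whether the data was local or archived, which is the sense in which the access is transparent.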
Therefore, the migration state of the first type data is recorded through the migration identifier, so that the migrated first type data is convenient to manage.
In one embodiment, the data processing method further comprises: determining the first type of data which is migrated back according to the attribute information; and deleting the first type of data which is migrated back in the local storage, and setting a migration identifier corresponding to the first type of data which is migrated back as a third value, wherein the third value is used for indicating that the first type of data is not migrated back to the local storage.
Therefore, excessive local storage is avoided by cleaning the migrated data.
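The identifier updates described in the preceding embodiments can be sketched as below. The numeric encodings, the `Flags` record, and the helper names are assumptions: the application only names a first, a second, and a third value. The sketch also assumes that the third value is written to the migrate-back (rollback) identifier, because it describes the migrated-back state; the text leaves this detail open.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical encodings for the three values named in the text.
MIGRATED = 1           # first value: data has been migrated to the cloud storage
MIGRATED_BACK = 2      # second value: data has been migrated back to local storage
NOT_MIGRATED_BACK = 3  # third value: data is not migrated back to local storage


@dataclass
class Flags:
    migration: int = 0                     # 0 is an assumed "never migrated" default
    migrate_back: int = NOT_MIGRATED_BACK


def on_migrated(flags: Flags) -> None:
    """Record that the data has been migrated to the cloud storage."""
    flags.migration = MIGRATED


def on_migrated_back(flags: Flags) -> None:
    """Record that the data has been downloaded back to the local storage."""
    flags.migrate_back = MIGRATED_BACK


def clean_migrated_back(records: Dict[str, Flags],
                        local: Dict[str, bytes]) -> List[str]:
    """Delete local copies of migrated-back data and reset their identifier."""
    cleaned = []
    for name, flags in records.items():
        if flags.migrate_back == MIGRATED_BACK:
            local.pop(name, None)
            flags.migrate_back = NOT_MIGRATED_BACK
            cleaned.append(name)
    return cleaned
```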
In one embodiment, the metadata includes base metadata and extended metadata, the base metadata being metadata stored in a database, and the extended metadata being metadata stored in an extended attribute of a file.
In a second aspect, the present application provides a data processing apparatus comprising: the determining module is used for determining first type data in the local data, the first type data comprises cold data and/or accumulated data, and the accumulated data is data with the storage time exceeding a preset time length; and the migration module is used for migrating the first type of data from the local storage to the cloud storage.
Therefore, the first type of data is uploaded to the cloud storage, so that the storage space of the local storage can be saved, the performance of the system can be improved, and the user experience can be improved.
In a third aspect, the present application provides a computer readable storage medium having stored thereon a data processing program which, when executed by a processor, implements the steps of the data processing method according to any one of the first aspect.
In a fourth aspect, the present application provides an electronic device, comprising: a processor, a memory and a bus, the memory storing processor-executable machine-readable instructions, the processor and the memory communicating via the bus when the electronic device is running, the machine-readable instructions when executed by the processor performing the method of the first aspect or any of the alternative implementations of the first aspect.
In a fifth aspect, the present application provides a computer program product which, when run on a computer, causes the computer to perform the method of the first aspect or any possible implementation manner of the first aspect.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a schematic diagram of an operating mode of a data processing method and apparatus according to an embodiment of the present application;
Fig. 2 is a block diagram of a processor according to an embodiment of the present application;
Fig. 3 is a flowchart of a first embodiment of a data processing method provided by the present application;
Fig. 4 is a flowchart of a second embodiment of a data processing method provided by the present application;
Fig. 5 is a flowchart of a third embodiment of a data processing method provided by the present application;
Fig. 6 is a flowchart of a fourth embodiment of a data processing method provided by the present application;
Fig. 7 is a flowchart of a fifth embodiment of a data processing method provided by the present application;
Fig. 8 is a flowchart of a sixth embodiment of a data processing method provided by the present application;
Fig. 9 is a flowchart of a seventh embodiment of a data processing method provided by the present application;
Fig. 10 is a flowchart of an eighth embodiment of a data processing method provided by the present application;
Fig. 11 is a flowchart of a ninth embodiment of a data processing method provided by the present application;
Fig. 12 is a block diagram of a data processing apparatus provided by the present application;
Fig. 13 is a block diagram of a device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
According to a research report by IDC, an internationally known data research company, the world formally entered the ZB era in 2010 (ZB: zettabyte; 1 ZB is equivalent to the capacity of one billion 1 TB hard disks). The global data volume reached 4.4 ZB in 2013 and was projected to reach 44 ZB in 2020, an average annual growth rate of as much as 40%. These figures are sufficient to show that human society has entered the era of massive data.
At present, data is classified into cold data and hot data according to how it is used and how frequently it is accessed. Hot data refers to data that is accessed and called frequently; cold data refers to data that is rarely accessed or called. Cold data accounts for 80-85% of the total amount of data and makes up the largest share of the data volume.
It should be understood that the hot data may also be referred to as active data, and correspondingly, the cold data may be referred to as inactive data, which is not limited by this application.
For hot data, the user can tune the system for best performance, reducing latency while ensuring that the data is accessible to every user who has issued an access request. Cold data, by contrast, is inactive, and its volume keeps growing over time; it nevertheless retains great value and may be reused frequently under certain conditions, so it still needs to be stored for a long time.
However, the large amount of data stored locally may cause the storage resource of the system to be relatively insufficient, which may cause the problems of system performance degradation, poor user experience, and the like.
Referring to fig. 1, fig. 1 is a schematic diagram of an operating mode of a data processing method and apparatus according to an embodiment of the present disclosure. The server 120 sends instructions to the second client 130 to cause the second client 130 to query the local storage of the first client 110 for data that satisfies the first type of data condition, and the second client 130 returns the query result to the server 120. The server 120 determines the first type of data in the local storage of the first client 110 according to the query result, and uploads the first type of data in the local storage of the first client 110 to the cloud storage 140 through the second client 130.
Therefore, through the operation mode shown in fig. 1, the first type of data in the first client 110 can be stored in the cloud storage 140 without the user operating the first client 110 being aware of it, and when the uploaded first type of data needs to be accessed, it can be downloaded from the cloud storage 140 to the first client 110, so that transparent archiving is achieved.
It should be understood that the cloud storage 140 may also be referred to as a cloud storage system, a cloud storage, a network server, and the like, which is not limited in this application.
It should be understood that the specific type of the cloud storage 140 may also be set according to actual requirements. For example, the cloud storage 140 may be Blu-ray storage, a Baidu cloud disk, or a Huawei network disk, which is not limited in this application.
In addition, the cloud storage 140 may store first type data such as cold data and accumulated data, may provide an object storage service or a NAS (Network Attached Storage) storage service to the outside, and may support an S3 (Simple Storage Service) interface, a CIFS (Common Internet File System) interface, an NFS (Network File System) interface, and the like, so that the second client 130 and the first client 110 may communicate with the cloud storage 140 through these interfaces.
Further, the first type of data includes cold data and/or accumulated data, the accumulated data being data stored for more than a predetermined period of time, for example, the accumulated data being data stored in the first client 110 for more than three months.
It should be understood that the accumulated data may also be referred to as historical data, retention data, etc., and the application is not limited thereto.
It should be understood that the first type of data may be selected according to actual requirements, for example, the first type of data may also include data selected by a user, which is not limited in this application.
In addition, the arrows in fig. 1 indicate the transmission direction of the data flow. For example, a double-headed arrow between the first client 110 and the server 120 indicates that one of the two can read and write data of the other, and a single-headed arrow between the second client 130 and the server 120 indicates that the second client 130 can only read data in the server 120.
It should be understood that although the transmission direction of the data stream is specifically illustrated in fig. 1, the setting can be performed by those skilled in the art according to actual requirements. For example, the server 120 may read data in the second client 130, and for example, the second client 130 may download data from the cloud storage 140, which is not limited in this application.
It should be understood that the server 120 may be a web server, a database server, or the like. In addition, the first client 110 may also be a Personal Computer (PC), a tablet PC, a smart phone, a Personal Digital Assistant (PDA), a vehicle-mounted device, a wearable device, or the like. In addition, the second client 130 may also be a personal computer, a tablet computer, a smart phone, etc.
In addition, the server 120 includes an access interface, so that the server 120 can provide a distributed storage service to the outside through the access interface. The distributed storage service includes file storage and object storage, thereby implementing a hot data storage function.
In addition, the server 120 further includes a processor, which may send a query instruction to the second client 130, receive a query result returned by the second client 130, and perform data migration according to the query result, which is not limited in this application.
In addition, with continued reference to fig. 2, fig. 2 is a block diagram of a processor according to an embodiment of the present application. The processor includes a Web service, a parsing module, a volume object, a Peer object, a log module, and a DB (Database) module.
The Web service may be used to issue instructions or requests. For example, the Web service may send a volume query request to a local volume residing in the first client 110, may also send a volume configuration request to a local volume residing in the first client 110, may also send a directory upload request, and may also send a file location request.
It should be understood that the volume in first client 110 may also be referred to as a memory, and this application is not limited thereto.
It should be understood that although four requests of the Web service are specifically illustrated in fig. 2, the request of the Web service may be specifically set by those skilled in the art according to actual needs, and the present application is not limited thereto.
The parsing module may be used to parse information related to the local volume in the first client 110, e.g., the parsing module may parse cluster information (e.g., configuration information of the cluster) and may parse volume information (e.g., configuration information of the volume).
It should be understood that although fig. 2 illustrates two kinds of information parsed by the parsing module, those skilled in the art can set the specific information to be parsed according to actual needs, and the present application is not limited thereto.
The volume object may be used to manage information for the volume. For example, the volume object may include a task scheduling module for implementing task scheduling, may further include a file scanning module for implementing file scanning, may further include a file cleaning module for implementing file cleaning, may further include a file uploading module for implementing file uploading, may further include a state machine module for implementing storage of a plurality of files, and may further include a volume status monitoring module capable of monitoring volume information.
It should be understood that although six types of modules of the volume object are specifically illustrated in fig. 2, those skilled in the art may also set the specific type of the modules of the volume object according to actual requirements, and the present application is not limited thereto.
The Peer object may be a server object in a management cluster. For example, the Peer object may include a cluster monitoring module for monitoring cluster status.
It should be understood that although fig. 2 specifically illustrates a module of a Peer object, those skilled in the art can also set the specific type of the module of the Peer object according to actual requirements, and the present application is not limited thereto.
The logging module may be used to log.
The DB module may be used to record operation records, metadata, and the like.
It should be understood that although fig. 2 shows the components that make up the framework of the processor, those skilled in the art can also configure the framework according to actual requirements, and the present application is not limited thereto.
With continuing reference to fig. 3, fig. 3 is a flow chart of a first embodiment of a data processing method provided herein.
The data processing method of the embodiment of the application comprises the following steps:
Step 301, determining first type data in the local data, wherein the first type data comprises cold data and/or accumulated data, and the accumulated data is data stored for longer than a predetermined time.
It should be understood that the local data may be referred to as a local file, a local data set, and the like. Correspondingly, the first type of data may also be referred to as a first type of file, a first data set, and the like, which is not limited in this application.
In this step, the second client 130 may obtain the query instruction issued by the server 120, and then the second client 130 initiates a query request to the database in the server 120 according to the query instruction to obtain a query result, and feeds the query result back to the server 120. The server 120 determines the data in the first client 110 that meets the first type of data condition according to the query result.
Alternatively, the first client 110 may obtain the query instruction issued by the server 120, and then the first client 110 may determine the data that is stored by itself and meets the first type of data condition through its own monitoring process, and send the monitoring result to the server 120. The server 120 determines the data in the first client 110 that meets the first type of data condition according to the monitoring result.
Step 302, migrating the first type of data from the local storage to the cloud storage 140.
In this step, the server 120 obtains information of the first type of data stored in the first client 110 according to the query result, where the information of the first type of data may include a size and/or a storage location of the data. Subsequently, the server 120 gives the location information of the first type data on the cloud storage 140 according to the information of the first type data, where the location information includes an address of the cloud storage 140, a storage location of the first type data in the cloud storage 140, and the like. Subsequently, the server 120 migrates the first type data from the local storage to the cloud storage 140 according to the location information.
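The flow in this step (information about the data, then location information, then migration) might be sketched as follows. The types `DataInfo` and `Location`, the `archive` prefix, and the injected `read_local` and `put_object` callables are all hypothetical; they stand in for whatever local read path and cloud interface an implementation actually uses.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DataInfo:
    local_path: str  # storage location of the data in the local storage
    size: int        # size of the data in bytes


@dataclass(frozen=True)
class Location:
    address: str  # address of the cloud storage
    key: str      # storage location of the data inside the cloud storage


def plan_location(info: DataInfo, address: str, prefix: str = "archive") -> Location:
    """Derive the cloud location from the information about the data."""
    return Location(address=address, key=f"{prefix}/{info.local_path.lstrip('/')}")


def migrate(info: DataInfo, location: Location,
            read_local: Callable[[str], bytes],
            put_object: Callable[[str, str, bytes], None]) -> None:
    """Copy the data from the local storage to the planned cloud location."""
    data = read_local(info.local_path)
    if len(data) != info.size:
        # Assumed safeguard: skip data that changed after it was queried.
        raise ValueError("local data changed since it was queried")
    put_object(location.address, location.key, data)
```

Injecting the two callables keeps the sketch independent of any particular storage interface, such as S3, CIFS, or NFS.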
In addition, during migration, multiple pieces of first type data may be uploaded in parallel or uploaded sequentially in order, which is not limited in this application.
In addition, during migration, first type data that is large may be split into slices, so as to achieve rapid migration.
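Slicing and parallel uploading can be combined, as in the sketch below. The function names, the slice size, and the `upload_slice` callable are assumptions for illustration; the application does not prescribe a slice size or an upload interface.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, Tuple


def slice_data(data: bytes, slice_size: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (index, slice) pairs that cover the data in order."""
    if slice_size <= 0:
        raise ValueError("slice_size must be positive")
    for index, offset in enumerate(range(0, len(data), slice_size)):
        yield index, data[offset:offset + slice_size]


def upload_sliced(data: bytes, slice_size: int,
                  upload_slice: Callable[[int, bytes], None],
                  max_workers: int = 4) -> int:
    """Upload all slices in parallel and return the number of slices."""
    slices = list(slice_data(data, slice_size))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Consuming the iterator re-raises any exception from a worker.
        list(pool.map(lambda item: upload_slice(*item), slices))
    return len(slices)
```

Because each slice carries its index, the receiving side can reassemble the data regardless of the order in which the slices arrive.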
In addition, although the embodiment specifically defines the manner in which the server 120 performs step 301 and step 302, those skilled in the art may also adjust the method according to actual requirements. For example, the steps 301 and 302 may be executed by a processor in the server 120, which is not limited in this application.
In this embodiment, the first type of data is uploaded to the cloud storage 140, so that the storage space of the local storage can be saved, the performance of the system can be improved, and the user experience can be improved.
With continuing reference to fig. 4, fig. 4 is a flow chart of a second embodiment of a data processing method provided herein. The embodiment of the present application is similar to the first embodiment described above, except that:
in the data processing method according to the embodiment of the present application, determining the first type of data in the local data (i.e., step 301) includes:
Step 3011, determining the first type of data in the local data according to the metadata, where the metadata records attribute information of the data.
In this step, the server 120 may determine the first type of data in the local data of the first client 110 according to the metadata.
In addition, in the case that the first client 110 and the server 120 both run Linux systems, the metadata may include base metadata and extended metadata, the base metadata being metadata stored in a database in the server 120, and the extended metadata being metadata stored in an extended attribute of the file.
To facilitate understanding of the base metadata and the extended metadata, the following description is made by way of specific embodiments.
1. Extended metadata
The extended metadata may include attribute information as in table 1 below.
It should be understood that, although table 1 below specifically illustrates specific attribute information of the extended metadata, a person skilled in the art may also set the specific attribute information of the extended metadata according to actual requirements, and the application is not limited to this. In addition, the following table 2 is similar and the description will not be repeated.
[TABLE 1: attribute information of the extended metadata; table image not reproduced]
2. Basic metadata
In order to quickly scan out the matching first-class data from the massive data stored in the local storage, the present application may add a database to the server 120, where the database may have the attribute information as shown in table 2 below.
[TABLE 2: attribute information of the basic metadata; table image not reproduced]
It should be understood that, although the present embodiment specifically defines that, in the case that the first client 110 and the server 120 both run Linux systems, the metadata may include basic metadata and extended metadata, a person skilled in the art may also set the division of the metadata according to actual needs, provided that the division is applicable to the systems of both the first client 110 and the server 120, and this is not limited in this application.
It should be understood that, although the present embodiment defines that the metadata includes the basic metadata and the extended metadata, those skilled in the art may also perform related setting on the metadata according to actual needs, for example, the metadata may be stored only in a database in the server 120, which is not limited in this application.
In this embodiment, because the metadata records attribute information of the data, the first type of data can be quickly found in the massive local data by means of the metadata.
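As an illustration of how a database of basic metadata could support a fast scan, the sketch below uses SQLite. The table name, the column names, and the query are assumptions built from attributes mentioned elsewhere in the text (size, creation time, read and write frequency, migration identifier, cloud path); they do not reproduce Table 2, whose contents are not available here.

```python
import sqlite3
from typing import List

# Assumed schema; not a reproduction of Table 2.
SCHEMA = """
CREATE TABLE IF NOT EXISTS base_metadata (
    path        TEXT PRIMARY KEY,
    size        INTEGER NOT NULL,
    created_at  INTEGER NOT NULL,            -- Unix timestamp
    read_count  INTEGER NOT NULL DEFAULT 0,
    write_count INTEGER NOT NULL DEFAULT 0,
    migrated    INTEGER NOT NULL DEFAULT 0,  -- migration identifier
    cloud_path  TEXT
)
"""


def open_metadata_db(location: str = ":memory:") -> sqlite3.Connection:
    """Open the metadata database and create the table if needed."""
    conn = sqlite3.connect(location)
    conn.execute(SCHEMA)
    return conn


def find_candidates(conn: sqlite3.Connection, now: int,
                    max_age_seconds: int, max_reads: int) -> List[str]:
    """Return paths of unmigrated data that is old or rarely read."""
    rows = conn.execute(
        "SELECT path FROM base_metadata "
        "WHERE migrated = 0 AND (created_at < ? OR read_count < ?) "
        "ORDER BY path",
        (now - max_age_seconds, max_reads),
    )
    return [row[0] for row in rows]
```

Querying the database avoids walking the file system, which is the reason the text gives for adding it.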
With continuing reference to FIG. 5, a flow diagram of a third embodiment of a data processing method provided herein is presented. The embodiment of the present application is similar to the second embodiment described above, except that:
in the data processing method according to the embodiment of the application, the determining the first type of data in the local data (step 3011) according to the metadata includes:
Step 501, determining a read frequency or a write frequency of the local data according to the attribute information, where the read frequency is the number of times the local data is read within a preset time, and the write frequency is the number of times the local data is written within the preset time.
It should be understood that the preset time may also be referred to as a lease, a time period, etc., and the application is not limited thereto.
It should be understood that, in addition to the number of reads of the local data within a preset time, the read frequency may also indicate a level of read data. Where levels of read data are used, a threshold may be set for each level; for example, fewer than 1,000 reads may be set as level 1, and between 1,000 and 10,000 reads may be set as level 2, which is not limited in the present application.
Correspondingly, the write frequency may also represent the level of write data, which is similar to the setting rule of the level of read data, except that the write frequency represents the number of writes to the local data within the preset time, and is not described herein again.
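A minimal sketch of mapping a count to a level follows, using the two thresholds from the example above. The treatment of the exact boundary values and the existence of a level 3 above 10,000 are assumptions, since the text defines only levels 1 and 2.

```python
from bisect import bisect_right
from typing import Sequence

# Thresholds taken from the example in the text: 1,000 and 10,000.
LEVEL_BOUNDS = (1_000, 10_000)


def frequency_level(count: int, bounds: Sequence[int] = LEVEL_BOUNDS) -> int:
    """Map a read or write count to a level, starting at level 1.

    A count equal to a bound is placed in the higher level (an assumption).
    """
    return bisect_right(bounds, count) + 1
```

The same function can serve write counts, since the text states that the write level follows the same rule as the read level.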
In this step, the second client 130 may query whether the used capacity of the local storage reaches a capacity threshold (e.g., 80%) according to the instructions of the server 120. If the data does not meet the first type of data condition, the second client 130 continues to wait for the next instruction; if the data meets the first type of data condition, the second client 130 determines that data by querying the attribute information in the metadata, and may send a query result containing the local data that meets the first type of data condition to the server 120.
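The capacity query can be sketched with the Python standard library as below. The function names are hypothetical, the 80% value follows the example in the text, and treating "reaches" as "greater than or equal to" is an assumption.

```python
import shutil

CAPACITY_THRESHOLD = 0.8  # the 80% example from the text


def reaches_threshold(total: int, used: int,
                      threshold: float = CAPACITY_THRESHOLD) -> bool:
    """Return True when the used share of capacity reaches the threshold."""
    if total <= 0:
        raise ValueError("total capacity must be positive")
    return used / total >= threshold


def local_storage_reaches_threshold(path: str = "/",
                                    threshold: float = CAPACITY_THRESHOLD) -> bool:
    """Check the file system that contains the given path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return reaches_threshold(usage.total, usage.used, threshold)
```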
In addition, the server 120 may determine the read frequency or the write frequency of the local data meeting the first type of data condition according to the query result and the metadata.
It should be understood that the server 120 may also query the first client 110 for data meeting the first type of data condition directly, which is not limited in this application.
In addition, the lease may be set to several minutes to several tens of minutes according to actual needs, and is used as follows:
1. When a file is accessed and the file has no lease, a new lease is created, and the read frequency and the write frequency of the local data in the database are cleared;
2. When a file is read or written, the read frequency and write frequency field values of the local data in the database are incremented, that is, the read frequency and the write frequency of the local data are updated in real time, and the lease corresponding to the local data is recalculated;
3. When the lease corresponding to the local data expires, the read frequency field and the write frequency field corresponding to the local data in the database are cleared, and the lease is then destroyed.
Therefore, setting a lease ensures that the read frequency and the write frequency of the local data are valid, and avoids interference from stale read or write frequencies from other time periods, for example, yesterday's read and write frequencies of the local data.
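The three lease rules above can be sketched as a small in-memory tracker. This is an illustrative sketch only: the class and field names are hypothetical, a dictionary stands in for the database, and "recalculating" the lease is assumed to mean renewing it for a full lease duration from the time of the access.

```python
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class Record:
    reads: int = 0
    writes: int = 0
    lease_expiry: Optional[float] = None  # None means the file has no lease

class LeaseTracker:
    def __init__(self, lease_seconds: float = 600.0, clock=time.monotonic):
        self.lease_seconds = lease_seconds  # e.g. ten minutes
        self.clock = clock
        self.db = {}  # stand-in for the database: path -> Record

    def _expire(self, rec: Record, now: float) -> None:
        # Rule 3: when the lease has expired, clear both counters and destroy it.
        if rec.lease_expiry is not None and now >= rec.lease_expiry:
            rec.reads = rec.writes = 0
            rec.lease_expiry = None

    def access(self, path: str, op: str) -> Record:
        if op not in ("read", "write"):
            raise ValueError("op must be 'read' or 'write'")
        now = self.clock()
        rec = self.db.setdefault(path, Record())
        self._expire(rec, now)
        if rec.lease_expiry is None:
            # Rule 1: no lease yet, so start from cleared counters.
            rec.reads = rec.writes = 0
        # Rule 2: increment the matching counter and recalculate the lease.
        if op == "read":
            rec.reads += 1
        else:
            rec.writes += 1
        rec.lease_expiry = now + self.lease_seconds
        return rec

    def sweep(self) -> None:
        """Apply rule 3 to every record, e.g. from a periodic task."""
        now = self.clock()
        for rec in self.db.values():
            self._expire(rec, now)
```

Passing the clock in as a parameter keeps the sketch testable; a real deployment would more likely rely on timestamps stored alongside the counters in the database.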
Step 502, determining a first type of data in the local data according to the read frequency and the write frequency.
In this step, after querying the metadata, the server 120 determines whether the read frequency of the local data is less than a preset read frequency, and if so, determines that the local data is cold data. Correspondingly, the server 120 may further determine whether the write frequency of the local data is less than a preset write frequency, and if so, determine that the local data is cold data.
It should be understood that the preset read frequency and the preset write frequency may be the same or different, and the present application is not limited thereto.
It should be understood that the server 120 may also confirm whether the read frequency of the local data is within the preset level of the read data after querying the metadata, and if not, determine that the local data is cold data. Correspondingly, the server 120 may further determine whether the write frequency of the local data is within a preset write data level, and if not, determine that the local data is cold data.
In this embodiment, cold data is quickly determined from massive local data through the metadata, and hierarchical management of hot data and cold data is also supported.
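The cold-data test of step 502 can be sketched as follows. This is a minimal sketch: the threshold values, the field names `read_freq` and `write_freq`, and the helper names are assumptions chosen for illustration, and the metadata is modeled as a dictionary rather than a database.

```python
def is_cold(read_freq: int, write_freq: int,
            preset_read: int = 10, preset_write: int = 10) -> bool:
    """Data is cold if its read frequency is below the preset read frequency,
    or its write frequency is below the preset write frequency.
    The default thresholds are hypothetical and may differ from each other."""
    return read_freq < preset_read or write_freq < preset_write

def select_cold(metadata: dict, **thresholds) -> list:
    """Pick cold data by looking only at the metadata, never at the data."""
    return [name for name, attrs in metadata.items()
            if is_cold(attrs["read_freq"], attrs["write_freq"], **thresholds)]
```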
With continuing reference to fig. 6, fig. 6 is a flowchart of a fourth embodiment of a data processing method provided herein. The embodiment of the present application is similar to the first embodiment described above, except that:
in the data processing method according to the embodiment of the present application, the first type of data is accumulated data, the attribute information includes creation time of the data, and determining the first type of data in the local data (step 301) includes:
step 601, according to the attribute information, determining the creation time of the local data.
In this step, the server 120 or the second client 130 may match the local data with a database in the server 120 to determine the creation time of the local data, so that the creation time of the local data can be quickly obtained.
Step 602, determining the local data as accumulated data when the interval time between the creation time and the current time is greater than a preset time period.
In this step, the server 120 may first determine the interval between the creation time of the local data and the current time, then compare the interval with a preset time period, and, in the case that the interval is greater than the preset time period, determine the local data as accumulated data.
It should be understood that the preset time period may be specifically set according to actual requirements, for example, the preset time period is three months, which is not limited in the present application.
It should be understood that the above process of determining the accumulated data may be performed in parallel for a plurality of local data, or may be performed in sequence for a plurality of local data one by one.
In this embodiment, the migration of accumulated data may archive files created earlier, thereby conserving local storage space.
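Steps 601 and 602 amount to a single comparison, sketched below. The three-month example from the text is approximated here as 90 days, which is an assumption, as are the function and constant names.

```python
from datetime import datetime, timedelta

# Assumption: "three months" is approximated as 90 days for illustration.
PRESET_PERIOD = timedelta(days=90)

def is_accumulated(created_at: datetime, now: datetime,
                   preset_period: timedelta = PRESET_PERIOD) -> bool:
    """Step 602: data is accumulated data when the interval between its
    creation time and the current time is greater than the preset period."""
    return now - created_at > preset_period
```

Because each check depends only on one item's creation time, the checks can run in parallel or one by one, as the text notes.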
With continuing reference to fig. 7, fig. 7 is a flowchart of a fifth embodiment of a data processing method provided herein. The embodiment of the present application is similar to the first embodiment described above, except that:
in the data processing method according to the embodiment of the present application, the attribute information includes a migration identifier, where the migration identifier is used to indicate whether the local data is migrated to the cloud storage 140, and the data processing method further includes:
step 303, after the migration of the first type of data is completed, setting the migration identifier corresponding to the first type of data to a first value, where the first value indicates that the first type of data has been migrated to the cloud storage 140.
In this step, after the migration of the first type of data is completed, the server 120 may update the migration identifier corresponding to the first type of data, for example, after the migration of the first type of data is completed, the migration identifier of the first type of data may be marked as 1, and in addition, the first value may be specifically set according to an actual requirement, which is not limited in this application.
It should be understood that the updating process of the migration identifier of the first type data may be performed on a plurality of first type data in parallel, or may be performed on a plurality of first type data sequentially according to the order, which is not limited in this application.
In the embodiment, migrated data in the local data is recorded through the migration identifier, so that management of the migrated data is facilitated.
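Step 303 can be sketched as follows. The values 0 and 1, the `migrated` field name, and the `upload` callback are assumptions for illustration; the text only gives 1 as an example of the first value.

```python
NOT_MIGRATED = 0
MIGRATED = 1  # the "first value" in the example from the text

def migrate(name: str, metadata: dict, upload) -> None:
    """Upload one item, then mark it as migrated in the metadata.

    `upload` is a hypothetical callable that sends the item to cloud storage
    and raises on failure, so the identifier is only set after the migration
    has actually completed.
    """
    upload(name)
    metadata[name]["migrated"] = MIGRATED
```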
With continuing reference to fig. 8, fig. 8 is a flowchart of a sixth embodiment of a data processing method provided herein. The present embodiment is similar to the fifth embodiment, except that:
in the data processing method according to the embodiment of the present application, the attribute information includes a size of the data and path information of the data in the cloud storage 140, and the data processing method further includes:
Step 304, after the migration of the first type data is completed, deleting the first type data in the local storage, and recording the size and the path information of the first type data in the metadata.
In this step, after the migration of the first type data is completed, the server 120 may truncate the first type data in the local storage, where the truncation refers to deleting the data in the folder and saving the size, path information, extended attribute, and the like of the related data.
It should be understood that the truncation may be performed by truncating the plurality of first type data in parallel, or sequentially by truncating the plurality of first type data in sequence.
In this embodiment, the state in the local data is recorded by updating the migration identification, thereby facilitating management of the local data.
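The truncation of step 304 can be sketched as follows. This is a sketch under assumptions: the metadata is a dictionary keyed by local path, the field names are hypothetical, and the file is emptied in place so that a zero-length stub remains, which is one way to read "deleting the data in the folder" while keeping the related attributes.

```python
import os

def truncate_after_migration(local_path: str, cloud_path: str,
                             metadata: dict) -> None:
    """Record size and cloud path in the metadata, then empty the local file."""
    size = os.path.getsize(local_path)
    # Record first, so the original size is not lost if truncation fails.
    metadata[local_path] = {"size": size, "cloud_path": cloud_path}
    with open(local_path, "r+b") as f:
        f.truncate(0)  # keep the file entry, drop its contents
```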
With continuing reference to fig. 9, fig. 9 is a flowchart of a seventh embodiment of the data processing method provided herein. The embodiment of the present application is similar to the first embodiment described above, except that:
in the data processing method according to the embodiment of the present application, the data processing method further includes:
step 901, a data access request of the first type of data is obtained.
In this step, when the user intends to read the local data, the server 120 may determine, according to the access instruction sent by the first client 110 and the metadata, whether the local data to be accessed has been migrated; if not, the local data is read directly, and if so, the local data to be accessed is determined to be the first type of data.
Step 902, downloading the first type data from the cloud storage 140 to the local storage according to the path information, recorded in the metadata, of the first type data in the cloud storage 140.
In this step, when the local data is determined to be the first type of data, the server 120 may download the first type of data to the local storage according to the storage path recorded by the metadata, so that the downloading of the first type of data is completed without the user's knowledge, and the subsequent user may access the first type of data normally.
In this embodiment, the data migration is transparent to the user, who only has to wait for the download to complete when it is triggered, thereby realizing transparent access.
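Steps 901 and 902 can be sketched as a read path that downloads on demand. In this sketch, dictionaries stand in for the local storage and the cloud storage, and the names `read_data`, `migrated`, and `cloud_path` are assumptions rather than identifiers from the application.

```python
def read_data(name: str, metadata: dict, local: dict, cloud: dict) -> bytes:
    """Serve a read request, fetching migrated data from the cloud if needed."""
    entry = metadata[name]
    if entry.get("migrated") and name not in local:
        # Step 902: download using the path information recorded in metadata.
        local[name] = cloud[entry["cloud_path"]]
    # The caller receives the data either way and never sees the download.
    return local[name]
```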
With continuing reference to fig. 10, fig. 10 is a flowchart of an eighth embodiment of the data processing method provided by the present application. The present embodiment is similar to the seventh embodiment, except that:
in the data processing method according to the embodiment of the present application, the attribute information includes a migration flag, where the migration flag is used to indicate whether the first type of data has been migrated back to the local storage, and the data processing method further includes:
step 903, after the first type of data is migrated back, setting a migration flag corresponding to the first type of data as a second value, where the second value is used to indicate that the first type of data has been migrated back to the local memory.
In this step, after the first type of data is migrated back, the server 120 may update the migration flag corresponding to the first type of data; for example, after the first type of data is migrated back to the local storage, its migration flag may be set to 1. In addition, the second value may also be specifically set according to actual requirements, which is not limited in this application.
It should be understood that the updating of the migration flag may be performed on a plurality of first type data in parallel, or on a plurality of first type data one by one in sequence, which is not limited in this application.
In the embodiment, the migration state of the first type of data is recorded through the back migration identifier, so that the back migration of the first type of data is facilitated.
With continuing reference to fig. 11, fig. 11 is a flowchart of a ninth embodiment of the data processing method provided by the present application. The embodiment of the present application is similar to the eighth embodiment, except that:
in the data processing method according to the embodiment of the present application, the data processing method further includes:
Step 904, determining the first type of data which is migrated back according to the attribute information.
In this embodiment, the server 120 may determine the first type of data that has been migrated back to the local storage by a migration back identification of the metadata in the database.
Step 905, deleting the migrated first type data in the local storage, and setting a migration flag corresponding to the migrated first type data as a third value, where the third value is used to indicate that the first type data is not migrated back to the local storage.
It should be understood that the deleting process of the first type data can be set according to actual requirements. For example, after reading the first type data, the server 120 may send a deletion instruction to the first client 110 to delete the migrated first type data. For another example, after reading the first type data, the server 120 may store the first type data in the local storage for a period of time, and then delete the migrated first type data. For another example, after the first type data is migrated back, the migrated first type data becomes hot data, in which case the server 120 may delete the first type data in the cloud storage 140, which is not limited in this application.
In addition, the third value may be specifically set according to actual requirements, for example, after the migrated first type data is cleaned, the migration flag of the cleaned first type data may be reset to 0, which is not limited in this application.
In this embodiment, excessive use of local storage is avoided by cleaning up the data that has been migrated back.
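Steps 904 and 905 can be sketched as a cleanup pass over the metadata. The values 1 and 0 follow the examples given for the second and third values; the field name `migrated_back`, the function name, and the dictionary stand-ins for the database and the local storage are assumptions.

```python
MIGRATED_BACK = 1      # the "second value" in the example
NOT_MIGRATED_BACK = 0  # the "third value" in the example

def clean_migrated_back(metadata: dict, local: dict) -> list:
    """Delete local copies of data that was migrated back and reset the flag."""
    cleaned = []
    for name, entry in metadata.items():
        # Step 904: find migrated-back data through the flag in the metadata.
        if entry.get("migrated_back") == MIGRATED_BACK:
            # Step 905: delete the local copy and reset the flag.
            local.pop(name, None)
            entry["migrated_back"] = NOT_MIGRATED_BACK
            cleaned.append(name)
    return cleaned
```

When this pass runs (immediately after a read, after a retention period, or not at all for data that has become hot) is a policy choice that the text leaves open.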
It is to be understood that the above data processing method is only exemplary, and those skilled in the art can make various modifications according to the above method.
Referring to fig. 12, fig. 12 shows a data processing apparatus provided in an embodiment of the present application. It should be understood that the data processing apparatus corresponds to the method embodiments of fig. 1 to 11 and can perform the steps of those method embodiments; for the specific functions of the data processing apparatus, reference may be made to the description above, and detailed descriptions are omitted here where appropriate to avoid redundancy. The data processing apparatus comprises at least one software functional module that can be stored in a memory in the form of software or firmware, or solidified in an Operating System (OS) of the data processing apparatus. Specifically, the data processing apparatus includes: a determining module 1201, configured to determine first type data in the local data, where the first type data includes cold data and/or accumulated data, and the accumulated data is data whose storage time exceeds a preset time; and a migration module 1202, configured to migrate the first type of data from the local storage to the cloud storage 140.
In addition, the determining module 1201 is further configured to determine a first type of data in the local data according to metadata, where the metadata records attribute information of the data.
In addition, the determining module 1201 is further configured to determine, according to the attribute information, a read frequency or a write frequency of the local data, where the read frequency is the number of times the local data is read within a preset time, and the write frequency is the number of times the local data is written within the preset time; the determining module 1201 is further configured to determine that the local data is cold data when the read frequency is less than a preset read frequency; the determining module 1201 is further configured to determine that the local data is cold data when the write frequency is less than a preset write frequency.
In addition, the first type of data is accumulated data, the attribute information includes creation time of the data, and the determining module 1201 is further configured to determine creation time of the local data according to the attribute information; the determining module 1201 is further configured to determine the local data as accumulated data if an interval between the creation time and the current time is greater than a preset time period.
In addition, the attribute information includes a migration identifier, where the migration identifier is used to indicate whether the local data is migrated to the cloud storage 140, and the setting module (not shown) is configured to set, after the migration of the first type of data is completed, the migration identifier corresponding to the first type of data to a first value, where the first value indicates that the first type of data has been migrated to the cloud storage 140.
In addition, the attribute information includes the size of the data and path information of the data in the cloud storage 140, and the setting module is further configured to delete the first type of data in the local storage after the migration of the first type of data is completed, and record the size of the first type of data and the path information in the metadata.
In addition, the obtaining module (not shown) is used for obtaining a data access request of the first type data; the download module (not shown) is configured to download the first type data from the cloud storage 140 to the local storage according to the path information, recorded in the metadata, of the first type data in the cloud storage 140.
In addition, the attribute information includes a migration flag, where the migration flag is used to indicate whether the first type of data has been migrated back to the local memory, and the setting module is further used to set the migration flag corresponding to the first type of data to a second value after the first type of data has been migrated back, where the second value is used to indicate that the first type of data has been migrated back to the local memory.
In addition, the determining module 1201 is further configured to determine the migrated first type of data according to the attribute information; the setting module is further configured to delete the migrated first type of data in the local storage, and set a migration flag corresponding to the migrated first type of data as a third value, where the third value is used to indicate that the first type of data is not migrated back to the local storage.
Further, the metadata includes base metadata that is metadata stored in the database and extended metadata that is metadata stored in an extended attribute of the file.
Fig. 13 is a block diagram of the structure of an apparatus 1300 in an embodiment of the present application. The apparatus 1300 may include a processor 1310, a communication interface 1320, a memory 1330, and at least one communication bus 1340. The communication bus 1340 is used to enable direct connection communication between these components. In this embodiment, the communication interface 1320 of the apparatus is used for communicating signaling or data with other node devices. The processor 1310 may be an integrated circuit chip having signal processing capabilities. The processor 1310 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. The processor 1310 may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor 1310 may be any conventional processor or the like.
The memory 1330 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The memory 1330 stores computer readable instructions that, when executed by the processor 1310, enable the apparatus 1300 to perform the steps of the method embodiments of fig. 1-11.
The apparatus 1300 may further include a memory controller, an input-output unit, an audio unit, a display unit.
The memory 1330, the memory controller, the processor 1310, the peripheral interface, the input/output unit, the audio unit, and the display unit are electrically connected to each other directly or indirectly to realize data transmission or interaction. For example, these elements may be electrically connected to each other via one or more communication buses 1340. The processor 1310 is used to execute executable modules stored in the memory 1330, such as software functional modules or computer programs included in the data processing apparatus. And the data processing apparatus is configured to perform the following method: determining a first type of data in the local data; migrating the first type of data from the local storage to the cloud storage.
The input and output unit is used for providing input data for a user to realize the interaction of the user and the server (or the local terminal). The input/output unit may be, but is not limited to, a mouse, a keyboard, and the like.
The audio unit provides an audio interface to the user, which may include one or more microphones, one or more speakers, and audio circuitry.
The display unit provides an interactive interface (e.g. a user interface) between the electronic device and a user, or is used for displaying image data for the user's reference. In this embodiment, the display unit may be a liquid crystal display or a touch display. In the case of a touch display, the display can be a capacitive touch screen or a resistive touch screen that supports single-point and multi-point touch operations. Supporting single-point and multi-point touch operations means that the touch display can sense touch operations generated simultaneously at one or more positions on the touch display, and send the sensed touch operations to the processor for calculation and processing. The display unit may display the composite image obtained by the processor 1310 executing the steps shown in fig. 1 to 11, and may also display the result of determining whether there is a hidden trouble in the line in the region to be inspected.
It will be appreciated that the configuration shown in fig. 13 is merely illustrative, and that the apparatus 1300 may include more or fewer components than shown in fig. 13, or have a different configuration than shown in fig. 13. The components shown in fig. 13 may be implemented in hardware, software, or a combination thereof.
The present application further provides a computer-readable storage medium having a data processing program stored thereon, where the data processing program is executed by a processor to perform the method of the method embodiment.
The present application also provides a computer program product which, when run on a computer, causes the computer to perform the method of the method embodiments.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the system described above may refer to the corresponding process in the foregoing method, and will not be described in too much detail herein.
It should be noted that the embodiments in the present specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another. Since the apparatus embodiments are basically similar to the method embodiments, they are described briefly, and for relevant points reference may be made to the description of the method embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions may be stored in a computer-readable storage medium if they are implemented in the form of software functional modules and sold or used as separate products. Based on such understanding, the technical solution of the present application, or the part thereof that contributes over the prior art, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined or explained in subsequent figures.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (7)

1. A data processing method, comprising:
determining a first type of data in the local data;
migrating the first type of data from a local memory to a cloud storage;
the determining the first type of data in the local data comprises:
determining the first type of data in local data according to metadata, wherein the metadata records attribute information of the data;
the determining the first type of data in the local data according to the metadata includes:
determining the read frequency or the write frequency of the local data according to the attribute information, wherein the read frequency is the number of times the local data is read within a lease period, and the write frequency is the number of times the local data is written within the lease period;
determining the local data as the cold data under the condition that the read frequency is less than a preset read frequency; or
determining the local data as the cold data under the condition that the write frequency is less than a preset write frequency;
the using method of the lease comprises the following steps:
under the condition of accessing a file, if the file is determined to have no lease, creating a new lease, and clearing the read frequency and the write frequency of the local data in a database;
under the condition of reading or writing the file, incrementing the field values of the read frequency and the write frequency of the local data in the database, and recalculating the lease corresponding to the local data;
and under the condition that the lease corresponding to the local data expires, clearing the read frequency field and the write frequency field corresponding to the local data in the database, and finally destroying the lease.
2. The data processing method according to claim 1, wherein the attribute information includes a migration flag indicating whether the local data is migrated to the cloud storage, and the data processing method further includes:
after the migration of the first type of data is completed, setting a migration identifier corresponding to the first type of data to be a first value, where the first value indicates that the first type of data has been migrated to the cloud storage.
3. The data processing method according to claim 1, wherein the attribute information includes a size of data and path information of the data in the cloud storage, the data processing method further comprising:
after the migration of the first type data is completed, deleting the first type data in the local storage, and recording the size of the first type data and the path information in the metadata.
4. The data processing method of claim 3, further comprising:
acquiring a data access request of the first type of data;
and downloading the first type of data from the cloud storage to the local storage according to the path information of the first type of data recorded by the metadata in the cloud storage.
5. The data processing method according to claim 4, wherein the attribute information includes a migration-back flag indicating whether the first type of data has been migrated back to the local storage, the data processing method further comprising:
after the first type of data is migrated back, setting the migration-back flag corresponding to the first type of data to a second value, where the second value is used to indicate that the first type of data has been migrated back to the local storage.
6. The data processing method of claim 5, further comprising:
determining the first type of data which is migrated back according to the attribute information;
deleting the first type of data which is migrated back in the local storage, and setting the migration-back flag corresponding to the first type of data which is migrated back to a third value, wherein the third value is used for indicating that the first type of data is not migrated back to the local storage.
7. A data processing apparatus, comprising:
the determining module is used for determining first type data in the local data;
the migration module is used for migrating the first type of data from a local storage to a cloud storage;
the determining module is further configured to determine the first type of data in the local data according to metadata, where the metadata records attribute information of the data;
the first type of data is cold data, the attribute information includes a read frequency or a write frequency, and the determining module is further configured to: determine the read frequency or the write frequency of the local data according to the attribute information, wherein the read frequency is the number of times the local data is read within a lease period, and the write frequency is the number of times the local data is written within the lease period; determine the local data as the cold data under the condition that the read frequency is less than a preset read frequency; or determine the local data as the cold data under the condition that the write frequency is less than a preset write frequency;
the determining module is further configured to create a new lease if it is determined that the file has no lease under the condition of accessing the file, and clear the read frequency and the write frequency of the local data in the database;
the data processing apparatus further includes:
the self-adding module is used for self-adding the field values of the reading frequency and the writing frequency of the local data in the database under the condition of reading and writing the file, and recalculating the lease corresponding to the local data;
and the zero clearing module is used for clearing the read frequency field and the write frequency field corresponding to the local data in the database under the condition that the lease corresponding to the local data is invalid, and finally destroying the lease.
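
The lease-based counting recited in claim 7 can be illustrated with a minimal sketch. It is not the patented implementation: the class and function names (FileStats, AccessTracker, record_access, is_cold), the in-memory dictionary standing in for the database, and the threshold and lease-length constants are all assumptions made for illustration. "Recalculating the lease" is interpreted here as renewing the expiry time on each access, which the claims do not spell out.

```python
from dataclasses import dataclass
from typing import Dict, Optional

LEASE_SECONDS = 60.0  # hypothetical lease length
PRESET_READS = 3      # hypothetical preset read frequency
PRESET_WRITES = 1     # hypothetical preset write frequency


@dataclass
class FileStats:
    reads: int = 0
    writes: int = 0
    lease_expiry: Optional[float] = None  # None means the file has no lease


class AccessTracker:
    """In-memory stand-in for the database holding the frequency fields."""

    def __init__(self) -> None:
        self.stats: Dict[str, FileStats] = {}

    def expire_if_needed(self, path: str, now: float) -> None:
        # Clearing step: once the lease has expired, clear both counters
        # and then destroy the lease.
        entry = self.stats.get(path)
        if entry and entry.lease_expiry is not None and now >= entry.lease_expiry:
            entry.reads = 0
            entry.writes = 0
            entry.lease_expiry = None

    def record_access(self, path: str, is_write: bool, now: float) -> None:
        entry = self.stats.setdefault(path, FileStats())
        self.expire_if_needed(path, now)
        if entry.lease_expiry is None:
            # No lease on access: start a new one with cleared counters.
            entry.reads = 0
            entry.writes = 0
        # Increment step: bump the counter that matches the access type.
        if is_write:
            entry.writes += 1
        else:
            entry.reads += 1
        # Assumed meaning of "recalculating the lease": renew its expiry.
        entry.lease_expiry = now + LEASE_SECONDS

    def is_cold(self, path: str, now: float, by: str = "reads") -> bool:
        # Cold when the chosen frequency is below its preset threshold.
        self.expire_if_needed(path, now)
        entry = self.stats.get(path, FileStats())
        if by == "reads":
            return entry.reads < PRESET_READS
        return entry.writes < PRESET_WRITES
```

In this sketch a file that is read three times within one lease is not cold by the read criterion, but becomes cold again once the lease expires and its counters are cleared. A migration job could call is_cold over the tracked paths to pick upload candidates.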
CN201910246043.0A 2019-03-28 2019-03-28 Data processing method and device Active CN109947373B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910246043.0A CN109947373B (en) 2019-03-28 2019-03-28 Data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910246043.0A CN109947373B (en) 2019-03-28 2019-03-28 Data processing method and device

Publications (2)

Publication Number Publication Date
CN109947373A CN109947373A (en) 2019-06-28
CN109947373B true CN109947373B (en) 2022-05-13

Family

ID=67012850

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910246043.0A Active CN109947373B (en) 2019-03-28 2019-03-28 Data processing method and device

Country Status (1)

Country Link
CN (1) CN109947373B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112835967B (en) * 2019-11-25 2023-07-21 浙江宇视科技有限公司 Data processing method, device, equipment and medium based on distributed storage system
CN111125062B (en) * 2019-12-20 2023-10-20 中国银行股份有限公司 Historical data migration method and device, and historical data query method and device
CN112540733A (en) * 2020-12-23 2021-03-23 华录光存储研究院(大连)有限公司 Data management method and device, electronic equipment and storage medium
CN112650453B (en) * 2020-12-31 2024-05-14 北京千方科技股份有限公司 Method and system for storing and inquiring traffic data
CN112883026A (en) * 2021-01-28 2021-06-01 青岛海尔科技有限公司 Data processing method and device
CN113407118B (en) * 2021-06-24 2022-11-01 九江职业技术学院 Data storage device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104731794A (en) * 2013-12-19 2015-06-24 北京华易互动科技有限公司 Cold-hot data fragmenting, mining and storing method
CN107728938A (en) * 2017-09-18 2018-02-23 暨南大学 Cold data placement strategy based on frequency association in a low-energy-consumption cluster environment
CN107784108A (en) * 2017-10-31 2018-03-09 郑州云海信息技术有限公司 Data storage and management method, device and equipment

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103078906A (en) * 2012-12-26 2013-05-01 爱迪科特(北京)科技有限公司 Document transparent moving method
CN104516471B (en) * 2013-09-27 2017-04-12 国际商业机器公司 Method and device for managing power supply of storage system
CN106202070A (en) * 2015-04-29 2016-12-07 中国电信股份有限公司 File storage processing method and system
US10453076B2 (en) * 2016-06-02 2019-10-22 Facebook, Inc. Cold storage for legal hold data
CN108268211B (en) * 2017-01-03 2021-09-14 ***通信有限公司研究院 Data processing method and device
US10372371B2 (en) * 2017-09-14 2019-08-06 International Business Machines Corporation Dynamic data relocation using cloud based ranks
CN107809535A (en) * 2017-10-30 2018-03-16 努比亚技术有限公司 Information processing method, mobile terminal and computer-readable storage medium

Also Published As

Publication number Publication date
CN109947373A (en) 2019-06-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant