Background technology
With the rapid development of information technology, society is becoming increasingly information-driven, and the amount of digital information owned by individuals is growing explosively. Against this background, storage devices have become indispensable tools in daily life. However, the variety of storage devices has also brought many problems: for example, how to keep data consistent across a user's multiple storage devices, how to keep the data stored on all of those devices safe and reliable, and how to cope with the limited capacity of each device. How to provide a storage service that is efficient, easy to manage, and able to grow its capacity dynamically has therefore become a hot research topic.
The rapid development of computer networking has brought new breakthroughs to storage technology. Technologies such as network attached storage (NAS) and storage area networks (SAN) are on the rise and have substantially changed traditional storage technology; however, they are expensive on the one hand and ill-suited to use across wide area networks on the other. The concept of cloud storage (Cloud Storage), proposed by organizations such as the Storage Networking Industry Association (SNIA) and Amazon in response to current trends in storage technology, offers a revolutionary vision for the future of networked storage: data storage can be delivered like today's water and electricity services, with the network reaching every household, a range of pricing plans, and different services for different users. On the one hand, this can give users very high-quality storage services, such as dynamically growing storage space and access from any point on the network. On the other hand, it is transparent to users: all technical problems are handed over to a specialized cloud storage service provider, and users need not concern themselves with issues such as data reliability and security. It can also provide high-quality storage at lower cost, so that users need not spend large sums maintaining and upgrading storage systems.
Architecturally, cloud storage is generally divided into two major parts: the cloud storage service and the cloud storage system. The cloud storage service is the data service that a few large companies, acting as service providers, deploy across the Internet; it can be accessed through defined interfaces, and the data kept in it is called cloud storage data. The cloud storage system is the storage system installed and deployed on the client. It generally comprises a dynamically loadable kernel module that captures file system commands, a network communication module that interacts with the cloud storage service interface to provide cloud storage data, and an execution module that processes the captured commands. The execution module interacts with the cloud storage service through the network communication module, and the differences between cloud storage systems lie in the execution module. A file stored in a cloud storage system, like a file stored on a hard disk, logically consists of many data blocks; the difference is that each data block in a cloud storage system is itself a file, called a data block file. The cloud storage system is responsible for storing data block files in the cloud storage service, fetching data block files from the cloud storage service, modifying the data in data block files according to the captured file system commands, and managing the data block files.
Mainstream cloud storage services currently include Amazon S3 and Microsoft's Mesh. Corresponding cloud storage systems have appeared for the different cloud storage service providers. Representative cloud storage systems include Dropbox and SugarSync. They keep a complete copy of the data on the client and, after each modification, compute the delta and send it back to the cloud storage service; when the cloud storage system accesses cloud storage data, it must first download the complete data to the client and then operate on the client-side copy. This strategy has the following obvious shortcomings. First, access to cloud storage data is inefficient: when the cloud storage system accesses the cloud storage service, it must indiscriminately download all cloud storage data to the client, and operations can proceed only after the download completes, so the user spends a long time waiting; even if the user only needs to operate on one small file, all data must still be downloaded from the cloud storage service. Second, it is not transparent to the user: the user cannot access cloud storage data the way local file system data is accessed, but must open the cloud storage system each time and perform a data synchronization before the cloud storage data becomes accessible. Third, it is overly sensitive to network state: if a network failure occurs while the cloud storage system is requesting data from the cloud storage service, all cloud storage data becomes unavailable.
Summary of the invention
The object of the present invention is to overcome the shortcomings of the prior art by proposing a data caching method for a cloud storage system. The method exploits a strategy of caching data block files together with program locality to improve response speed, and it also allows the cloud storage data held in the local data cache area to be accessed when the network is disconnected.
The data caching method for a cloud storage system proposed by the present invention is characterized in that the method uses a flash disk (or another storage medium such as a SIM card or an SSD) as the carrier of the cloud file system. The flash disk is divided into a system area and a data area; the operating system is stored in the system area, and the computer boots from the flash disk. The data area is further divided into a local data cache area and a metadata database: the local data cache area holds the data block cache files obtained from the cloud storage service, and the metadata database records the descriptive information of the cloud storage data;
The cloud storage system in the method comprises a dynamically loadable kernel module, a network communication module, and an execution module capable of cache management. The method comprises the following steps:
1) A flash disk is used as the carrier of the cloud storage system. The computer is booted from the flash disk, and the operating system in the flash disk's system area is loaded into the computer's memory. The cloud storage system runs as a background program started along with the operating system, and its dynamically loadable kernel module is inserted into the operating system kernel, so that the cloud storage system appears to the user as a local file system;
2) The cache-managing execution module of the cloud storage system scans the data block cache files in the local data cache area and, for each data block cache file found, stores its data block number within the file it composes in an ordered list in memory;
3) The network communication module is initialized and a message queue is set up, through which the module interacts with the cloud storage service on the Internet. Upper-layer applications send file system commands to the virtual file system (VFS) through the POSIX file system interface;
4) The dynamically loadable kernel module of the cloud storage system captures, through the virtual file system, the commands by which upper-layer applications create, modify, read, and delete files, and redirects these commands to the cache-managing execution module;
5) The cache-managing execution module carries out the captured commands as concrete operations: creating a file, writing data to an existing file, reading file data, and deleting a file. The new data blocks formed by these operations are cached as files in the local data cache area. When, during a write to or a read from an existing file, the data in the local data cache area exceeds a set threshold, the execution module is triggered to perform cache replacement on the data block cache files in the local data cache area; and when a required data block cache file is not in the local data cache area, it is obtained from the cloud storage service;
6) The network communication module stores the new data block cache files formed in step 5) back to the cloud storage service and, when a required data block cache file is not in the local data cache area, obtains it from the cloud storage service.
The features and beneficial effects of the present invention are:
1. With the method of the present invention, the local data cache area can effectively speed up the response of the cloud storage system and provide a better user experience;
2. With the method of the present invention, cloud storage data held in the local data cache area can be accessed while the network is disconnected, whereas existing cloud storage systems are overly sensitive to network state and cannot work at all when disconnected;
3. Compared with existing cloud storage systems, the method supports random reads and writes of files, and can therefore support operations such as seeking to an arbitrary position during video playback, without waiting for the file to be downloaded locally first;
4. The method avoids transmitting, between the cloud storage system and the cloud storage service, data that the other side already holds, which greatly reduces network overhead;
5. The method avoids the load that large numbers of read and write operations would place on the cloud storage service: if a data block cache file is in the local data cache area, it is operated on directly, which avoids the load caused by large numbers of data requests to the cloud storage service;
6. Compared with existing cloud storage systems, the method is particularly suited to cloud storage environments in which the local storage capacity of the cloud storage system is limited, the bandwidth between the cloud storage system and the cloud storage service is limited, and the network between them is prone to failure.
Embodiment
The data caching method for a cloud storage system proposed by the present invention is described in detail below with reference to the accompanying drawings and embodiments:
The present invention uses a flash disk (or another storage medium such as a SIM card or an SSD) as the carrier of the cloud file system. The flash disk is divided into a system area and a data area; the operating system is stored in the system area, and the computer boots from the flash disk. The data area is further divided into a local data cache area and a metadata database: the local data cache area holds the data block cache files obtained from the cloud storage service, and the metadata database records the descriptive information of the cloud storage data;
The cloud storage system in the method comprises a dynamically loadable kernel module, a network communication module, and an execution module capable of cache management. The method comprises the following steps:
1) A flash disk is used as the carrier of the cloud storage system. The computer is booted from the flash disk, and the operating system in the flash disk's system area is loaded into the computer's memory. The cloud storage system runs as a background program started along with the operating system, and its dynamically loadable kernel module is inserted into the operating system kernel, so that the cloud storage system appears to the user as a local file system;
2) The cache-managing execution module of the cloud storage system scans the data block cache files in the local data cache area and, for each data block cache file found, stores its data block number within the file it composes in an ordered list in memory;
3) The network communication module is initialized and a message queue is set up, through which the module interacts with the cloud storage service on the Internet. Upper-layer applications send file system commands to the virtual file system (VFS) through the POSIX file system interface;
4) The dynamically loadable kernel module of the cloud storage system captures, through the virtual file system, the commands by which upper-layer applications create, modify, read, and delete files, and redirects these commands to the cache-managing execution module;
5) The cache-managing execution module carries out the captured commands as concrete operations: creating a file, writing data to an existing file, reading file data, and deleting a file. The new data blocks formed by these operations are cached as files in the local data cache area. When, during a write to or a read from an existing file, the data in the local data cache area exceeds a set threshold, the execution module is triggered to perform cache replacement on the data block cache files in the local data cache area; and when a required data block cache file is not in the local data cache area, it is obtained from the cloud storage service;
6) The network communication module stores the new data block cache files formed in step 5) back to the cloud storage service and, when a required data block cache file is not in the local data cache area, obtains it from the cloud storage service. The overall flow is shown in Figure 1.
The data block cache files in the local data cache area of the flash disk are used to compose files that applications can use. The metadata database contains a file information table (for the files usable by applications), a data block information table, and a file composition table.
The file information table is shown in Table 1. It records the metadata of all files that the cloud storage system has placed in the cloud storage service, including the file ID, file size, file type, file name, parent directory ID, user access permissions, and the file's creation time (Ctime), modification time (Mtime), and last access time (Vtime);
Table 1: File information table
The data block information table is shown in Table 2. It records the data block ID, reference count, and data block size of the blocks stored in the cloud storage service; in this embodiment the maximum data block size can be set to 10MB;
Table 2: Data block information table
Data block ID | Reference count | Data block size
A | 2 | 10
B | 1 | 8
C | 3 | 10
... | ... | ...
The file composition table is shown in Table 3. It records the data blocks that compose each file, including the file ID, the data block ID, and the data block number;
Table 3: File composition table
File ID | Data block ID | Data block number
1 | C | 0
1 | B | 1
... | ... | ...
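As a rough illustration of how the three metadata tables relate, the sketch below builds them in an in-memory SQLite database and loads the sample rows from Tables 2 and 3. The table names, column names, column types, and the choice of SQLite are all assumptions made for illustration; the text does not specify a database engine or schema, and the size column is assumed to be in MB based on the 10MB maximum mentioned above.

```python
import sqlite3

def create_metadata_db(path=":memory:"):
    """Create the three metadata tables (all names here are hypothetical)."""
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE file_info (          -- Table 1: one row per file
            file_id   INTEGER PRIMARY KEY,
            size      INTEGER,
            type      TEXT,
            name      TEXT,
            parent_id INTEGER,
            perms     TEXT,
            ctime     REAL,               -- creation time
            mtime     REAL,               -- modification time
            vtime     REAL                -- last access time
        );
        CREATE TABLE block_info (         -- Table 2: one row per data block
            block_id  TEXT PRIMARY KEY,
            ref_count INTEGER NOT NULL,
            size_mb   INTEGER NOT NULL    -- unit assumed to be MB
        );
        CREATE TABLE file_blocks (        -- Table 3: which blocks compose a file
            file_id   INTEGER NOT NULL,
            block_id  TEXT NOT NULL,
            block_no  INTEGER NOT NULL,
            PRIMARY KEY (file_id, block_no)
        );
    """)
    return db

def blocks_of(db, file_id):
    """Return the block IDs composing a file, ordered by block number."""
    rows = db.execute(
        "SELECT block_id FROM file_blocks WHERE file_id = ? ORDER BY block_no",
        (file_id,),
    )
    return [row[0] for row in rows]

db = create_metadata_db()
db.executemany("INSERT INTO block_info VALUES (?, ?, ?)",
               [("A", 2, 10), ("B", 1, 8), ("C", 3, 10)])
db.executemany("INSERT INTO file_blocks VALUES (?, ?, ?)",
               [(1, "C", 0), (1, "B", 1)])
print(blocks_of(db, 1))  # ['C', 'B']
```

With the sample rows, file 1 is composed of block C followed by block B, matching Table 3.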
The cache-managing execution module of the cloud storage system scans the data block cache files already present in the local data cache area, obtains the data block number of each data block cache file within its file, and stores it in an ordered list in memory. The ordered list is used to quickly look up whether the data block cache file corresponding to a given data block ID is in the local data cache area: if it is, the file name of the corresponding data block cache file is returned; if it is not, a not-present indication (generally 0) is returned;
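One possible way to realize such an ordered list is a sorted array searched by bisection. The following is a minimal sketch under assumptions: the class name `CacheIndex`, its methods, and the choice to key the list by data block ID are illustrative and are not specified in the text.

```python
import bisect

class CacheIndex:
    """Sorted in-memory index of the data block cache files held locally."""

    def __init__(self):
        self._ids = []    # data block IDs, kept in sorted order
        self._names = []  # cache file names, parallel to _ids

    def add(self, block_id, file_name):
        """Record that the cache file for block_id is present locally."""
        i = bisect.bisect_left(self._ids, block_id)
        if i < len(self._ids) and self._ids[i] == block_id:
            self._names[i] = file_name        # already indexed: update the name
        else:
            self._ids.insert(i, block_id)
            self._names.insert(i, file_name)

    def lookup(self, block_id):
        """Return the cache file name, or 0 when the block is not cached."""
        i = bisect.bisect_left(self._ids, block_id)
        if i < len(self._ids) and self._ids[i] == block_id:
            return self._names[i]
        return 0

index = CacheIndex()
index.add("C", "C.blk")
index.add("A", "A.blk")
print(index.lookup("A"))  # A.blk
print(index.lookup("B"))  # 0
```

Keeping the list sorted makes each lookup logarithmic in the number of cached blocks, at the cost of a linear-time insertion.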
The create-file operation in step 5) comprises adding the metadata of the created file to the file information table and the file composition table of the metadata database, and thereafter backing up the database file to the cloud storage service by delta transfer;
The operation of writing data to an existing file in step 5) is shown in Figure 2 and comprises the following steps:
(5-11) The kernel module passes the parameters of the captured write command to the cache-managing execution module; the command parameters include the file ID, a pointer to the character array to be written, and the length to be written. The execution module first determines whether the character array to be written spans multiple data blocks. If so, the data to be written is truncated so that the truncated data falls entirely within one data block, and the remaining data is written by repeating this process until everything has been written. The execution module then queries the file information table by the file name in the command parameters to obtain the file ID, queries the file composition table by the obtained file ID and the file offset in the command parameters to obtain the data block ID, and queries the ordered list by the data block ID to determine whether the corresponding data block cache file is in the local data cache area;
(5-12) If the data block cache file is found not to be in the local data cache area, a data request is sent to the cloud storage service with the file ID and data block ID obtained in (5-11); the cloud storage service locates the corresponding data block cache file by the file ID and data block ID and returns it, and it is saved in the local data cache area. If the data block cache file is already in the local data cache area, this step is skipped;
(5-13) The data in the data block cache file is read into the computer's memory, and the character array to be written is written into this memory region according to the write command parameters obtained in (5-11). A hash value is computed over the memory region and used to query the data block information table. If the data in the memory region is identical to the data in some data block cache file, the reference count of that data block cache file is incremented by 1; otherwise, the data in the memory region is written to the local data cache area as a new data block cache file, and the new data block cache file is marked dirty. The kernel module is notified that the write succeeded, and in turn notifies the upper-layer application;
(5-14) According to the size of the data written, the file information table, data block information table, and file composition table in the metadata database are updated, and the database file is thereafter backed up to the cloud storage service by delta transfer;
(5-15) When the size of the local data cache area exceeds the set threshold (generally two thirds of the total capacity), cache replacement of the local data cache area begins. First, the data block cache files that are not marked dirty are transferred back to the cloud storage service and deleted. If the size of the local data cache area is now below the set threshold, the cache replacement process stops. If it is still above the set threshold, the LRU replacement algorithm is applied to the data block cache files marked dirty. Specifically, the data block cache files are sorted by last access time; for each in turn, the cloud storage service interface is called to delete the data block cache file stored in the cloud storage service that has the same name as the current dirty data block cache file, the dirty data block cache file is then transferred back to the cloud storage service, and the data block cache file is deleted from the local data cache area. As soon as the size of the local data cache area falls below the set threshold, the cache replacement process stops;
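The two-phase replacement in (5-15) can be sketched as follows. This is an illustration under assumptions, not the patented implementation: `CacheEntry`, `replace_cache`, and the `upload` and `delete_remote` callbacks are hypothetical names standing in for the cache bookkeeping and the cloud storage service interface, and sizes are in arbitrary units.

```python
from dataclasses import dataclass

@dataclass
class CacheEntry:
    name: str           # data block cache file name
    size: int           # size in arbitrary units
    dirty: bool         # True if modified locally
    last_access: float  # timestamp of the last access

def replace_cache(entries, capacity, upload, delete_remote, fraction=2 / 3):
    """Return the entries that remain cached after replacement."""
    threshold = capacity * fraction
    used = sum(e.size for e in entries)
    if used <= threshold:
        return list(entries)          # threshold not exceeded: nothing to do

    # Phase 1: send back and drop every file that is not marked dirty.
    remaining = []
    for e in entries:
        if e.dirty:
            remaining.append(e)
        else:
            upload(e.name)            # the text sends these back before deletion
            used -= e.size

    # Phase 2: evict dirty files, least recently accessed first, until the
    # cache size drops below the threshold.
    remaining.sort(key=lambda e: e.last_access)
    while remaining and used >= threshold:
        victim = remaining.pop(0)
        delete_remote(victim.name)    # remove the stale same-named remote copy
        upload(victim.name)           # then store the dirty version
        used -= victim.size
    return remaining

uploaded, deleted = [], []
cache = [
    CacheEntry("a", 20, False, 3.0),
    CacheEntry("b", 50, True, 1.0),
    CacheEntry("c", 40, True, 2.0),
]
kept = replace_cache(cache, 120, uploaded.append, deleted.append)
print([e.name for e in kept], uploaded, deleted)
```

In this run the cache holds 110 units against a threshold of about 80, so the clean file `a` is dropped first and the least recently used dirty file `b` is evicted next, leaving only `c`.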
The operation of reading data from an existing file in step 5) is shown in Figure 3 and comprises the following steps:
(5-21) The kernel module passes the parameters of the captured read command to the cache-managing execution module; the command parameters include the file ID, a pointer to the memory buffer, and the length to be read. The execution module first determines whether the length to be read spans multiple data blocks of the file. If so, the length to be read is truncated to fall within one data block, and the remaining length is read by repeating this process until everything has been read. The execution module then queries the file information table by the file name in the command parameters to obtain the file ID, queries the file composition table by the obtained file ID and the file offset in the command parameters to obtain the data block ID, and queries the ordered list by the data block ID to determine whether the corresponding data block cache file is in the local data cache area;
(5-22) If the data block cache file is found not to be in the local data cache area, a data request is sent to the cloud storage service with the file ID and data block ID obtained; the cloud storage service locates the corresponding data block cache file by the file ID and data block ID and returns it, and it is saved in the local data cache area. If the data block cache file is already in the local data cache area, this step is skipped;
(5-23) Once the data block cache file to be read has been copied from the cloud storage service to the local data cache area, the cache manager reads the data block cache file into memory, reads the corresponding data from it according to the read command parameters obtained in (5-21), and returns the data to the kernel module, which in turn returns the data to the upper-layer application;
(5-24) The last access time entry for the ID of the file read is updated in the file information table, and the database file is thereafter backed up to the cloud storage service by delta transfer;
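Both the write path (5-11) and the read path (5-21) cut a request into pieces that each fall within a single data block. A minimal sketch of that splitting follows. It assumes every block except possibly the last has the same fixed size, which the text does not state (Table 2 shows blocks of differing sizes); the function name `split_by_block` and the constant `BLOCK_SIZE` are hypothetical.

```python
BLOCK_SIZE = 10 * 1024 * 1024  # 10MB, the maximum block size of the embodiment

def split_by_block(offset, length, block_size=BLOCK_SIZE):
    """Split a byte range of a file into per-block segments.

    Returns a list of (block_number, offset_in_block, segment_length) tuples,
    each of which lies entirely within one data block.
    """
    segments = []
    while length > 0:
        block_no = offset // block_size
        start = offset % block_size
        chunk = min(length, block_size - start)  # truncate at the block boundary
        segments.append((block_no, start, chunk))
        offset += chunk
        length -= chunk
    return segments

# A 10-byte request starting at offset 8, with 10-byte blocks for readability:
print(split_by_block(8, 10, block_size=10))  # [(0, 8, 2), (1, 0, 8)]
```

Each returned block number would then be resolved to a data block ID through the file composition table before the local cache is consulted.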
The operation of deleting an existing file in step 5) is shown in Figure 4 and comprises the following steps:
(5-31) The kernel module passes the parameters of the captured delete-file command to the cache-managing execution module; the command parameters include the file ID. The file composition table is queried by the file ID to obtain the IDs of the data blocks that compose the file;
(5-32) The data block information table is queried with the result obtained, and the reference count of each data block ID in the result is decremented by 1; when a reference count reaches 0, that data block ID is saved in a delete list;
(5-33) For each data block ID in the delete list, the ordered list is queried in turn to determine whether the corresponding data block cache file is in the local data cache area; if it is, the data block cache file is deleted;
(5-34) For each data block ID in the delete list, the delete-file interface of the cloud storage service is called in turn to delete the corresponding data block cache file in the cloud storage service;
(5-35) The metadata of the deleted file is removed from the file information table and the file composition table of the metadata database, and the database file is thereafter backed up to the cloud storage service by delta transfer;
(5-36) For each data block ID in the delete list, the corresponding entry in the data block information table is deleted in turn, and then the corresponding records in the file information table and the file composition table are deleted. The kernel module is notified of success through the return value, and in turn notifies the upper-layer application that the file was deleted successfully.
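The reference counts tie the write path (5-13) to the delete path (5-32) and (5-36): identical block contents share one entry, and a block becomes deletable only when no file refers to it any longer. The sketch below illustrates that bookkeeping under assumptions. The class `BlockStore` and its methods are hypothetical, and deriving the block ID directly from a SHA-256 digest is an illustrative choice; the text only says that a hash value is used to query the data block information table.

```python
import hashlib

class BlockStore:
    """In-memory stand-in for the reference counts of the data block table."""

    def __init__(self):
        self.ref_count = {}  # block ID -> number of references

    def put(self, data: bytes) -> str:
        """Register one reference to a block; identical data shares one ID."""
        block_id = hashlib.sha256(data).hexdigest()
        self.ref_count[block_id] = self.ref_count.get(block_id, 0) + 1
        return block_id

    def release(self, block_ids):
        """Decrement each count; return the delete list of now-unused blocks."""
        delete_list = []
        for block_id in block_ids:
            self.ref_count[block_id] -= 1
            if self.ref_count[block_id] == 0:
                delete_list.append(block_id)
        return delete_list

    def purge(self, delete_list):
        """Remove the table entries of the blocks in the delete list."""
        for block_id in delete_list:
            del self.ref_count[block_id]

store = BlockStore()
first = store.put(b"same contents")
second = store.put(b"same contents")   # deduplicated: same ID, count is now 2
print(first == second)                 # True
print(store.release([first]))          # [] because one reference remains
```

Separating `release` from `purge` mirrors the order of the steps above, where the delete list is built first and the table entries are removed only after the local and remote cache files have been deleted.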