WO2016122690A1 - File system replication - Google Patents


Info

Publication number
WO2016122690A1
Authority
WO
WIPO (PCT)
Prior art keywords
file
replica
unique
computing system
timestamp
Prior art date
Application number
PCT/US2015/023649
Other languages
French (fr)
Inventor
Jaipal PENDYALA
Jothivelavan SIVASHANMUGAM
Anmary Justine K
Rajkumar Kannan
Ramesh Kannan K
Original Assignee
Hewlett Packard Enterprise Development Lp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett Packard Enterprise Development Lp
Publication of WO2016122690A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G06F16/184 Distributed file systems implemented as replicated file system
    • G06F16/1844 Management specifically adapted to replicated file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/178 Techniques for file synchronisation in file systems

Definitions

  • FIG. 1A illustrates a schematic diagram of an example replica computing system for file system replication, in accordance with an example of the present subject matter.
  • FIG. 1B illustrates various example components of a replica computing system and a source computing system for file system replication, in accordance with an example of the present subject matter.
  • FIG. 2 illustrates an example method for file system replication, in accordance with an example of the present subject matter.
  • FIG. 3 illustrates another example method for file system replication, in accordance with an example of the present subject matter.
  • FIG. 4 illustrates an example network environment implementing a non-transitory computer readable medium for file system replication, according to an example of the present subject matter.
  • the data storage systems generally include a network of source computing systems and replica computing systems as a part of disaster recovery measures.
  • a source computing system may be considered as a computing system that processes a variety of tasks and queries related to storage and management of the data.
  • a source computing system is a computing system that is actively employed in the enterprises for implementing the data management and file storage systems.
  • the replica computing systems are dormant computing systems deployed in the data storage systems.
  • the replica computing system includes a replica of a file system of the source computing system.
  • the replica computing system can be used in place of the source computing system, whenever the source computing system is down or not available.
  • replica computing systems are dormant computing systems in the network which can be used in place of the source computing system, in case the source computing system is down or not available, thereby ensuring seamless availability of stored data.
  • a file system may be understood as an abstraction implemented onto disk storage for organizing and storing data, stored as files.
  • the file system may facilitate in controlling the manner in which the data, such as a file is stored, modified, or retrieved from the disk storage.
  • Such file related actions may be carried out by one or more applications which may be deployed on a computing system onto which the file system is implemented.
  • Occurrence of any event that causes updates to files, such as creation of a new file or modification of an existing file may be captured and saved in an audit journal.
  • file object attributes corresponding to the files that have been updated are also saved.
  • the file object attributes form metadata which in turn capture modifications made to the file.
  • such file object attributes are obtained from the audit journal and saved in a query database.
  • the query database is provided to facilitate a quick access of the file object attributes by a user of the computing system. Further, a unique ID corresponding to each of the files within the file system, may be also saved in the query database. In such a case, the unique ID is mapped to each of its corresponding file object attributes.
  • all updates made to data stored at the source computing system are periodically replicated onto the replica computing system. For instance, all newly created files and modifications made to existing files may be replicated from the source computing system to the replica computing system.
  • snapshots of the file system of the source computing system, including the query database, are periodically generated and transmitted to the replica computing system. The snapshots may be used for synchronizing the replica computing system with the source computing system.
  • the replica computing system may therefore include the latest file updates whenever the replica computing system is requested to become active in place of the source computing system.
  • the file object attributes corresponding to the files that have been updated are first saved in the audit journal before being moved to the query database.
  • a synchronization error may occur in case the replica computing system becomes active for replacing the source computing system before such subsequent snapshot is generated.
  • the query database in the replica computing system may thus not include the file object attributes that were not captured during the last snapshot.
  • the file system at the replica computing system may incorrectly consider that the file object attributes corresponding to latest update, have been captured in the query database. This occurs as the file system of the source computing system would have indicated that these file object attributes had been captured in audit journal before the current snapshot was generated.
  • a replica computing system is dynamically updated before being actively deployed in a communication network for replacing a source computing system based on a unique identifier (ID) list for each snapshot generated at the source computing system.
  • the unique ID list along with the snapshot may be shared with the replica computing system.
  • the unique ID list may include a list of unique IDs of files updated at the source computing system, and corresponding timestamps indicating a time instant at which the files were updated.
  • the replica computing system may use the unique ID list to identify the files for which file object attributes are either not present in a replica query database of the replica computing system or have not been updated.
  • the replica computing system may accordingly update the replica query database.
  • Using the unique ID list to identify file object attributes that are either missing or not updated helps minimize the computational resources and time used for synchronizing the replica computing system.
  • a source query database at the source computing system is updated to include file object attributes of files which have been updated during a file operation.
  • the file operation may be one of creation of a new file, modification of an existing file, and deletion of an existing file.
  • a unique ID list is updated to include a unique ID of the file being updated and a timestamp corresponding to the unique ID. The timestamp indicates a time instant at which the file was updated.
  • a snapshot of a source file system of the source computing system is generated and transmitted to the replica computing system.
  • the snapshot may be saved by the replica computing system for creating a replica of the source file system.
  • the snapshot includes files saved in the source file system, the source query database, and the unique ID list.
  • the replica computing system may thus have a replica file system which will be a replica of the source file system.
  • the replica file system may have the replica query database which will be a replica of the source query database.
  • the replica computing system may use the unique ID list to synchronize the replica computing system with a state of the source computing system as it existed at the time of generation of the snapshot. Initially, a last file operation timestamp is obtained from the replica query database of the replica computing system. In one example, the last file operation timestamp is a timestamp corresponding to a unique ID of a file whose file object attributes were recorded as a last updated entry in the replica query database before the snapshot was generated. Subsequently, a unique ID having a timestamp greater than or equal to the last file operation timestamp is identified from the unique ID list. As previously described, the unique ID corresponds to a file that has been updated at the source computing system.
  • the corresponding timestamp indicates a time instant at which the file was updated.
  • the replica file system is queried to determine if the file, and the file object attributes corresponding to the identified unique ID, are present in the replica file system.
  • the file object attributes corresponding to the file are subsequently obtained and used to update the replica query database.
  • the present subject matter thus facilitates in preventing data loss during file system replication.
  • the data loss is prevented by using the unique ID and the corresponding timestamp to identify the files for which the most recent file object attributes are not stored in the replica query database.
  • the unique ID and the corresponding timestamp are obtained from the unique ID list provided by the source computing system.
  • the unique ID list is updated every time a file is updated, irrespective of whether the updates have been captured in the source query database or not.
  • the unique ID list thus includes unique IDs for all the files that were updated before the snapshot was generated for being transmitted to the replica computing system.
  • Using the unique ID list to identify missing or old versions of file object attributes for files updated at the source computing system ensures that the replica computing system is updated and synchronized with the source computing system. Further, using the unique ID list to identify missing or old versions of file object attributes, instead of scanning each entry of the replica query database and the replica file system, helps minimize the computational resources and time used for synchronizing the replica computing system.
  • Fig. 1A illustrates a schematic diagram of an example replica computing system 102 for file system replication, in accordance with an example of the present subject matter.
  • the replica computing system 102 is hereinafter interchangeably referred to as system 102.
  • the system 102 may be implemented in, for example, desktop computers, multiprocessor systems, personal digital assistants (PDAs), laptops, network computers, cloud servers, mainframe computers, and computing based devices in general.
  • the system 102 may also be hosting a plurality of applications.
  • the system 102 may further be implemented in a networked environment (not shown in the figure).
  • the system 102 may include, for example, processor(s) 104, a synchronizing module 106 communicatively coupled to the processor(s) 104, and a query database update module 108 communicatively coupled to the processor(s) 104.
  • the processor(s) 104 may include microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any other devices that manipulate signals and data based on computer-readable instructions. Further, functions of the various elements shown in the figures, including any functional blocks labeled as "processor(s)", may be provided through the use of dedicated hardware as well as hardware capable of executing computer-readable instructions.
  • the synchronizing module 106 may scan a replica query database (not shown in this figure) of the replica computing system 102 to identify a last file operation timestamp.
  • the last file operation timestamp is a timestamp corresponding to a unique Identifier (ID) of a file whose file object attributes were recorded as a last updated entry in the replica query database.
  • the synchronizing module 106 subsequently identifies a recent unique ID from a unique ID list.
  • the unique ID list includes unique IDs of files updated in a source computing system (not shown in this figure) and timestamps corresponding to each unique ID.
  • the corresponding timestamp indicates a time instant at which the file was updated at the source computing system.
  • the recent unique ID is identified such that a timestamp of the recent unique ID is one of greater than and equal to the last file operation timestamp.
  • the query database update module 108 may subsequently update the replica query database to include the recent unique ID and file object attributes having the most recent updates for a file corresponding to the recent unique ID.
  • Fig. 1B illustrates various example components of the replica computing system 102 and a source computing system 110 for file system replication, in accordance with an example of the present subject matter.
  • Each of the replica computing system 102 and the source computing system 110 includes processor(s), interface(s), memory, modules, and data.
  • the replica computing system 102 includes the processor(s) 104-1, interface(s) 112-1, memory 114-1, modules 116-1, and data 118-1.
  • the source computing system 110 includes the processor(s) 104-2, interface(s) 112-2, memory 114-2, modules 116-2, and data 118-2.
  • the interfaces 112-1 and 112-2 are hereinafter collectively referred to as the interfaces 112.
  • the interfaces 112 may include a variety of commercially available interfaces, for example, interfaces for peripheral device(s), such as data input output devices, referred to as I/O devices, interface cards, storage devices, and network devices.
  • the memory 114-1 and 114-2 is hereinafter collectively referred to as the memory 114.
  • the memory 114 may be communicatively coupled to the processor(s) 104 and may include any non-transitory computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes.
  • the memory 114 may include a storage device and main memory.
  • the replica computing system 102 includes storage device 120-1 and main memory 122-1.
  • the source computing system 110 includes storage device 120-2 and main memory 122-2.
  • the storage device 120-1 and 120-2 are hereinafter collectively referred to as the storage device 120.
  • the storage device 120 such as hard disks and magnetic tapes may be used for storing content, such as files in the replica computing system 102 and the source computing system 110.
  • the main memory 122-1 and 122-2 are hereinafter referred to as the main memory 122.
  • the main memory 122 such as RAM may be used for temporary storage of content for processing by the replica computing system 102 and the source computing system 110.
  • the modules 116-1 and 116-2 are hereinafter collectively referred to as the modules 116.
  • the modules 116 include routines, programs, objects, components, and data structures, which perform particular tasks or implement particular abstract data types.
  • the modules 116 may also be implemented as signal processor(s), state machine(s), logic circuitries, and/or any other device or component that manipulates signals based on operational instructions. Further, the modules 116 can be implemented by hardware, by computer-readable instructions executed by a processing unit, or by a combination thereof.
  • the modules 116-1 of the replica computing system 102 may include a communication module 124, the synchronizing module 106, the query database update module 108, and other module(s) 126.
  • the other modules 126 may include programs or coded instructions that supplement applications and functions, for example, programs in an operating system of the replica computing system 102.
  • the data 118-1 may include a replica unique ID list 128, replica file data 130, replica query database 132, and other data 134.
  • the modules 116-2 of the source computing system 110 may include a file system update module 136, an archive journal scanner 138, an interaction module 140, and other module(s) 142.
  • the other modules 142 may include programs or coded instructions that supplement applications and functions, for example, programs in an operating system of the source computing system 110.
  • the data 118-2 may include a source unique ID list 144, source file data 146, source query database 148, and other data 140.
  • the present subject matter facilitates in minimizing resources and time utilized in synchronizing the replica computing system 102 with the source computing system 110.
  • the subject matter further facilitates in preventing data loss during such synchronization.
  • the replica computing system 102 uses the unique ID list provided by the source computing system 110 for synchronization.
  • the source computing system 110 may update the unique ID list each time a file is updated during a file operation.
  • the file system update module 136 may ascertain the occurrence of the file operation. Examples of the file operation include, but are not limited to, creation of a new file, modification of an existing file, and deletion of an existing file.
  • the file system update module 136 may subsequently update an audit journal of the source file system to include the file operation along with file object attributes corresponding to the files that have been updated.
  • the file object attributes of a file are metadata capturing the modifications made to the file.
  • the source file system and the audit journal of the source file system are stored in the storage device 120-2.
  • the file system update module 136 may obtain the unique ID corresponding to the file that has been updated during the file operation. The file system update module 136 may then update the unique ID list to include the unique ID and the corresponding timestamp indicating a time instant at which the file was updated. Further, in case no unique ID list exists at the time of occurrence of the file operation, the file system update module 136 may create a new unique ID list and save the unique ID and the corresponding timestamp in the unique ID list. For instance, in case a snapshot had been taken just before the occurrence of the file operation, the unique ID list existing at that time would have been shared with the replica computing system 102, and the source computing system 110 would thus not have any unique ID list. The file system update module 136 in such a case may create a new unique ID list and save the unique ID and the corresponding timestamp in the unique ID list. In one example, the unique ID list is saved in the source unique ID list 144.
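The source-side bookkeeping described above can be sketched as follows. This is a minimal illustration, not the implementation of the file system update module 136: the function name `record_file_operation`, the dictionary layout, and the choice to keep only the latest timestamp per unique ID are all assumptions made for the example.

```python
import time
from typing import Dict, Optional

# Hypothetical representation: unique ID -> timestamp of the latest update.
UniqueIdList = Dict[str, float]


def record_file_operation(
    unique_id_list: Optional[UniqueIdList],
    unique_id: str,
    timestamp: Optional[float] = None,
) -> UniqueIdList:
    """Record that the file identified by unique_id was updated.

    If no list exists (for example, because the previous one was shipped
    with a snapshot), a new list is created first.
    """
    if unique_id_list is None:
        unique_id_list = {}
    # Assumption: only the most recent timestamp per unique ID is kept.
    unique_id_list[unique_id] = time.time() if timestamp is None else timestamp
    return unique_id_list
```

Note that the list is updated on every file operation, independently of whether the audit journal has been processed into the source query database yet.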
  • the audit journal is periodically processed by the archive journal scanner 138 to obtain data related to the file operation and the corresponding file object attributes.
  • the archive journal scanner 138 may save the data related to the file operation and the corresponding file object attributes in the source query database 148.
  • the archive journal scanner 138 may further save the unique ID corresponding to the file in the source query database 148.
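As a rough sketch of the periodic journal processing described above, the following moves pending audit journal entries into a query database keyed by unique ID. The `JournalEntry` type, the record layout, and the function name `scan_audit_journal` are hypothetical; the source does not specify how the archive journal scanner 138 stores its records.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class JournalEntry:
    unique_id: str
    operation: str              # e.g. "create", "modify" or "delete"
    timestamp: float
    attributes: Dict[str, object]


def scan_audit_journal(journal: List[JournalEntry], query_db: Dict[str, dict]) -> int:
    """Move every pending journal entry into the query database.

    Returns the number of entries processed. Entries are applied in
    timestamp order so that the newest record for a unique ID wins.
    """
    processed = 0
    for entry in sorted(journal, key=lambda e: e.timestamp):
        query_db[entry.unique_id] = {
            "operation": entry.operation,
            "timestamp": entry.timestamp,
            "attributes": dict(entry.attributes),
        }
        processed += 1
    journal.clear()  # the journal is drained once its entries are saved
    return processed
```

Because this step runs periodically, any entry still sitting in the journal when a snapshot is taken is exactly the kind of update the unique ID list is meant to catch later.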
  • the file system update module 136 may generate a snapshot of the source file system.
  • the snapshot of the source file system may include source file data 146, i.e., files saved in the source file system, the source query database 148, and the unique ID list.
  • the snapshot is then transmitted to the replica computing system 102 by the interaction module 140.
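The three parts of the snapshot listed above can be modeled as a simple immutable record. The sketch below is an assumption-laden illustration: the `Snapshot` class, its field names, and the use of deep copies to freeze the state at generation time are not taken from the source.

```python
import copy
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Snapshot:
    files: Dict[str, bytes]            # files saved in the source file system
    query_db: Dict[str, dict]          # copy of the source query database
    unique_id_list: Dict[str, float]   # unique ID -> update timestamp


def generate_snapshot(
    files: Dict[str, bytes],
    query_db: Dict[str, dict],
    unique_id_list: Dict[str, float],
) -> Snapshot:
    """Capture a point-in-time copy of the source state.

    Deep copies ensure later changes at the source do not leak into
    a snapshot that has already been generated.
    """
    return Snapshot(
        files=copy.deepcopy(files),
        query_db=copy.deepcopy(query_db),
        unique_id_list=copy.deepcopy(unique_id_list),
    )
```

A real system would serialize and transmit this record; the transport used by the interaction module 140 is not described in the source and is left out here.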
  • the snapshot is received by the communication module 124 of the replica computing system 102.
  • the communication module 124 may save the snapshot in the replica file data 130.
  • the replica query database 132 and the replica file system are dynamically updated to replicate the source query database 148 and the source file system.
  • the unique ID list is saved in the replica unique ID list 128.
  • the synchronizing module 106 may use the unique ID list to synchronize the replica computing system 102 with the source computing system 110 upon receiving a synchronization request.
  • the synchronization request may be either an activation indication indicating failover of the source computing system 110 or a snapshot update indication indicating receipt of the snapshot.
  • the synchronizing module 106 may scan the replica query database 132 to identify a last file operation timestamp.
  • the last file operation timestamp is a timestamp corresponding to a unique ID of a file whose file object attributes were recorded as a last updated entry in the replica query database 132.
  • the last file operation timestamp may thus indicate a time instant at which a last update was made to the source query database 148 before the snapshot was generated at the source computing system 110.
  • the last file operation timestamp may correspond to a unique ID for which a new record was created at the time instant indicated by the last file operation timestamp.
  • the last file operation timestamp may correspond to a unique ID for which an existing record was updated at the time instant indicated by the last file operation timestamp.
  • the synchronizing module 106 may obtain the last file operation timestamp to identify a recent unique ID from the unique ID list.
  • the recent unique ID is a unique ID having a timestamp greater than or equal to the last file operation timestamp.
  • the recent unique ID may thus be defined as unique ID corresponding to a file that was updated after the last update of the source query database 148 and before the generation of the snapshot at the source computing system 110.
  • the recent unique ID and the timestamp corresponding to the file are thus updated in the unique ID list, however, most recent file object attributes corresponding to the recent unique ID are missing from the replica query database 132.
  • the synchronizing module 106 may initially compare the last file operation timestamp with the timestamp corresponding to each unique ID present in the unique ID list to determine the recent unique ID. Each unique ID having a timestamp greater than or equal to the last file operation timestamp is identified as a recent unique ID. Further, the unique IDs having a timestamp less than the last file operation timestamp are determined to be updated. Such unique IDs are determined to correspond to files for which the most recent file object attributes are present in the replica query database 132. In case the last file operation timestamp is greater than the timestamps of all unique IDs present in the unique ID list, the synchronizing module 106 may conclude that the replica computing system 102 is synchronized.
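The comparison described above amounts to a single filter over the unique ID list. The following sketch assumes the replica query database is a dictionary of records carrying a `timestamp` field and that the unique ID list maps each unique ID to its update timestamp; both layouts and both function names are hypothetical.

```python
from typing import Dict, List


def last_file_operation_timestamp(replica_query_db: Dict[str, dict]) -> float:
    """Return the timestamp of the last updated entry in the replica query database.

    An empty database yields negative infinity, so every listed ID counts as recent.
    """
    return max(
        (record["timestamp"] for record in replica_query_db.values()),
        default=float("-inf"),
    )


def find_recent_unique_ids(unique_id_list: Dict[str, float], last_ts: float) -> List[str]:
    """Return unique IDs whose timestamp is greater than or equal to last_ts.

    IDs are returned oldest first. An empty result means the replica is
    already synchronized with the snapshot.
    """
    recent = [uid for uid, ts in unique_id_list.items() if ts >= last_ts]
    return sorted(recent, key=lambda uid: unique_id_list[uid])
```

For example, with a last file operation timestamp of 20.0 and a list `{"a": 10.0, "b": 20.0, "c": 25.0, "d": 22.0}`, the recent unique IDs are `b`, `d`, and `c`. The entry equal to the last timestamp is included, matching the "greater than or equal to" condition, so only the listed IDs are examined rather than every entry of the database and file system.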
  • the query database update module 108 may query the replica file system to determine if the file object attributes of the file corresponding to the recent unique ID are present in the replica file system. If the file corresponding to the unique ID is present in the replica file system, the query database update module 108 may obtain the file object attributes of the file corresponding to the unique ID. The query database update module 108 may further update the replica query database 132 to include the file object attributes having the most recent updates for the unique ID.
  • updating the replica query database 132 may include either updating existing file object attributes or creating a new entry for the recent unique ID and the file object attributes. For instance, in case the recent unique ID and file object attributes of the file corresponding to the recent unique ID are present in the replica query database 132, the query database update module 108 may update the existing file object attributes for the recent unique ID. In case the recent unique ID and file object attributes of the file corresponding to the recent unique ID are not present in the replica query database 132, the query database update module 108 may add the file object attributes for the recent unique ID in the replica query database 132.
  • If the file corresponding to the recent unique ID is not present in the replica file system, the query database update module 108 may determine that the file has been deleted in the source computing system 110.
  • the recent unique ID and the corresponding timestamp in the unique ID list may thus correspond to the file operation related to deletion of the file from the source file system of the source computing system 110.
  • the query database update module 108 may thus delete the recent unique ID and the last updated timestamp from the replica query database 132.
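The update, add, and delete cases described above can be sketched in one loop over the recent unique IDs. This is an illustrative model only: the replica file system is represented as a dictionary from unique ID to file object attributes, and the function name `update_replica_query_db` and the returned action labels are assumptions.

```python
from typing import Dict, Iterable


def update_replica_query_db(
    recent_ids: Iterable[str],
    unique_id_list: Dict[str, float],
    replica_files: Dict[str, dict],
    replica_query_db: Dict[str, dict],
) -> Dict[str, str]:
    """Bring the replica query database up to date for each recent unique ID.

    Returns a mapping of unique ID to the action taken.
    """
    actions: Dict[str, str] = {}
    for uid in recent_ids:
        attrs = replica_files.get(uid)  # query the replica file system
        if attrs is None:
            # File is absent from the replica file system: treat it as deleted at the source.
            removed = replica_query_db.pop(uid, None)
            actions[uid] = "deleted" if removed is not None else "absent"
        else:
            # File is present: refresh an existing record or create a new one.
            actions[uid] = "updated" if uid in replica_query_db else "added"
            replica_query_db[uid] = {
                "timestamp": unique_id_list[uid],
                "attributes": dict(attrs),
            }
    return actions
```

The `absent` label covers a case the text does not spell out, namely a deleted file that never had a record in the replica query database; nothing needs to be removed in that situation.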
  • Figs. 2 and 3 illustrate example methods 200 and 300, respectively, for file system replication, in accordance with an example of the present subject matter.
  • the order in which the methods are described is not intended to be construed as a limitation, and any number of the described method blocks may be combined in any order to implement the aforementioned methods, or an alternative method.
  • the methods 200 and 300 may be implemented by processing resource or computing device(s) through any suitable hardware, non-transitory machine readable instructions, or combination thereof.
  • the methods 200 and 300 may be performed by programmed computing devices, such as the source computing system 110 and the replica computing system 102, respectively, as depicted in Fig. 1B. Furthermore, the methods 200 and 300 may be executed based on instructions stored in a non-transitory computer readable medium, as will be readily understood.
  • the non-transitory computer readable medium may include, for example, digital memories, magnetic storage media, such as one or more magnetic disks and magnetic tapes, hard drives, or optically readable digital data storage media.
  • a source query database is updated to include file object attributes of a file updated during a file operation.
  • the source query database is associated with a source computing system, such as the source computing system 110.
  • Examples of the file operation include, but are not limited to, creation of a new file, modification of an existing file, and deletion of an existing file.
  • a unique Identifier (ID) list is updated to include a unique ID of the file being updated and a timestamp corresponding to the unique ID.
  • the timestamp indicates a time instant at which the file was updated.
  • the unique ID list is updated at the source computing system 110 by a file system update module, for example, the file system update module 136 of the source computing system 110.
  • a snapshot of a source file system is transmitted to a replica computing system for creating a replica of the source query database.
  • the snapshot includes files saved in the source file system, the source query database, and the unique ID list. Further the snapshot is transmitted by the source computing system 110 to the replica computing system, for example, the replica computing system 102.
  • a snapshot is obtained by a replica computing system, for example, the replica computing system 102.
  • the snapshot is received from a source computing system, for example, the source computing system 110.
  • the snapshot may include files saved in a source file system, a source query database, and a unique Identifier (ID) list.
  • the unique ID list includes unique IDs of files updated in the source computing system and timestamps corresponding to each unique ID.
  • a replica query database associated with the replica computing system is scanned to identify a last file operation timestamp.
  • the last file operation timestamp is a timestamp corresponding to a unique ID of a file whose file object attributes were recorded as a last updated entry in the replica query database.
  • the last file operation timestamp is obtained upon receiving a synchronization request.
  • a recent unique ID is identified from a unique ID list.
  • a synchronizing module of the replica computing system identifies the recent unique ID such that a timestamp of the recent unique ID is one of greater than and equal to the last file operation timestamp.
  • the replica query database is updated to include the recent unique ID and file object attributes having the most recent updates for a file corresponding to the recent unique ID.
  • existing file object attributes for the recent unique ID are updated.
  • the file object attributes for the recent unique ID are added in the replica query database.
  • Fig. 4 illustrates an example network environment implementing a non-transitory computer readable medium for file system replication, according to an example of the present disclosure.
  • the system environment 400 may comprise at least a portion of a public networking environment or a private networking environment, or a combination thereof.
  • the system environment 400 includes a processing resource 402 communicatively coupled to a computer readable medium 404 through a communication link 406.
  • the processing resource 402 can include one or more processors of a computing device for file system replication.
  • the computer readable medium 404 can be, for example, an internal memory device of the computing device or an external memory device.
  • the communication link 406 may be a direct communication link, such as any memory read/write interface.
  • the communication link 406 may be an indirect communication link, such as a network interface.
  • the processing resource 402 can access the computer readable medium 404 through a network 408.
  • the network 408 may be a single network or a combination of multiple networks and may use a variety of different communication protocols.
  • the processing resource 402 and the computer readable medium 404 may also be coupled to requested data sources 410 through the communication link 406, and/or to communication devices 412 over the network 408.
  • the coupling with the requested data sources 410 enables receiving the requested data in an offline environment.
  • the coupling with the communication devices 412 enables receiving the requested data in an online environment.
  • the computer readable medium 404 includes a set of computer readable instructions, implementing a synchronizing module 414 and a query database update module 416.
  • the set of computer readable instructions can be accessed by the processing resource 402 through the communication link 406 and subsequently executed to process requested data communicated with the requested data sources 410 in order to facilitate file system journaling.
  • the instructions of the synchronizing module 414 may perform the functionalities described above in relation to the synchronizing module 106.
  • the instructions of the query database update module 416 may perform the functionalities described above in relation to the query database update module 108.
  • the synchronizing module 414 may scan a replica query database associated with a replica computing system to identify a last file operation timestamp.
  • the last file operation timestamp is a timestamp corresponding to a unique Identifier (ID) of a file whose file object attributes were recorded as a last updated entry in the replica query database.
  • the synchronizing module 414 may subsequently identify a recent unique ID from a unique ID list based on the last file operation timestamp and a timestamp of the recent unique ID.
  • the timestamp of the recent unique ID is one of greater than and equal to the last file operation timestamp.
  • the unique ID list includes unique IDs and timestamps of files updated in a source computing system. The unique ID is associated with a file that has been updated at a source computing system, while the timestamp indicates a time instant at which the file was updated.
  • the query database update module 416 may query a replica file system of the replica computing system to determine if the file and the file object attributes corresponding to the recent unique ID are present in the replica file system.
  • the query database update module 416 may obtain the file object attributes corresponding to the file.
  • the query database update module 416 may subsequently update the replica query database to include the file object attributes having the most recent updates for the recent unique ID.
  • the query database update module 416 may determine the file to be deleted.
  • the query database update module 416 may delete the recent unique ID, the file object attributes corresponding to the file, as saved in the replica query database, and the last updated timestamp, from the replica query database.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Replica computing system comprising a processor and a synchronizing module coupled to the processor to scan a replica query database associated with the replica computing system to identify a last file operation timestamp corresponding to a unique ID of a file whose file object attributes were recorded as a last updated entry in the replica query database. The synchronizing module identifies a recent unique ID from a unique ID list. A timestamp of the recent unique ID is one of greater than and equal to the last file operation timestamp. The unique ID list includes unique IDs of files updated in a source computing system and timestamps corresponding to each unique ID. A query database update module coupled to the processor updates the replica query database to include the recent unique ID and file object attributes having the most recent updates of file for the recent unique ID.

Description

FILE SYSTEM REPLICATION
BACKGROUND
[0001] Increasing use of data processing and data generation in enterprises produces ever-increasing amounts of data which may be stored for short, medium, or long periods. Such data may be used by the organization for carrying out one or more operations. In order to ensure uninterrupted functioning, and to prevent any data loss owing to malfunctioning of computing systems managing such data, the enterprises deploy a network of computing systems.
BRIEF DESCRIPTION OF FIGURES
[0002] Fig. 1A illustrates a schematic diagram of an example replica computing system for file system replication, in accordance with an example of the present subject matter.
[0003] Fig. 1B illustrates various example components of a replica computing system and a source computing system for file system replication, in accordance with an example of the present subject matter.
[0004] Fig. 2 illustrates an example method for file system replication, in accordance with an example of the present subject matter.
[0005] Fig. 3 illustrates another example method for file system replication, in accordance with an example of the present subject matter.
[0006] Fig. 4 illustrates an example network environment implementing a non-transitory computer readable medium for file system replication, according to an example of the present subject matter.
DETAILED DESCRIPTION
[0007] Enterprises are nowadays deploying large data storage systems for storage and management of data generated and processed within the enterprise. The data storage systems generally include a network of source computing systems and replica computing systems as a part of disaster recovery measures. A source computing system may be considered as a computing system that processes a variety of tasks and queries related to storage and management of the data. In some examples, a source computing system is a computing system that is actively employed in the enterprises for implementing the data management and file storage systems. The replica computing systems are dormant computing systems deployed in the data storage systems. The replica computing system includes a replica of a file system of the source computing system. The replica computing system can be used in place of the source computing system, whenever the source computing system is down or not available. In one example, replica computing systems are dormant computing systems in the network which can be used in place of the source computing system, in case the source computing system is down or not available, thereby ensuring seamless availability of stored data.
[0008] A file system may be understood as an abstraction implemented onto disk storage for organizing and storing data, stored as files. The file system may facilitate in controlling the manner in which the data, such as a file is stored, modified, or retrieved from the disk storage. Such file related actions may be carried out by one or more applications which may be deployed on a computing system onto which the file system is implemented. Occurrence of any event that causes updates to files, such as creation of a new file or modification of an existing file may be captured and saved in an audit journal. Along with data related to occurrence of the event, file object attributes corresponding to the files that have been updated are also saved. The file object attributes form metadata which in turn capture modifications made to the file. Generally, such file object attributes are obtained from the audit journal and saved in a query database. The query database is provided to facilitate a quick access of the file object attributes by a user of the computing system. Further, a unique ID corresponding to each of the files within the file system, may be also saved in the query database. In such a case, the unique ID is mapped to each of its corresponding file object attributes.
[0009] Further, to keep data at the replica computing system up-to-date with the source computing system, all updates made to data stored at the source computing system are periodically replicated onto the replica computing system. For instance, all newly created files and modifications made to existing files may be replicated from the source computing system to the replica computing system. Generally, snapshots of the file system of the source computing system, including the query database, are periodically generated and transmitted to the replica computing system. The snapshots may be used for synchronizing the replica computing system with the source computing system. The replica computing system may therefore include the latest file updates whenever the replica computing system is requested to become active in place of the source computing system.
[0010] As explained, the file object attributes corresponding to the files that have been updated are first saved in the audit journal before being moved to the query database. In such a case, some of the file object attributes may not be captured in the query database in the snapshot currently being taken. Although such file object attributes may be captured in a subsequent snapshot, a synchronization error may occur in case the replica computing system becomes active for replacing the source computing system before such subsequent snapshot is generated. The query database in the replica computing system may thus not include the file object attributes that were not captured during the last snapshot. As a result, the file system at the replica computing system may incorrectly consider that the file object attributes corresponding to the latest update have been captured in the query database. This occurs as the file system of the source computing system would have indicated that these file object attributes had been captured in the audit journal before the current snapshot was generated.
[0011] Omission of such file object attributes may result in data loss, making the replica computing system out of sync with the source computing system. The file system at the replica computing system would thus have to be entirely scanned to identify all such file object attributes that are indicated as being captured in the query database in the source computing system but have actually not been captured in the query database in the replica computing system. Scanning the entire file system would increase the computational resources and time utilized before the replica computing system is actively deployed.
[0012] Approaches for file system replication are described. The present subject matter facilitates in preventing data loss during file system replication. In one example, a replica computing system is dynamically updated before being actively deployed in a communication network for replacing a source computing system based on a unique identifier (ID) list for each snapshot generated at the source computing system. The unique ID list along with the snapshot may be shared with the replica computing system.
[0013] The unique ID list (as described above) may include a list of unique IDs of files updated at the source computing system, and corresponding timestamps indicating a time instant at which the files were updated. The replica computing system may use the unique ID list to identify the files for which file object attributes are either not present in a replica query database of the replica computing system or have not been updated. The replica computing system may accordingly update the replica query database. Using the unique ID list to identify file object attributes that are either missing or are not updated facilitates in minimizing computational resources and time utilized for synchronizing the replica computing system.
[0014] In accordance with an example of the present subject matter, a source query database at the source computing system is updated to include file object attributes of files which have been updated during a file operation. In one example, the file operation may be one of creation of a new file, modification of an existing file, and deletion of an existing file. Further, a unique ID list is updated to include a unique ID of the file being updated and a timestamp corresponding to the unique ID. The timestamp indicates a time instant at which the file was updated.
[0015] Subsequently a snapshot of a source file system of the source computing system is generated and transmitted to the replica computing system. The snapshot may be saved by the replica computing system for creating a replica of the source file system. In one example, the snapshot includes files saved in the source file system, the source query database, and the unique ID list. The replica computing system may thus have a replica file system which will be a replica of the source file system. Further, the replica file system may have the replica query database which will be a replica of the source query database.
[0016] The replica computing system may use the unique ID list to synchronize the replica computing system with a state of the source computing system as it existed at the time of generation of the snapshot. Initially a last file operation timestamp is obtained from the replica query database of the replica computing system. In one example, the last file operation timestamp is a timestamp corresponding to a unique ID of a file whose file object attributes were recorded as a last updated entry in the replica query database before the snapshot was generated.
[0017] Subsequently, a unique ID having a timestamp greater than or equal to the last file operation timestamp is identified from the unique ID list. As previously described, the unique ID corresponds to a file that has been updated at the source computing system. The corresponding timestamp indicates a time instant at which the file was updated. Upon identifying the unique ID, the replica file system is queried to determine if the file, and the file object attributes corresponding to the identified unique ID, are present in the replica file system. The file object attributes corresponding to the file are subsequently obtained and used to update the replica query database.
[0018] Further, if the file and the file object attributes are not present in the replica file system, it is determined that the file was deleted at the source computing system. The unique ID, the file object attributes corresponding to the file, and a timestamp corresponding to the unique ID are thus deleted from the replica query database.
[0019] The present subject matter thus facilitates in preventing data loss during file system replication. As will be understood based on the above description, the data loss is prevented by using the unique ID and the corresponding timestamp to identify the files for which the most recent file object attributes are not stored in the replica query database. The unique ID and the corresponding timestamp are obtained from the unique ID list provided by the source computing system. The unique ID list is updated every time a file is updated, irrespective of whether the updates have been captured in the source query database or not. The unique ID list thus includes unique IDs for all the files that were updated before the snapshot was generated for being transmitted to the replica computing system. Using the unique ID list to identify missing or old versions of file object attributes for files updated at the source computing system ensures that the replica computing system is updated and synchronized with the source computing system. Further, using the unique ID list to identify missing or old versions of file object attributes instead of scanning each entry of the replica query database and the replica file system helps in minimizing computational resources and time utilized for synchronizing the replica computing system.
[0020] The systems and methods are further described in conjunction with Fig. 1 to Fig. 4, as examples of the present subject matter. It should be understood that the description and figures merely illustrate the principles of the present subject matter. It will thus be appreciated that various arrangements that embody the principles of the present subject matter, although not explicitly described or shown herein, can be devised from the description and are included within its scope. Furthermore, all examples recited herein are for pedagogical purposes to aid the reader in understanding the principles of the present subject matter. Moreover, all statements herein reciting principles, aspects, and examples of the present subject matter, as well as specific examples thereof, are intended to encompass equivalents thereof.
[0021] Fig. 1A illustrates a schematic diagram of an example replica computing system 102 for file system replication, in accordance with an example of the present subject matter. The replica computing system 102 is hereinafter interchangeably referred to as system 102. The system 102 may be implemented in, for example, desktop computers, multiprocessor systems, personal digital assistants (PDAs), laptops, network computers, cloud servers, mainframe computers, and computing based devices in general. The system 102 may also be hosting a plurality of applications. The system 102 may further be implemented in a networked environment (not shown in the figure).
[0022] The system 102 may include, for example, processor(s) 104, a synchronizing module 106 communicatively coupled to the processor(s) 104, and a query database update module 108 communicatively coupled to the processor(s) 104.
[0023] The processor(s) 104 may include microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any other devices that manipulate signals and data based on computer-readable instructions. Further, functions of the various elements shown in the figures, including any functional blocks labeled as "processor(s)", may be provided through the use of dedicated hardware as well as hardware capable of executing computer-readable instructions.
[0024] In operation, the synchronizing module 106 may scan a replica query database (not shown in this figure) of the replica computing system 102 to identify a last file operation timestamp. In one example, the last file operation timestamp is a timestamp corresponding to a unique Identifier (ID) of a file whose file object attributes were recorded as a last updated entry in the replica query database.
[0025] The synchronizing module 106 subsequently identifies a recent unique ID from a unique ID list. In one example, the unique ID list includes unique IDs of files updated in a source computing system (not shown in this figure) and timestamps corresponding to each unique ID. The corresponding timestamp indicates a time instant at which the file was updated at the source computing system. Further, the recent unique ID is identified such that a timestamp of the recent unique ID is one of greater than and equal to the last file operation timestamp.
[0026] The query database update module 108 may subsequently update the replica query database to include the recent unique ID and file object attributes having the most recent updates for a file corresponding to the recent unique ID.
[0027] Fig. 1B illustrates various example components of the replica computing system 102 and a source computing system 110 for file system replication, in accordance with an example of the present subject matter. Each of the replica computing system 102 and the source computing system 110 includes processor(s), interface(s), memory, modules, and data. For instance, the replica computing system 102 includes the processor(s) 104-1, interface(s) 112-1, memory 114-1, modules 116-1, and data 118-1. The source computing system 110 includes the processor(s) 104-2, interface(s) 112-2, memory 114-2, modules 116-2, and data 118-2.
[0028] The interfaces 112-1 and 112-2 are hereinafter collectively referred to as the interfaces 112. The interfaces 112 may include a variety of commercially available interfaces, for example, interfaces for peripheral device(s), such as data input output devices, referred to as I/O devices, interface cards, storage devices, and network devices.
[0029] The memory 114-1 and 114-2 is hereinafter collectively referred to as the memory 114. The memory 114 may be communicatively coupled to the processor(s) 104 and may include any non-transitory computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. In one example, the memory 114 may include a storage device and main memory. For instance, the replica computing system 102 includes storage device 120-1 and main memory 122-1. The source computing system 110 includes storage device 120-2 and main memory 122-2.
[0030] The storage device 120-1 and 120-2 are hereinafter collectively referred to as the storage device 120. The storage device 120, such as hard disks and magnetic tapes, may be used for storing content, such as files in the replica computing system 102 and the source computing system 110. The main memory 122-1 and 122-2 are hereinafter referred to as the main memory 122. The main memory 122, such as RAM, may be used for temporary storage of content for processing by the replica computing system 102 and the source computing system 110.
[0031] The modules 116-1 and 116-2 are hereinafter collectively referred to as the modules 116. The modules 116, amongst other things, include routines, programs, objects, components, and data structures, which perform particular tasks or implement particular abstract data types. The modules 116 may also be implemented as signal processor(s), state machine(s), logic circuitries, and/or any other device or component that manipulates signals based on operational instructions. Further, the modules 116 can be implemented by hardware, by computer-readable instructions executed by a processing unit, or by a combination thereof.
[0032] The modules 116-1 of the replica computing system 102 may include a communication module 124, the synchronizing module 106, the query database update module 108, and other module(s) 126. The other modules 126 may include programs or coded instructions that supplement applications and functions, for example, programs in an operating system of the replica computing system 102. Further, the data 118-1 may include a replica unique ID list 128, replica file data 130, replica query database 132, and other data 134.
[0033] The modules 116-2 of the source computing system 110 may include a file system update module 136, an archive journal scanner 138, an interaction module 140, and other module(s) 142. The other modules 142 may include programs or coded instructions that supplement applications and functions, for example, programs in an operating system of the replica computing system 102. Further, the data 118-2 may include a source unique ID list 144, source file data 146, source query database 148, and other data 140.
[0034] As previously described, the present subject matter facilitates in minimizing resources and time utilized in synchronizing the replica computing system 102 with the source computing system 110. The subject matter further facilitates in preventing data loss during such synchronization. In one example, the replica computing system 102 uses the unique ID list provided by the source computing system 110 for synchronization. The source computing system 110, in one example, may update the unique ID list each time a file is updated during a file operation.
[0035] In one example, the file system update module 136 may ascertain the occurrence of the file operation. Examples of the file operation include, but are not limited to, creation of a new file, modification of an existing file, and deletion of an existing file. The file system update module 136 may subsequently update an audit journal of the source file system to include the file operation along with file object attributes corresponding to the files that have been updated. As previously described, the file object attributes of a file are metadata capturing the modifications made to the file. In one example, the source file system and the audit journal of the source file system are stored in the storage device 120-2.
[0036] Further, the file system update module 136 may obtain the unique ID corresponding to the file that has been updated during the file operation. The file system update module 136 may then update the unique ID list to include the unique ID and the corresponding timestamp indicating a time instant at which the file was updated. Further, in case no unique ID list exists at the time of occurrence of the file operation, the file system update module 136 may create a new unique ID list and save the unique ID and the corresponding timestamp in the unique ID list. For instance, in case a snapshot had been taken just before the occurrence of the file operation, the unique ID list existing at that time would have been shared with the replica computing system 102 and the source computing system 110 would thus not have any unique ID list. The file system update module 136 in such a case may create a new unique ID list and save the unique ID and the corresponding timestamp in the unique ID list. In one example, the unique ID list is saved in the source unique ID list 144.
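The bookkeeping of paragraph [0036] can be sketched as follows. This is a minimal illustration under stated assumptions and not the implementation of the file system update module 136: the names `UniqueIdList` and `record_update` are hypothetical, timestamps are plain integers, and the list is assumed to keep only the latest timestamp for each unique ID.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class UniqueIdList:
    """Hypothetical in-memory unique ID list: unique ID -> latest update timestamp."""
    entries: Dict[str, int] = field(default_factory=dict)


def record_update(id_list: Optional[UniqueIdList], unique_id: str, timestamp: int) -> UniqueIdList:
    """Record that the file identified by unique_id was updated at timestamp.

    If no list exists (for example, the previous one was shared with the
    replica along with a snapshot), a new list is created first.
    """
    if id_list is None:
        id_list = UniqueIdList()
    # Assumption: a later update to the same file overwrites its earlier timestamp.
    id_list.entries[unique_id] = timestamp
    return id_list


# Usage: the first file operation after a snapshot starts a fresh list.
current = record_update(None, "file-17", 1000)
current = record_update(current, "file-42", 1005)
current = record_update(current, "file-17", 1010)
```

Keeping one entry per unique ID is a design choice of this sketch; a list that appends every operation would serve the same comparison against the last file operation timestamp.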
[0037] Further, the audit journal is periodically processed by the archive journal scanner 138 to obtain data related to the file operation and the corresponding file object attributes. The archive journal scanner 138 may save the data related to the file operation and the corresponding file object attributes in the source query database 148. The archive journal scanner 138 may further save the unique ID corresponding to the file in the source query database 148.
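To illustrate why the source query database can lag behind the unique ID list, the periodic processing of paragraph [0037] may be sketched as below. The journal entry layout, the record layout, and the function name `scan_journal` are assumptions made for this example only, and handling of deletions is omitted.

```python
from typing import Dict, List, Tuple

# Hypothetical journal entry: (unique ID, timestamp, file object attributes).
JournalEntry = Tuple[str, int, Dict[str, object]]


def scan_journal(journal: List[JournalEntry], query_db: Dict[str, dict]) -> int:
    """Move every pending journal entry into the query database.

    Returns the number of entries processed in this pass.
    """
    processed = 0
    while journal:
        unique_id, timestamp, attributes = journal.pop(0)
        query_db[unique_id] = {"timestamp": timestamp, "attributes": attributes}
        processed += 1
    return processed


# Usage: two operations are scanned, a third arrives before the next pass.
journal = [("file-17", 1000, {"size": 10}), ("file-42", 1005, {"size": 20})]
query_db: Dict[str, dict] = {}
moved = scan_journal(journal, query_db)
journal.append(("file-99", 1010, {"size": 5}))
```

A snapshot generated at this point would carry a query database without "file-99", which is the gap that the unique ID list is meant to expose.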
[0038] Subsequently, the file system update module 136 may generate a snapshot of the source file system. In one example, the snapshot of the source file system may include source file data 146, i.e., files saved in the source file system, the source query database 148, and the unique ID list. The snapshot is then transmitted to the replica computing system 102 by the interaction module 140. In one example, the snapshot is received by the communication module 124 of the replica computing system 102. The communication module 124 may save the snapshot in the replica file data 130. As the snapshot is saved in the replica computing system 102, the replica query database 132 and the replica file system are dynamically updated to replicate the source query database 148 and the source file system. Further, the unique ID list is saved in the replica unique ID list 128.
[0039] In one example, the synchronizing module 106 may use the unique ID list to synchronize the replica computing system 102 with the source computing system 110 upon receiving a synchronization request. The synchronization request may be either an activation indication indicating failover of the source computing system 110 or a snapshot update indication indicating receipt of the snapshot. On receiving the synchronization request, the synchronizing module 106 may scan the replica query database 132 to identify a last file operation timestamp.
[0040] The last file operation timestamp is a timestamp corresponding to a unique ID of a file whose file object attributes were recorded as a last updated entry in the replica query database 132. The last file operation timestamp may thus indicate a time instant at which a last update was made to the source query database 148 before the snapshot was generated at the source computing system 110. In one example, the last file operation timestamp may correspond to a unique ID for which a new record was created at the time instant indicated by the last file operation timestamp. In another example, the last file operation timestamp may correspond to a unique ID for which an existing record was updated at the time instant indicated by the last file operation timestamp.
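The scan for the last file operation timestamp can be sketched as follows. The sketch assumes, hypothetically, that each record of the replica query database stores the timestamp at which it was created or last updated, so the last updated entry is simply the one with the largest timestamp; the record layout and the function name are illustrative only.

```python
from typing import Dict, Optional


def last_file_operation_timestamp(replica_query_db: Dict[str, dict]) -> Optional[int]:
    """Return the timestamp of the most recently updated record, or None if empty."""
    if not replica_query_db:
        return None
    return max(record["timestamp"] for record in replica_query_db.values())


# Usage with a hypothetical two-record replica query database.
replica_query_db = {
    "file-17": {"timestamp": 1000, "attributes": {"size": 10}},
    "file-42": {"timestamp": 1005, "attributes": {"size": 20}},
}
last_ts = last_file_operation_timestamp(replica_query_db)
```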
[0041] The synchronizing module 106 may obtain the last file operation timestamp to identify a recent unique ID from the unique ID list. In one example, the recent unique ID is a unique ID having a timestamp greater than or equal to the last file operation timestamp. The recent unique ID may thus be defined as unique ID corresponding to a file that was updated after the last update of the source query database 148 and before the generation of the snapshot at the source computing system 110. The recent unique ID and the timestamp corresponding to the file are thus updated in the unique ID list, however, most recent file object attributes corresponding to the recent unique ID are missing from the replica query database 132.
[0042] The synchronizing module 106 may initially compare the last file operation timestamp with the timestamp corresponding to each unique ID present in the unique ID list to determine the recent unique ID. Each unique ID having a timestamp greater than or equal to the last file operation timestamp is identified as the recent unique ID. Further, the unique IDs having a timestamp less than the last file operation timestamp are determined to be updated. Such unique IDs are determined to be corresponding to files for which the most recent file object attributes are present in the replica query database 132. In case the last file operation timestamp is greater than all timestamps for all unique IDs present in the unique ID list, the synchronizing module 106 may conclude that the replica computing system 102 is synchronized.
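The comparison of paragraph [0042] amounts to a filter over the unique ID list. The following is a minimal sketch, assuming the list is available as a mapping from unique ID to integer timestamp; the function name `find_recent_unique_ids` and the ordering of the result by timestamp are choices made for this example.

```python
from typing import Dict, List


def find_recent_unique_ids(unique_id_list: Dict[str, int], last_ts: int) -> List[str]:
    """Return unique IDs whose timestamp is greater than or equal to last_ts.

    The result is ordered by timestamp, oldest first. An empty result means
    that every listed file is already reflected in the replica query database.
    """
    recent = [uid for uid, ts in unique_id_list.items() if ts >= last_ts]
    return sorted(recent, key=lambda uid: unique_id_list[uid])


# Usage: "a" predates the last recorded operation, "b" and "c" do not.
unique_id_list = {"a": 90, "c": 120, "b": 100}
recent_ids = find_recent_unique_ids(unique_id_list, last_ts=100)
already_synced = find_recent_unique_ids(unique_id_list, last_ts=121)
```

Here `recent_ids` is `["b", "c"]` and `already_synced` is empty, which corresponds to the case in which the replica computing system is concluded to be synchronized.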
[0043] Upon identifying the recent unique ID, the query database update module 108 may query the replica file system to determine if the file object attributes of the file corresponding to the recent unique ID are present in the replica file system. If the file corresponding to the unique ID is present in the replica file system, the query database update module 108 may obtain the file object attributes of the file corresponding to the unique ID. The query database update module 108 may further update the replica query database 132 to include the file object attributes having the most recent updates for the unique ID.
[0044] In one example, updating the replica query database 132 may include either updating existing file object attributes or creating a new entry for the recent unique ID and the file object attributes. For instance, in case the recent unique ID and file object attributes of the file corresponding to the recent unique ID are present in the replica query database 132, the query database update module 108 may update the existing file object attributes for the recent unique ID. In case the recent unique ID and file object attributes of the file corresponding to the recent unique ID are not present in the replica query database 132, the query database update module 108 may add the file object attributes for the recent unique ID in the replica query database.
[0045] Further, if the file and file object attributes corresponding to the recent unique ID are not present in the replica file system, the query database update module 108 may determine that the file has been deleted in the source computing system 110. The recent unique ID and the corresponding timestamp in the unique ID list may thus correspond to the file operation related to deletion of the file from the source file system of the source computing system 110. The query database update module 108 may thus delete the recent unique ID and the last updated timestamp from the replica query database 132.
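The three outcomes described in paragraphs [0043] to [0045] (update an existing record, add a new record, or delete a record) can be sketched as a single pass over the recent unique IDs. The sketch is illustrative only: the replica file system is modeled, as an assumption, by a mapping from unique ID to file object attributes, and the name `reconcile` and the record layout are hypothetical.

```python
from typing import Dict


def reconcile(recent: Dict[str, int],
              replica_fs: Dict[str, dict],
              replica_query_db: Dict[str, dict]) -> None:
    """Bring replica_query_db in line with replica_fs for the recent unique IDs.

    recent maps each recent unique ID to its timestamp from the unique ID list.
    """
    for unique_id, timestamp in recent.items():
        attributes = replica_fs.get(unique_id)
        if attributes is None:
            # File absent from the replica file system: treated as deleted at the source.
            replica_query_db.pop(unique_id, None)
        else:
            # Covers both cases: overwrite an existing record or add a new one.
            replica_query_db[unique_id] = {
                "timestamp": timestamp,
                "attributes": dict(attributes),
            }


# Usage: "b" is modified, "c" is new, "d" was deleted at the source.
replica_fs = {"a": {"size": 1}, "b": {"size": 22}, "c": {"size": 3}}
replica_query_db = {
    "a": {"timestamp": 90, "attributes": {"size": 1}},
    "b": {"timestamp": 95, "attributes": {"size": 2}},
    "d": {"timestamp": 80, "attributes": {"size": 4}},
}
reconcile({"b": 100, "c": 120, "d": 130}, replica_fs, replica_query_db)
```

Only the recent unique IDs are touched, so the cost of the pass depends on the number of files updated since the last recorded operation and not on the size of the replica file system.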
[0046] Figs. 2 and 3 illustrate example methods 200 and 300, respectively, for file system replication, in accordance with an example of the present subject matter. The order in which the methods are described is not intended to be construed as a limitation, and any number of the described method blocks may be combined in any order to implement the aforementioned methods, or an alternative method. Furthermore, the methods 200 and 300 may be implemented by processing resource or computing device(s) through any suitable hardware, non-transitory machine readable instructions, or combination thereof.
[0047] It may also be understood that the methods 200 and 300 may be performed by programmed computing devices, such as the source computing system 110 and the replica computing system 102, respectively, as depicted in Fig. 1B. Furthermore, the methods 200 and 300 may be executed based on instructions stored in a non-transitory computer readable medium, as will be readily understood. The non-transitory computer readable medium may include, for example, digital memories, magnetic storage media, such as one or more magnetic disks and magnetic tapes, hard drives, or optically readable digital data storage media.
[0048] Further, although the methods 200 and 300 are described below with reference to the source computing system 110 and the replica computing system 102 as described above, other suitable systems for the execution of these methods can be utilized. Additionally, implementation of these methods is not limited to such examples.
[0049] Referring to Fig. 2, at block 202, a source query database is updated to include file object attributes of a file updated during a file operation. In one example, the source query database is associated with a source computing system, such as the source computing system 110. Examples of the file operation include, but are not limited to, creation of a new file, modification of an existing file, and deletion of an existing file.
[0050] At block 204, a unique Identifier (ID) list is updated to include a unique ID of the file being updated and a timestamp corresponding to the unique ID. The timestamp indicates a time instant at which the file was updated. In one example, the unique ID list is updated at the source computing system 110 by a file system update module, for example, the file system update module 136 of the source computing system 110.
[0051] At block 206, a snapshot of a source file system is transmitted to a replica computing system for creating a replica of the source query database. In one example, the snapshot includes files saved in the source file system, the source query database, and the unique ID list. Further the snapshot is transmitted by the source computing system 110 to the replica computing system, for example, the replica computing system 102.
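By way of a hedged illustration, blocks 202 to 206 may be sketched as below. The class `SourceSystem`, its in-memory dictionaries, and the attribute names `size` and `mtime` are assumptions made for the example; an actual source computing system would operate on a real file system and query database.

```python
import time
from dataclasses import dataclass, field


@dataclass
class SourceSystem:
    files: dict = field(default_factory=dict)           # unique ID -> file content
    query_db: dict = field(default_factory=dict)        # unique ID -> file object attributes
    unique_id_list: list = field(default_factory=list)  # (unique ID, timestamp) pairs

    def record_operation(self, unique_id, content, now=None):
        """Blocks 202 and 204: record one create, modify, or delete operation.

        content is None for a deletion.
        """
        ts = time.time() if now is None else now
        if content is None:
            self.files.pop(unique_id, None)
            self.query_db.pop(unique_id, None)
        else:
            self.files[unique_id] = content
            self.query_db[unique_id] = {"size": len(content), "mtime": ts}
        # Every operation, including deletion, is logged with its timestamp.
        self.unique_id_list.append((unique_id, ts))

    def snapshot(self):
        """Block 206: bundle the files, the query database, and the unique ID list."""
        return {
            "files": dict(self.files),
            "query_db": {k: dict(v) for k, v in self.query_db.items()},
            "unique_id_list": list(self.unique_id_list),
        }
```

Note that in this sketch a deleted file leaves the files and the query database but stays in the unique ID list, which is what later allows the replica side to infer the deletion.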
[0052] Referring to Fig. 3, at block 302, a snapshot is obtained by a replica computing system, for example, the replica computing system 102. In one example, the snapshot is received from a source computing system, for example, the source computing system 110. The snapshot may include files saved in a source file system, a source query database, and a unique Identifier (ID) list. The unique ID list includes unique IDs of files updated in the source computing system and timestamps corresponding to each unique ID.
[0053] At block 304, a replica query database associated with the replica computing system is scanned to identify a last file operation timestamp. The last file operation timestamp is a timestamp corresponding to a unique ID of a file whose file object attributes were recorded as a last updated entry in the replica query database. In one example, the last file operation timestamp is obtained upon receiving a synchronization request.
[0054] At block 306, a recent unique ID is identified from a unique ID list. In one example, a synchronizing module of the replica computing system identifies the recent unique ID such that a timestamp of the recent unique ID is one of greater than and equal to the last file operation timestamp.
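The scan of block 304 and the comparison of block 306 may, purely as an example, be sketched as follows. The function names and the dictionary layout of the replica query database are hypothetical, and the choice to keep only the newest timestamp per unique ID is an assumption of the sketch rather than a requirement of the method.

```python
def last_file_operation_timestamp(replica_query_db):
    """Block 304: the newest timestamp recorded in the replica query database.

    replica_query_db maps unique ID -> {"timestamp": ...}. An empty database
    yields negative infinity so that every listed unique ID counts as recent.
    """
    return max(
        (entry["timestamp"] for entry in replica_query_db.values()),
        default=float("-inf"),
    )


def recent_unique_ids(unique_id_list, last_ts):
    """Block 306: unique IDs whose timestamp is greater than or equal to last_ts.

    Returns unique ID -> newest qualifying timestamp for that ID.
    """
    recent = {}
    for unique_id, ts in unique_id_list:
        if ts >= last_ts and ts >= recent.get(unique_id, ts):
            recent[unique_id] = ts
    return recent
```

The comparison is inclusive, so an entry whose timestamp equals the last file operation timestamp is examined again; this avoids missing operations that share a timestamp with the last recorded entry.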
[0055] At block 308, it is determined whether a file and file object attributes corresponding to the recent unique ID are present in the replica file system. In case it is determined that the file and the file object attributes corresponding to the recent unique ID are not present in the replica file system ('No' path from block 308), the recent unique ID, the file object attributes, and a timestamp corresponding to the unique ID are deleted from the replica query database at block 310.
[0056] In case it is determined that the file and the file object attributes corresponding to the file are present in the replica file system ('Yes' path from block 308), the file object attributes having most recent updates of the file corresponding to the recent unique ID are obtained from the replica file system at block 312.
[0057] At block 314, the replica query database is updated to include the recent unique ID and file object attributes having the most recent updates for a file corresponding to the recent unique ID. In case the recent unique ID and file object attributes of the file corresponding to the recent unique ID are present in the replica query database, the existing file object attributes for the recent unique ID are updated. In case the recent unique ID and file object attributes of the file corresponding to the recent unique ID are not present in the replica query database, the file object attributes for the recent unique ID are added in the replica query database.
[0058] Fig. 4 illustrates an example network environment implementing a non-transitory computer readable medium for file system replication, according to an example of the present disclosure. The system environment 400 may comprise at least a portion of a public networking environment or a private networking environment, or a combination thereof. In one implementation, the system environment 400 includes a processing resource 402 communicatively coupled to a computer readable medium 404 through a communication link 406.
[0059] For example, the processing resource 402 can include one or more processors of a computing device for file system replication. The computer readable medium 404 can be, for example, an internal memory device of the computing device or an external memory device. In one implementation, the communication link 406 may be a direct communication link, such as any memory read/write interface. In another implementation, the communication link 406 may be an indirect communication link, such as a network interface. In such a case, the processing resource 402 can access the computer readable medium 404 through a network 408. The network 408 may be a single network or a combination of multiple networks and may use a variety of different communication protocols.
[0060] The processing resource 402 and the computer readable medium 404 may also be coupled to requested data sources 410 through the communication link 406, and/or to communication devices 412 over the network 408. The coupling with the requested data sources 410 enables receiving the requested data in an offline environment, and the coupling with the communication devices 412 enables receiving the requested data in an online environment.
[0061] In one implementation, the computer readable medium 404 includes a set of computer readable instructions, implementing a synchronizing module 414 and a query database update module 416. The set of computer readable instructions can be accessed by the processing resource 402 through the communication link 406 and subsequently executed to process requested data communicated with the requested data sources 410 in order to facilitate file system journaling. When executed by processing resource 402, the instructions of the synchronizing module 414 may perform the functionalities described above in relation to the synchronizing module 106. When executed by processing resource 402, the instructions of the query database update module 416 may perform the functionalities described above in relation to the query database update module 108.
[0062] For example, in response to a synchronization request, the synchronizing module 414 may scan a replica query database associated with a replica computing system to identify a last file operation timestamp. The last file operation timestamp is a timestamp corresponding to a unique Identifier (ID) of a file whose file object attributes were recorded as a last updated entry in the replica query database.
[0063] The synchronizing module 414 may subsequently identify a recent unique ID from a unique ID list based on the last file operation timestamp and a timestamp of the recent unique ID. The timestamp of the recent unique ID is one of greater than and equal to the last file operation timestamp. The unique ID list includes unique IDs and timestamps of files updated in a source computing system. The unique ID is associated with a file that has been updated at a source computing system, while the timestamp indicates a time instant at which the file was updated.
[0064] Further, the query database update module 416 may query a replica file system of the replica computing system to determine if the file and the file object attributes corresponding to the recent unique ID are present in the replica file system. If the file and the file object attributes are present in the replica file system, the query database update module 416 may obtain the file object attributes corresponding to the file. The query database update module 416 may subsequently update the replica query database to include the file object attributes having the most recent updates for the recent unique ID.
[0065] If the file and the file object attributes are not present in the replica file system, the query database update module 416 may determine the file to be deleted. The query database update module 416 may delete the recent unique ID, the file object attributes corresponding to the file, as saved in the replica query database, and the last updated timestamp, from the replica query database.
[0066] Although examples for the present disclosure have been described in language specific to structural features and/or methods, it should be understood that the appended claims are not necessarily limited to the specific features or methods described. Rather, the specific features and methods are disclosed and explained as examples of the present disclosure.

Claims

I/We claim:
1. A replica computing system comprising:
a processor;
a synchronizing module coupled to the processor to:
scan a replica query database associated with the replica computing system to identify a last file operation timestamp, wherein the last file operation timestamp is a timestamp corresponding to a unique Identifier (ID) of a file whose file object attributes were recorded as a last updated entry in the replica query database; and
identify, from a unique ID list, a recent unique ID, wherein a timestamp of the recent unique ID is one of greater than and equal to the last file operation timestamp, wherein the unique ID list includes unique IDs of files updated in a source computing system and timestamps corresponding to each unique ID; and
a query database update module coupled to the processor to:
update the replica query database to include the recent unique ID and file object attributes having the most recent updates for a file corresponding to the recent unique ID.
2. The computing system as claimed in claim 1, wherein the query database update module further is to query a replica file system of the replica computing system to obtain the file object attributes having the most recent updates for the file corresponding to the recent unique ID.
3. The computing system as claimed in claim 2, wherein the query database update module further is to:
for the file and the file object attributes not present in the replica file system, determine the file to be deleted; and delete, from the replica query database, the recent unique ID, the file object attributes corresponding to the file, and a timestamp corresponding to the recent unique ID.
4. The computing system as claimed in claim 1, wherein the synchronizing module further is to:
compare the last file operation timestamp with the timestamp corresponding to each unique ID present in the unique ID list to determine the recent unique ID.
5. A method comprising:
updating, at a source computing system, a source query database to include file object attributes of a file updated during a file operation;
updating a unique Identifier (ID) list to include a unique ID of the file being updated and a timestamp corresponding to the unique ID, wherein the timestamp indicates a time instant at which the file was updated; and
transmitting a snapshot of a source file system, to a replica computing system for creating a replica of the source file system and the source query database, wherein the snapshot includes files saved in the source file system, the source query database, and the unique ID list.
6. The method as claimed in claim 5 further comprising:
generating the snapshot of the source file system, wherein the snapshot includes files saved in the source file system, the source query database, and the unique ID list since generation of a previous snapshot.
7. The method as claimed in claim 5 further comprising:
obtaining, at the replica computing system, the snapshot from the source computing system; and
saving the snapshot in the replica computing system.
8. The method as claimed in claim 7 further comprising:
at the replica computing system, scanning a replica query database of the replica computing system to identify a last file operation timestamp; wherein the last file operation timestamp is a timestamp corresponding to a unique Identifier (ID) of a file whose file object attributes were recorded as a last updated entry in the replica query database;
identifying a recent unique ID from the unique ID list based on the last file operation timestamp and a timestamp corresponding to the recent unique ID, wherein the last file operation timestamp is less than the timestamp corresponding to the recent unique ID; and
updating the replica query database to include the file object attributes obtained from a replica file system of the replica computing system, having the most recent updates for the recent unique ID.
9. The method as claimed in claim 8, wherein the updating comprises one of: for the unique ID being present in the replica query database, updating existing file object attributes for the recent unique ID; and
for the unique ID not present in the replica query database, adding the file object attributes for the recent unique ID in the replica query database.
10. The method as claimed in claim 8, wherein the updating the replica query database further comprising: querying a replica file system of the replica computing system to determine if the file and the file object attributes corresponding to the file are present in the replica file system; and
obtaining the file object attributes corresponding to the file if the file and the file object attributes are present in the replica file system.
11. The method as claimed in claim 8 further comprising:
querying a replica file system of the replica computing system to determine if the file and the file object attributes corresponding to the file are present in the replica file system;
for the file and the file object attributes not present in the replica file system, determining the file to be deleted; and
deleting, from the replica query database, the recent unique ID, the file object attributes corresponding to the file, and a timestamp corresponding to the unique ID.
12. The method as claimed in claim 8, wherein identifying the unique ID having the timestamp greater than the last file operation timestamp further comprises comparing the last file operation timestamp with each unique ID present in the unique ID list.
13. A non-transitory computer readable medium having a set of computer readable instructions that, when executed, cause a processor to:
scan a replica query database associated with a replica computing system to identify a last file operation timestamp; wherein the last file operation timestamp is a timestamp corresponding to a unique Identifier (ID) of a file whose file object attributes were recorded as a last updated entry in the replica query database; and
identify, from a unique ID list, a recent unique ID, wherein a timestamp of the recent unique ID is one of greater than and equal to the last file operation timestamp, wherein the unique ID list includes unique IDs of files updated in a source computing system and timestamps corresponding to each unique ID;
query a replica file system of the replica computing system to determine if file and file object attributes corresponding to the recent unique ID are present in the replica file system;
for the file and the file object attributes not present in the replica file system, determine the file to be deleted in a source computing system; and
delete, from the replica query database, the unique ID, the file object attributes corresponding to the file, and a timestamp corresponding to the recent unique ID.
14. The non-transitory computer readable medium of claim 13, wherein the computer readable instructions, when executed, further cause the processor to:
obtain the file object attributes corresponding to the file if the file and the file object attributes are present in the replica file system; and
update the replica query database to include the file object attributes obtained from the replica file system, having the most recent updates for the recent unique ID.
15. The non-transitory computer readable medium of claim 13, wherein the computer readable instructions, when executed, further cause the processor to:
obtain, at the replica computing system, a snapshot from the source computing system for synchronizing the replica computing system with the source computing system; and
save the snapshot in the replica computing system.
PCT/US2015/023649 2015-01-29 2015-03-31 File system replication WO2016122690A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN424/CHE/2015 2015-01-29
IN424CH2015 2015-01-29

Publications (1)

Publication Number Publication Date
WO2016122690A1 (en) 2016-08-04

Family

ID=56544098

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2015/023649 WO2016122690A1 (en) 2015-01-29 2015-03-31 File system replication

Country Status (1)

Country Link
WO (1) WO2016122690A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060168120A1 (en) * 2001-11-27 2006-07-27 Microsoft Corporation Non-invasive latency monitoring in a store-and-forward replication system
US20070255763A1 (en) * 2006-04-27 2007-11-01 International Business Machines Corporation Database replication method and system
JP2009116722A (en) * 2007-11-08 2009-05-28 Nec Corp Storage device and method of adding timestamp
US20090157537A1 (en) * 2007-10-30 2009-06-18 Miller Barrick H Communication and synchronization in a networked timekeeping environment
US8849777B1 (en) * 2011-06-30 2014-09-30 Emc Corporation File deletion detection in key value databases for virtual backups

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15880579

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15880579

Country of ref document: EP

Kind code of ref document: A1