WO2017041616A1 - Data reading and writing method and device, double active storage system and realization method thereof - Google Patents


Info

Publication number
WO2017041616A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
storage
storage magnetic
write
lun
Prior art date
Application number
PCT/CN2016/095865
Other languages
French (fr)
Chinese (zh)
Inventor
闫海涛
曾理文
Original Assignee
中兴通讯股份有限公司 (ZTE Corporation)
Priority date
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629 Configuration or reconfiguration of storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/068 Hybrid storage device

Definitions

  • This application relates to, but is not limited to, the field of communications.
  • The disaster recovery solution in the related art is generally built around a single data center, whose storage is served by a storage magnetic array (disk array).
  • A storage magnetic array consists of a storage controller and disks.
  • The storage controller is dual-control, and behind it are a plurality of disks that are visible to both controllers.
  • If one controller fails, the other automatically takes over its node, which protects against failure of a single-point storage controller.
  • However, asynchronous recovery takes a long time, and because the backup is not synchronized in real time, some data may be lost.
  • Another solution in the related art is to set up two data centers.
  • Data in the primary data center is backed up to the standby data center in real time, but the standby data center cannot provide services. After a disaster, services must be switched to the standby data center manually, so the recovery time cannot meet the requirement that the business not be interrupted.
  • In this active/standby mode, the standby site cannot provide services, which wastes resources; moreover, user data must be written successfully to both data centers, so the number of input/output operations per second (IOPS) declines to some extent.
  • An embodiment of the present invention provides a data reading and writing method, including: determining two or more storage magnetic arrays for reading and writing data; and reading and writing data by using the two or more storage magnetic arrays.
  • Reading and writing data by using the two or more storage magnetic arrays includes: receiving a data write request for writing first predetermined data; and writing the first predetermined data to the two storage magnetic arrays according to the data write request.
  • Writing the first predetermined data to the two storage magnetic arrays includes: determining that, of the two storage magnetic arrays, the write to the first storage magnetic array succeeded and the write to the second storage magnetic array failed; recording the storage block that stores the first predetermined data in both storage magnetic arrays; and, once the second storage magnetic array returns to normal, mirroring the data successfully written to the first storage magnetic array into the second storage magnetic array according to the recorded storage block.
  • After this determination, the method further includes: stopping writing the first predetermined data to the second storage magnetic array.
  • Recording the storage block that stores the first predetermined data in both storage magnetic arrays includes: marking the storage block in a bitmap, and/or recording the storage block in a log.
  • Reading and writing data by using the two or more storage magnetic arrays includes: receiving a data read request for reading second predetermined data; and receiving part of the second predetermined data from a third storage magnetic array among the two or more storage magnetic arrays, and the remainder of the second predetermined data from a fourth storage magnetic array among the two or more storage magnetic arrays.
  • Determining two or more storage magnetic arrays for reading and writing data includes: creating two or more new storage magnetic arrays; or upgrading an existing predetermined storage magnetic array to an active-active storage magnetic array, where the active-active storage magnetic array includes the predetermined storage magnetic array and a replicated storage magnetic array obtained by replicating the predetermined storage magnetic array.
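The two recording options named above (a bitmap and a log) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patent's implementation: the class names `BlockBitmap` and `WriteLog`, the one-bit-per-block layout, and the in-memory storage are all hypothetical.

```python
from dataclasses import dataclass, field


class BlockBitmap:
    """Hypothetical bitmap: one bit per storage block, set when the block is written
    while the peer storage magnetic array cannot be written."""

    def __init__(self, num_blocks: int) -> None:
        self.num_blocks = num_blocks
        self._bits = bytearray((num_blocks + 7) // 8)  # rounded up to whole bytes

    def _locate(self, block: int) -> tuple[int, int]:
        # Map a block number to (byte index, bit mask).
        if not 0 <= block < self.num_blocks:
            raise IndexError(f"block {block} out of range")
        return block // 8, 1 << (block % 8)

    def mark(self, block: int) -> None:
        byte, mask = self._locate(block)
        self._bits[byte] |= mask

    def clear(self, block: int) -> None:
        byte, mask = self._locate(block)
        self._bits[byte] &= ~mask & 0xFF  # keep the value within 0..255

    def is_marked(self, block: int) -> bool:
        byte, mask = self._locate(block)
        return bool(self._bits[byte] & mask)

    def marked_blocks(self) -> list[int]:
        return [b for b in range(self.num_blocks) if self.is_marked(b)]


@dataclass
class WriteLog:
    """Hypothetical operation log: every (block, data) write, in arrival order."""

    entries: list[tuple[int, bytes]] = field(default_factory=list)

    def record(self, block: int, data: bytes) -> None:
        self.entries.append((block, data))
```

A bitmap has a fixed size and collapses repeated writes to the same block into one bit, so only the block's latest contents need copying later; a log grows with every write but preserves the order of operations.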
  • An embodiment of the present invention further provides a data reading and writing device, including: a determining module, configured to determine two or more storage magnetic arrays for reading and writing data; and a read/write module, configured to read and write data by using the two or more storage magnetic arrays.
  • The read/write module includes: a first receiving unit, configured to receive, when the two or more storage magnetic arrays are two storage magnetic arrays, a data write request for writing first predetermined data; and a writing unit, configured to write the first predetermined data into the two storage magnetic arrays according to the data write request.
  • The writing unit includes: a determining subunit, configured to determine that, of the two storage magnetic arrays, the write to the first storage magnetic array succeeded and the write to the second storage magnetic array failed; a recording subunit, configured to record the storage block that stores the first predetermined data in both storage magnetic arrays; and a synchronization subunit, configured to mirror, once the second storage magnetic array returns to normal, the data successfully written to the first storage magnetic array into the second storage magnetic array according to the recorded storage block.
  • The writing unit further includes: a stopping subunit, configured to stop writing the first predetermined data to the second storage magnetic array after it is determined that, of the two storage magnetic arrays, the write to the first storage magnetic array succeeded and the write to the second storage magnetic array failed.
  • The recording subunit records the storage block that stores the first predetermined data in both storage magnetic arrays by marking it in a bitmap and/or recording it in a log.
  • The read/write module includes: a second receiving unit, configured to receive a data read request for reading second predetermined data; and a third receiving unit, configured to receive part of the second predetermined data from a third storage magnetic array among the two or more storage magnetic arrays, and the remainder of the second predetermined data from a fourth storage magnetic array among the two or more storage magnetic arrays.
  • The determining module includes: a creating unit, configured to create two or more new storage magnetic arrays; or an upgrading unit, configured to upgrade an existing predetermined storage magnetic array to an active-active storage magnetic array, where the active-active storage magnetic array includes the predetermined storage magnetic array and a duplicate storage magnetic array obtained by copying the predetermined storage magnetic array.
  • The above solution determines two or more storage magnetic arrays for reading and writing data and uses them to read and write data. This solves the problems of wasted resources and low read/write efficiency in the related art, thereby avoiding resource waste and improving read/write efficiency.
  • An embodiment of the present invention further provides an active-active storage system, including two data centers that are interconnected and provide services at the same time. Each data center includes at least one storage magnetic array and one user server; the user server of a data center is connected to the storage magnetic arrays of both data centers, and the storage magnetic arrays of the two data centers are connected through a back-end network;
  • the user server is configured to distribute a user's write request to the storage magnetic arrays of both data centers at the same time, and to report a successful write to the user only when both the primary LUN and the secondary LUN have been written successfully;
  • the storage magnetic array is configured to perform a write data operation after receiving the write request.
  • The two data centers are located at different sites and are connected to each other through optical fiber and fiber switches.
  • The dual-active storage system stores two real logical unit numbers (LUNs): a primary LUN on the storage magnetic array of one data center, and a secondary LUN on the storage magnetic array of the other data center. Only the primary LUN is presented to the user.
  • Each storage magnetic array includes a dual-control storage controller and disks; the dual-control storage controllers in the storage magnetic arrays of the two data centers form a cluster that serves as the control body of the dual-active storage system.
  • the two data centers communicate with each other through the cluster channel.
  • the active-active storage system further includes one or more of the following modules:
  • a synchronous mirroring module, configured to mirror the data of the primary LUN to the secondary LUN block by block in the background until all of it has been mirrored;
  • a write processing module, configured to: when the write to one of the primary LUN and the secondary LUN succeeds and the write to the other fails, record information about the successfully written storage block, and, after the LUN that failed the write recovers, synchronize the data of that storage block to it through the synchronous mirroring module;
  • a cluster decision module, configured to: when the write to one of the primary LUN and the secondary LUN succeeds and the write to the other fails, mark the LUN that failed the write as unavailable, and restore service on that LUN after it recovers and data synchronization mirroring completes; and/or, when multiple cluster members encounter errors at the same time, apply operation changes consistently across the cluster by means of cluster transactions;
  • an arbitration device, configured so that, after cluster communication becomes abnormal, it either acts as an arbitration server and determines by voting which cluster members continue to provide services, or acts as an IP storage area network (IP SAN) device, in which case the members that win the resource contention and hold the majority of resources continue to provide services.
  • the embodiment of the invention further provides an implementation method of a dual-active storage system, including:
  • each data center includes at least one storage magnetic array and one user server, the user server of a data center is connected to the storage magnetic arrays of both data centers, and the storage magnetic arrays of the two data centers are connected through a back-end network;
  • after receiving a user's write request, the user server distributes it to the storage magnetic arrays of both data centers at the same time, and reports a successful write to the user when the writes to the storage magnetic arrays of both data centers have succeeded;
  • after receiving the write request, the storage magnetic array performs a data write operation.
  • the method further includes:
  • a secondary LUN is created for the primary LUN; after the data of the primary LUN has been mirrored to the secondary LUN successfully, the secondary LUN starts providing services.
  • the active-active storage system presents the primary LUN to a user.
  • The method further includes: after the primary LUN and the secondary LUN receive the write request and perform the data write operation, if one write succeeds and the other fails, recording information about the successfully written storage block and marking the LUN that failed the write as unavailable; after that LUN recovers from the fault, synchronizing the data of the recorded storage block to it, and restoring its service once data synchronization mirroring is complete.
  • Each storage magnetic array includes a dual-control storage controller and disks; the dual-control storage controllers in the storage magnetic arrays of the two data centers form a cluster that serves as the control body of the dual-active storage system, and messages are exchanged between the two data centers through the cluster channel;
  • the method also includes one or more of the following:
  • when multiple cluster members encounter errors at the same time, the cluster decision module applies operation changes consistently across the cluster by means of cluster transactions.
  • The two data centers can provide services at the same time, which greatly improves efficiency.
  • FIG. 1 is a flowchart of a data reading and writing method according to Embodiment 1 of the present invention;
  • FIG. 2 is a structural block diagram of a data reading and writing device according to Embodiment 1 of the present invention;
  • FIG. 3 is a first structural block diagram of the read/write module 24 in the data reading and writing device according to Embodiment 1 of the present invention;
  • FIG. 4 is a structural block diagram of the writing unit 34 in the data reading and writing device according to Embodiment 1 of the present invention;
  • FIG. 5 is a preferred structural block diagram of the writing unit 34 in the data reading and writing device according to Embodiment 1 of the present invention;
  • FIG. 6 is a second structural block diagram of the read/write module 24 in the data reading and writing device according to Embodiment 1 of the present invention;
  • FIG. 7 is a structural block diagram of the determining module 22 in the data reading and writing device according to Embodiment 1 of the present invention;
  • FIG. 8 is a structural block diagram of an active-active storage system according to Embodiment 2 of the present invention;
  • FIG. 9 is a flowchart of LUN creation, dual-active data distribution, and exception handling in a dual-active storage system according to an embodiment of the present invention.
  • FIG. 1 is a flowchart of a data reading and writing method according to an embodiment of the present invention. As shown in FIG. 1, the process includes the following steps:
  • Step S102: determine two or more storage magnetic arrays for reading and writing data;
  • Step S104: read and write data by using the two or more storage magnetic arrays.
  • With these steps, the resource utilization of each storage magnetic array is improved while each storage magnetic array remains backed up, and providing data read/write services through multiple storage magnetic arrays also improves read/write efficiency. This solves the problems of wasted resources and low read/write efficiency in the related art, thereby avoiding resource waste and improving read/write efficiency.
  • Reading and writing data by using the two or more storage magnetic arrays includes: receiving a data write request for writing first predetermined data; and writing the first predetermined data to the two storage magnetic arrays according to the data write request.
  • Writing the first predetermined data to the two storage magnetic arrays includes: determining that, of the two storage magnetic arrays, the write to the first storage magnetic array succeeded and the write to the second storage magnetic array failed; recording the storage block that stores the first predetermined data in both storage magnetic arrays; and, once the second storage magnetic array returns to normal, mirroring the data successfully written to the first storage magnetic array into the second storage magnetic array according to the recorded storage block.
  • After this determination, the method further includes: stopping writing the first predetermined data to the second storage magnetic array. A failed write to the second storage magnetic array may indicate that the array itself is faulty, or that the disk currently used for writing is faulty, so that the second storage magnetic array cannot provide normal data read/write services.
  • In this case, the second storage magnetic array can be placed in an unavailable state and used for data read/write services again only after it returns to normal, which helps ensure the correctness of data reads and writes.
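The availability handling just described can be modelled as a small state machine. The sketch below is a hedged illustration only: the `RESYNCING` state and the event names (`write_failed`, `recovered`, `resync_done`) are hypothetical, added to separate "back online" from "data synchronized", which the text treats as a single unavailable period.

```python
from enum import Enum


class ArrayState(Enum):
    AVAILABLE = "available"
    UNAVAILABLE = "unavailable"  # a write failed; excluded from service
    RESYNCING = "resyncing"      # hypothetical: back online, recorded blocks being mirrored


# Allowed transitions, keyed by (current state, event); all names are assumptions.
_TRANSITIONS = {
    (ArrayState.AVAILABLE, "write_failed"): ArrayState.UNAVAILABLE,
    (ArrayState.UNAVAILABLE, "recovered"): ArrayState.RESYNCING,
    (ArrayState.RESYNCING, "resync_done"): ArrayState.AVAILABLE,
    (ArrayState.RESYNCING, "write_failed"): ArrayState.UNAVAILABLE,
}


def next_state(state: ArrayState, event: str) -> ArrayState:
    """Return the state after `event`, rejecting transitions the model does not allow."""
    try:
        return _TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in state {state.value}") from None


def serves_reads(state: ArrayState) -> bool:
    """Only a fully synchronized array answers user reads, so stale data is never returned."""
    return state is ArrayState.AVAILABLE
```

Rejecting unexpected events (for example, `recovered` while still available) makes ordering bugs in fault handling surface as errors instead of silently changing state.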
  • Recording the storage block that stores the first predetermined data in both storage magnetic arrays includes: marking the storage block in a bitmap, and/or recording the storage block in a log.
  • Reading and writing data by using the two or more storage magnetic arrays includes: receiving a data read request for reading second predetermined data; and receiving part of the second predetermined data from a third storage magnetic array among the two or more storage magnetic arrays, and the remainder of the second predetermined data from a fourth storage magnetic array among the two or more storage magnetic arrays.
  • The third storage magnetic array may be one or more storage magnetic arrays, and the fourth storage magnetic array may also be one or more storage magnetic arrays. Because the third and fourth storage magnetic arrays serve the read together, read efficiency improves and read time is reduced.
  • Determining two or more storage magnetic arrays for reading and writing data includes: creating two or more new storage magnetic arrays; or upgrading an existing predetermined storage magnetic array to an active-active storage magnetic array, where the active-active storage magnetic array includes the predetermined storage magnetic array and a replicated storage magnetic array obtained by replicating the predetermined storage magnetic array.
  • The method according to the above embodiment can be implemented by software plus a necessary general-purpose hardware platform, or by hardware; in many cases, the former is the better implementation.
  • The technical solutions of the embodiments of the present invention may, in essence, be embodied in the form of a software product. The software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments.
  • This embodiment also provides a data reading and writing device, which implements the above embodiments and preferred implementations; what has already been described is not repeated.
  • As used below, the term "module" refers to a combination of software and/or hardware that implements a predetermined function.
  • Although the apparatus described in the following embodiments is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
  • FIG. 2 is a block diagram showing the structure of a data read/write device according to an embodiment of the present invention. As shown in FIG. 2, the device includes a determining module 22 and a read/write module 24, which are described below.
  • the determining module 22 is configured to: determine two or more storage magnetic arrays for reading and writing data;
  • the read/write module 24 is connected to the determining module 22, and is configured to perform data reading and writing by using the two or more storage magnetic arrays.
  • FIG. 3 is a block diagram showing the structure of a read/write module 24 in a data read/write device according to an embodiment of the present invention.
  • As shown in FIG. 3, the read/write module 24 includes a first receiving unit 32 and a writing unit 34, which are described below.
  • the first receiving unit 32 is configured to: when the two or more storage magnetic arrays are two storage magnetic arrays, receive a data write request for requesting to write the first predetermined data;
  • the writing unit 34 is connected to the first receiving unit 32, and is configured to write the first predetermined data into the two storage magnetic arrays according to the data writing request.
  • FIG. 4 is a block diagram showing the structure of a write unit 34 in a data read/write apparatus according to an embodiment of the present invention.
  • As shown in FIG. 4, the writing unit 34 includes a determining subunit 42, a recording subunit 44, and a synchronization subunit 46, which are described below.
  • the determining subunit 42 is configured to: determine that the first storage magnetic array in the two storage magnetic arrays successfully writes data, and the second storage magnetic array fails to write data;
  • the recording subunit 44 is connected to the determining subunit 42 and configured to record the storage block that stores the first predetermined data in both storage magnetic arrays;
  • the synchronization subunit 46 is connected to the recording subunit 44 and configured to mirror, when the second storage magnetic array returns to normal, the data successfully written to the first storage magnetic array into the second storage magnetic array according to the recorded storage block.
  • FIG. 5 is a block diagram of a preferred structure of the writing unit 34 in the data reading and writing device according to an embodiment of the present invention. As shown in FIG. 5, in addition to the units shown in FIG. 4, the writing unit 34 includes a stop subunit 52, which is described below.
  • the stop subunit 52 is connected to the determining subunit 42 and configured to stop writing the first predetermined data to the second storage magnetic array after it is determined that, of the two storage magnetic arrays, the write to the first storage magnetic array succeeded and the write to the second storage magnetic array failed.
  • The recording subunit 44 may record the storage block that stores the first predetermined data in both storage magnetic arrays by marking it in a bitmap and/or by recording it in a log.
  • FIG. 6 is a second structural block diagram of the read/write module 24 in the data read/write device according to an embodiment of the present invention. As shown in FIG. 6, the read/write module 24 includes a second receiving unit 62 and a third receiving unit 64, which are described below.
  • the second receiving unit 62 is configured to: receive a data read request for requesting to read the second predetermined data;
  • the third receiving unit 64 is connected to the second receiving unit 62 and configured to receive part of the second predetermined data from a third storage magnetic array among the two or more storage magnetic arrays, and the remainder of the second predetermined data from a fourth storage magnetic array among the two or more storage magnetic arrays.
  • FIG. 7 is a block diagram showing the structure of the determining module 22 in the data reading and writing apparatus according to the embodiment of the present invention. As shown in FIG. 7, the determining module 22 includes a creating unit 72 or an upgrading unit 74. The determining module 22 will be described below.
  • the creating unit 72 is configured to create two or more new storage magnetic arrays; or
  • the upgrading unit 74 is configured to upgrade an existing predetermined storage magnetic array to an active-active storage magnetic array, where the active-active storage magnetic array includes the predetermined storage magnetic array and a duplicate storage magnetic array obtained by copying the predetermined storage magnetic array.
  • This embodiment provides an active-active storage system including two data centers (each of which includes a storage magnetic array) that can provide services at the same time; when they are also connected to a load balancing mechanism, efficiency is greatly improved.
  • Data in the two data centers is fully consistent in real time, and the resources at each site can take over for each other, so neither a single-point array controller failure nor a disk failure interrupts user services.
  • an embodiment of the present invention provides a two-site dual-active storage system, where the system includes two data centers.
  • Each data center includes a set of storage magnetic arrays and a number of fiber switches.
  • the storage magnetic array includes dual-control storage controllers and disks.
  • The dual-control storage controller consists of two highly available boards that can take over for each other; each board is a node.
  • The two data centers can be connected by optical fiber, and the dual-control storage controllers of the two storage magnetic arrays together form a cluster that serves as the controller body of the dual-active system.
  • The cluster needs an arbitration device to prevent the resource contention caused by split-brain when cluster communication becomes abnormal.
  • The storage magnetic array is the main body of the active-active storage system for storing user data.
  • The storage magnetic array provides logical unit number (LUN) access services through the Internet Small Computer System Interface (iSCSI) or Fibre Channel (FC).
  • The dual-control storage controllers in the two storage magnetic arrays form a cluster.
  • The storage controller is the core control device of the dual-active system, responsible for active-active cluster management, volume management, and synchronous mirroring.
  • Fiber switch: both the user-access front-end network of the dual-control storage controllers at the two sites and the back-end network used for internal data transmission run over optical fiber to achieve low latency.
  • Arbitration device: the arbitration device is usually placed at a third site. It can be an arbitration server that determines by voting which cluster members continue to provide services, or an IP SAN (a storage area network built on an IP network) device, in which case the members that win the resource contention and hold the majority of resources continue to provide services.
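The arbitration-server variant can be illustrated with a simple majority vote. In this hedged sketch, each cluster member and the third-site arbiter hold one vote each, and the arbiter grants its vote to at most one partition; these vote weights and the `decide_survivor` function are assumptions, and the IP SAN resource-contention variant is not modelled.

```python
from typing import Optional, Sequence


def decide_survivor(partitions: Sequence[set[str]], arbiter_grants_to: Optional[int]) -> Optional[int]:
    """Return the index of the partition that keeps serving, or None if no partition holds a strict majority.

    Assumed model: one vote per cluster member plus one for the arbiter, which votes
    for the partition at index `arbiter_grants_to` (None if the arbiter is unreachable).
    """
    total_votes = sum(len(members) for members in partitions) + 1  # all members plus the arbiter
    for index, members in enumerate(partitions):
        votes = len(members) + (1 if arbiter_grants_to == index else 0)
        if 2 * votes > total_votes:  # strict majority, so at most one partition can win
            return index
    return None
```

With two sites of two controller boards each, `decide_survivor([{"A1", "A2"}, {"B1", "B2"}], 1)` returns `1`: site B holds 3 of 5 votes and keeps serving. If the arbiter is unreachable, each side holds only 2 of 5 and the function returns `None`, so neither side serves on its own; that is the split-brain outcome the arbitration device exists to prevent.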
  • In the two-site dual-active storage system and its implementation method, two real LUNs are stored in the system, but only one active LUN is visible to the user.
  • Of the two real LUNs, one is the primary LUN and the other is the secondary LUN.
  • the two real LUNs are each provided by one of the two storage magnetic arrays.
  • The synchronous mirroring module mirrors the data of the primary LUN to the secondary LUN block by block in the background; during this period, the secondary LUN cannot provide services. After all data has been mirrored, the system marks the secondary LUN as available and opens it for access. Subsequent user writes are distributed to the primary and secondary LUNs at the same time, and the result is reported to the user only after both writes succeed, so that the data at both ends stays consistent.
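The initial background mirroring can be sketched with a copy cursor. This is a hypothetical, single-threaded model (the `MirroredLun` class, `mirror_step`, and the in-memory block lists are assumptions). It also makes one assumption the text leaves open: a write that lands on a block the mirror has already copied is applied to the secondary too, so that copy does not go stale.

```python
from typing import Optional


class MirroredLun:
    """Hypothetical primary/secondary LUN pair, modelled as in-memory block lists."""

    def __init__(self, primary: list[bytes]) -> None:
        self.primary = primary
        self.secondary: list[Optional[bytes]] = [None] * len(primary)
        self.secondary_available = False  # not served until mirroring completes
        self._cursor = 0  # next block the background mirror will copy

    def mirror_step(self, batch: int = 1) -> None:
        """One background pass: copy the next `batch` blocks from primary to secondary."""
        end = min(self._cursor + batch, len(self.primary))
        for block in range(self._cursor, end):
            self.secondary[block] = self.primary[block]
        self._cursor = end
        if self._cursor == len(self.primary):
            self.secondary_available = True  # all data mirrored: open the secondary LUN

    def write(self, block: int, data: bytes) -> None:
        self.primary[block] = data
        # Assumption: blocks behind the cursor are already mirrored, so keep them current.
        # Once mirroring completes, every block is behind the cursor and all writes reach both LUNs.
        if block < self._cursor:
            self.secondary[block] = data

    def serving_luns(self) -> list[str]:
        return ["primary", "secondary"] if self.secondary_available else ["primary"]
```

A real array would also need locking between the background copier and incoming writes; this sketch runs both in one thread to keep the ordering visible.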
  • the storage magnetic arrays of both data centers can provide user access services for the same LUN to improve system utilization and improve IOPS.
  • the user server When receiving the write request of the user, the user server performs simultaneous distribution of the primary and secondary LUNs, that is, simultaneously distributes to the storage magnetic arrays of the two data centers, and the storage magnetic array performs the write data operation after receiving the write request.
  • the write data operation of the storage magnetic array of only two data centers is successful, and the user server returns the user success to ensure that the data of the primary and secondary LUNs are consistent in real time.
  • the user After the user writes the primary and secondary LUNs abnormally, for example, if the secondary LUN fails to be written, the user's write write operation will be recorded. You can use the bitmap to identify which block is currently written, or log the operation. And identify that the secondary LUN is unavailable, to avoid data errors caused by reading the secondary LUN. After the subsequent exception is removed, the system triggers the recovery of the secondary LUN data.
  • the synchronous mirroring module starts to replay the bitmap, and the changed block identified in the bitmap is read from the primary LUN and written to the secondary LUN. The data will be consistent and the secondary LUN will begin to provide services. If the log is used, the data of the primary and secondary LUNs are consistent and the secondary LUN starts to provide services.
  • a cluster decision module determines the operation result of the write operation and changes the available state of the primary and secondary LUNs to avoid the obtained information between the clusters and the multiple nodes. Problems that cannot be strictly consistent in real time.
  • the cluster decision module can achieve the same operation change between clusters through cluster transactions. When multiple cluster members have errors at the same time, this mechanism can effectively prevent the error notification sequence from being executed by the cluster decision module. When you get the right information in time.
  • a third-party arbitration device which can be an IP SAN device, decides the section that can continue to provide services by competing for the device. Point, it can also be a node that the arbitration server votes to continue to provide services.
  • FIG. 8 is a block diagram showing the overall structure of an active-active system according to an embodiment of the present invention.
  • The entire active-active system is deployed in two data centers, data center A and data center B.
  • Each data center contains several entities. Data center A, for example, includes user server A, which corresponds to the first receiving unit 32, the second receiving unit 62, and the third receiving unit 64 described above.
  • User server A is connected to disk array A through optical fibers and FC switches.
  • Disk array A includes dual-controller storage controller A (corresponding to the writing unit 34 described above), which is connected to disk A. "Disk A" refers collectively to one or more disks.
  • User servers A and B of the two data centers communicate with each other through the FC switches, and the dual-controller storage controllers of data centers A and B communicate with the user access network through the FC switches.
  • The back-end network between the two dual-controller storage controllers carries internal data, such as the synchronous mirroring channel in this application, and is also connected through FC switches.
  • The two data centers need a cluster channel for cluster messaging and heartbeat detection.
  • Third-party arbitration is required within the cluster to prevent split-brain problems caused by cluster-channel failures.
  • FIG. 9 is a flowchart of active-active LUN creation, active-active data distribution, and exception handling according to an embodiment of the present invention. As shown in FIG. 9, the process includes the following steps:
  • Step S901: The user creates an active-active LUN or upgrades an existing LUN to an active-active LUN.
  • The active-active LUN comprises two objects: a primary LUN and a secondary LUN.
  • The LUN presented to the user, that is, the one the user sees, is the primary LUN.
  • The secondary LUN is not presented externally.
  • Step S902: When an active-active LUN is created, requests to create the primary LUN and the secondary LUN are sent to the disk arrays at the two sites. When an existing LUN is upgraded to active-active, that LUN becomes the primary LUN and a request to create the secondary LUN is sent to the disk array of the other data center. The primary LUN is presented to the user.
  • Step S903: The synchronous mirroring module copies all data of the primary LUN to the secondary LUN in the background. Once the copy completes, the primary and secondary LUN data are identical and the secondary LUN starts to provide access.
  • After an active-active LUN is created: if it is newly created, neither the primary nor the secondary LUN holds any data, so both can be accessed immediately. If an existing LUN was upgraded to active-active, the primary LUN may already hold data, which must first be mirrored to the secondary LUN in the background; only then can the secondary LUN provide access.
  • Step S904: Every node of the active-active storage system can provide service, and each write IO is distributed to the primary and secondary LUNs simultaneously.
  • The user's write IO is distributed to the primary and secondary LUNs simultaneously so that their data remains fully consistent.
  • Step S905: Return write success to the user and continue waiting for the user's read and write requests.
  • If, after the write IO is distributed to both LUNs in step S904, both the primary and secondary LUNs return success, write success is returned to the user and the system continues waiting for read and write requests. This keeps the primary and secondary LUN data consistent in real time. The next user IO request is handled as in step S904; as long as each user IO is written successfully to both LUNs, the active-active system remains healthy.
  • Step S906: If both writes fail, failure is returned to the user. If one write succeeds and the other fails, the cluster decision module is notified and marks the failed LUN as unavailable. For example, if the write to the secondary LUN fails, the secondary LUN is marked unavailable.
  • In that case the cluster decision module of the active-active system must be notified, and it marks the failed LUN as unavailable.
  • Step S907: The system marks the blocks changed by the current write in a bitmap, or records the change in a log.
  • The user's write operation is recorded. For example, if the write to the secondary LUN fails, a bitmap can mark which blocks were written, or the operation can be logged. This record is the basis for later recovering the secondary LUN's data.
  • Step S908: After the secondary LUN recovers, the system replays the information recorded for the failed write and mirrors the corresponding data from the primary LUN to the secondary LUN until resynchronization is complete.
  • The secondary LUN can then resume service.
  • Both the primary and secondary LUNs of the active-active system continue to provide service.
  • Subsequent user write IO requests are handled as in step S904.
  • The synchronous mirroring module can reside in the dual-controller storage controller of the primary LUN and is set up automatically when an active-active LUN is created.
  • The cluster decision module can reside in each user server of the cluster, forming a distributed system, and is set up automatically when an active-active LUN is created.
  • Each of the above modules may be implemented in software or hardware.
  • In hardware, this can be done, for example but not exclusively, by placing all of the above modules in the same processor or by distributing them across multiple processors.
  • Embodiments of the present invention also provide a storage medium.
  • Optionally, the storage medium may be configured to store program code for performing the following steps:
  • The storage medium may include, but is not limited to, a USB flash drive, read-only memory (ROM), and random access memory (RAM).
  • The processor performs steps S1-S2 above according to the program code stored in the storage medium.
  • All or some of the steps of the above embodiments may also be implemented with integrated circuits.
  • The steps may each be fabricated as a separate integrated circuit module, or multiple modules or steps may be fabricated into a single integrated circuit module.
  • The devices, functional modules, and functional units in the above embodiments may be implemented on a general-purpose computing device, either centralized on a single computing device or distributed over a network of multiple computing devices.
  • When the devices, functional modules, or functional units in the above embodiments are implemented as software functional modules and sold or used as stand-alone products, they may be stored in a computer-readable storage medium.
  • The computer-readable storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
  • Two or more disk arrays are determined for data reading and writing, and data is read and written by using them. This solves the problems of resource waste and low data read/write efficiency in the related art, avoiding resource waste and improving read/write efficiency.
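The create-versus-upgrade path in steps S901-S903 can be sketched in Python under simple assumptions. Each LUN is modeled as an in-memory list of fixed-size blocks, and the background copy is a generator that a caller drives one block at a time. The names `Lun`, `create_active_active`, `background_mirror`, and `upgrade_to_active_active` are hypothetical illustrations, not interfaces described in the patent.

```python
# Hypothetical sketch of steps S901-S903; names and the in-memory model are assumptions.
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

BLOCK_SIZE = 4  # tiny illustrative block size


@dataclass
class Lun:
    """In-memory stand-in for a LUN exported by one disk array."""
    name: str
    blocks: List[bytes] = field(default_factory=list)
    available: bool = True


def create_active_active(num_blocks: int) -> Tuple[Lun, Lun]:
    """New active-active LUN: both sides start empty and are usable immediately."""
    empty = [bytes(BLOCK_SIZE)] * num_blocks
    return Lun("primary", list(empty)), Lun("secondary", list(empty))


def background_mirror(primary: Lun, secondary: Lun) -> Iterator[int]:
    """Copy the primary to the secondary one block at a time, yielding each
    copied block index; the secondary is marked available only at the end."""
    secondary.available = False
    for index, block in enumerate(primary.blocks):
        secondary.blocks[index] = block
        yield index
    secondary.available = True


def upgrade_to_active_active(existing: Lun) -> Tuple[Lun, Lun, Iterator[int]]:
    """Upgrade: the existing LUN becomes the primary, and a new, initially
    unavailable secondary is returned together with its pending mirror task."""
    secondary = Lun("secondary", [bytes(BLOCK_SIZE)] * len(existing.blocks), available=False)
    return existing, secondary, background_mirror(existing, secondary)
```

Modeling the copy as a generator makes the invariant from step S903 visible: the secondary LUN stays unavailable while blocks are still being copied and becomes available only after the last block.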

Abstract

Provided in the present application are a data reading and writing method and device. The method comprises: determining two or more storage disk arrays for data reading and writing; and performing data reading and writing using the two or more storage disk arrays. Also provided in the present application are a double active storage system and realization method thereof. The present application addresses the problems of resource waste and low efficiency of data reading and writing in the related art, thereby achieving the effects of avoiding resource waste and increasing the efficiency of data reading and writing.

Description

Data reading and writing method and device, active-active storage system and implementation method thereof
Technical field
This application relates to, but is not limited to, the field of communications.
Background
Disaster recovery solutions in the related art generally use a single data center, in which storage is served by a disk array. A disk array consists of a storage controller and disks. The storage controller is dual-controller, and behind it are a number of disks visible to both controllers; when one controller node fails, the other automatically takes over, protecting against a single-point controller failure. Data is backed up asynchronously to another site for remote disaster recovery. With this approach, once the data center's disk array, and especially its disks, suffers catastrophic damage, recovery can rely only on the asynchronous backup. Asynchronous recovery takes a long time, and because the backup is not synchronized in real time, some data is lost. Another solution in the related art sets up two data centers, with data in the primary data center backed up to a standby data center in real time. However, the standby data center cannot provide service, and after a disaster the system must be switched to it manually, so the recovery time cannot meet the requirement for uninterrupted service. In this active/standby mode, the standby site provides no service, which wastes resources; moreover, user data must be written successfully to both data centers, so the number of read/write operations per second (Input/Output Operations Per Second, IOPS) drops to some extent.
No effective solution has yet been proposed for the problems of resource waste and low data read/write efficiency in the related art.
Summary
The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the scope of the claims.
An embodiment of the present invention provides a data reading and writing method, including: determining two or more disk arrays for data reading and writing; and reading and writing data by using the two or more disk arrays.
Optionally, when the two or more disk arrays are two disk arrays, reading and writing data by using them includes: receiving a data write request requesting that first predetermined data be written; and writing the first predetermined data to both disk arrays according to the data write request.
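The write fan-out described above (send the same write to both disk arrays and report success only if both succeed) can be sketched in Python with the standard `concurrent.futures` module. `FakeArray`, `dual_write`, and `write_succeeded` are hypothetical names, and the arrays are in-memory stand-ins; a real system would issue SCSI writes over FC or iSCSI rather than call a Python method.

```python
# Hypothetical sketch of the dual write; FakeArray is an in-memory stand-in for a disk array.
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


class FakeArray:
    """In-memory disk array; `fail=True` simulates a site whose writes fail."""

    def __init__(self, name: str, fail: bool = False) -> None:
        self.name = name
        self.fail = fail
        self.blocks: Dict[int, bytes] = {}

    def write(self, block: int, data: bytes) -> None:
        if self.fail:
            raise OSError(f"{self.name}: write to block {block} failed")
        self.blocks[block] = data


def dual_write(arrays: List[FakeArray], block: int, data: bytes) -> Dict[str, bool]:
    """Issue the same write to every array concurrently and collect per-array results."""
    with ThreadPoolExecutor(max_workers=len(arrays)) as pool:
        futures = {a.name: pool.submit(a.write, block, data) for a in arrays}
        # Future.exception() waits for completion and returns None on success.
        return {name: f.exception() is None for name, f in futures.items()}


def write_succeeded(results: Dict[str, bool]) -> bool:
    """The user sees success only if every array acknowledged the write."""
    return all(results.values())
```

The per-array result map, rather than a single boolean, is what lets the caller tell "both failed" apart from "one succeeded, one failed". Only the second case requires recording the written blocks for later resynchronization.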
Optionally, writing the first predetermined data to the two disk arrays includes: determining that the write succeeded on the first of the two disk arrays and failed on the second; recording the storage blocks, identical in both disk arrays, that are used to store the first predetermined data; and, once the second disk array returns to normal, mirroring the data successfully written to the first disk array to the second disk array according to the recorded storage blocks.
Optionally, after it is determined that the write succeeded on the first disk array and failed on the second, the method further includes: stopping writing the first predetermined data to the second disk array.
Optionally, recording the identical storage blocks in the two disk arrays that store the first predetermined data includes: recording the storage blocks with a bitmap, and/or recording them in a log.
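The bitmap option can be sketched as one bit per block: a set bit means the block was written on the healthy LUN but not on the failed one, and replay copies exactly those blocks back. This is a minimal Python sketch that assumes LUNs are modeled as `dict`s of block number to data. `DirtyBitmap` and `replay` are hypothetical names, and the patent does not specify a bitmap granularity.

```python
# Hypothetical sketch: one bit per block; LUNs are modeled as dicts of block -> data.
from typing import Dict, List


class DirtyBitmap:
    """A set bit means: written on the healthy side but not on the failed side."""

    def __init__(self, num_blocks: int) -> None:
        self.num_blocks = num_blocks
        self.bits = bytearray((num_blocks + 7) // 8)

    def mark(self, block: int) -> None:
        self.bits[block // 8] |= 1 << (block % 8)

    def clear(self, block: int) -> None:
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF

    def is_set(self, block: int) -> bool:
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def dirty_blocks(self) -> List[int]:
        return [b for b in range(self.num_blocks) if self.is_set(b)]


def replay(bitmap: DirtyBitmap, source: Dict[int, bytes], target: Dict[int, bytes]) -> int:
    """Copy every dirty block from the healthy LUN to the recovered LUN,
    clearing each bit after its block is copied; returns the number copied."""
    copied = 0
    for block in bitmap.dirty_blocks():
        target[block] = source[block]
        bitmap.clear(block)
        copied += 1
    return copied
```

Marking the same block twice sets the same bit, so the bitmap's size stays fixed no matter how many writes arrive while the peer is down.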
Optionally, reading and writing data by using the two or more disk arrays includes: receiving a data read request requesting second predetermined data; and receiving part of the second predetermined data from a third disk array among the two or more disk arrays and the remainder of the second predetermined data from a fourth disk array among them.
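A split read can be sketched as follows, assuming both replicas are consistent and modeled as in-memory `dict`s. The patent says only that one array returns part of the data and the other returns the rest. Splitting the block range at its midpoint is an assumption made for illustration, and `read_range` and `split_read` are hypothetical names.

```python
# Hypothetical sketch: the midpoint split is an assumption, not taken from the patent.
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def read_range(lun: Dict[int, bytes], blocks: List[int]) -> List[bytes]:
    """Read the given blocks from one replica (modeled as a dict of block -> data)."""
    return [lun[b] for b in blocks]


def split_read(replica_a: Dict[int, bytes], replica_b: Dict[int, bytes],
               start: int, count: int) -> bytes:
    """Serve blocks [start, start + count): the first half from replica_a and
    the remainder from replica_b, read concurrently and reassembled in order."""
    blocks = list(range(start, start + count))
    half = len(blocks) // 2
    with ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(read_range, replica_a, blocks[:half])
        second = pool.submit(read_range, replica_b, blocks[half:])
        return b"".join(first.result() + second.result())
```

Because each replica serves only part of the range, a large read is spread across both data centers. This is only safe while both LUNs are available and consistent.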
Optionally, determining two or more disk arrays for data reading and writing includes: creating two or more new disk arrays; or upgrading an existing predetermined disk array to an active-active disk array, where the active-active disk array consists of the predetermined disk array and a replica disk array obtained by copying it.
An embodiment of the present invention further provides a data reading and writing device, including: a determining module, configured to determine two or more disk arrays for data reading and writing; and a read/write module, configured to read and write data by using the two or more disk arrays.
Optionally, the read/write module includes: a first receiving unit, configured to receive, when the two or more disk arrays are two disk arrays, a data write request requesting that first predetermined data be written; and a writing unit, configured to write the first predetermined data to both disk arrays according to the data write request.
Optionally, the writing unit includes: a determining subunit, configured to determine that the write succeeded on the first of the two disk arrays and failed on the second; a recording subunit, configured to record the storage blocks, identical in both disk arrays, that are used to store the first predetermined data; and a synchronization subunit, configured to mirror, once the second disk array returns to normal, the data successfully written to the first disk array to the second disk array according to the recorded storage blocks.
Optionally, the writing unit further includes: a stopping subunit, configured to stop writing the first predetermined data to the second disk array after it is determined that the write succeeded on the first disk array and failed on the second.
Optionally, the recording subunit records the identical storage blocks in the two disk arrays that store the first predetermined data by means of a bitmap and/or a log.
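The log option can be sketched as an append-only list of written block numbers. On recovery, each distinct block is copied once from the healthy LUN's current contents, and then the log is truncated. This Python sketch again models LUNs as `dict`s. `WriteLog` is a hypothetical name, and the patent does not define a log format.

```python
# Hypothetical sketch of log-based recording; the log format is an assumption.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class WriteLog:
    """Append-only record of block numbers written while the peer LUN was down."""
    entries: List[int] = field(default_factory=list)

    def record(self, block: int) -> None:
        self.entries.append(block)

    def replay(self, source: Dict[int, bytes], target: Dict[int, bytes]) -> List[int]:
        """Copy each logged block once, in first-logged order, using the healthy
        LUN's current contents, then truncate the log; returns the synced blocks."""
        synced: List[int] = []
        for block in dict.fromkeys(self.entries):  # drops repeated block numbers
            target[block] = source[block]
            synced.append(block)
        self.entries.clear()
        return synced
```

Compared with the bitmap, a log keeps the order of writes but grows with every write during the outage, whereas a bitmap has a fixed size and records only which blocks changed.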
Optionally, the read/write module includes: a second receiving unit, configured to receive a data read request requesting second predetermined data; and a third receiving unit, configured to receive part of the second predetermined data from a third disk array among the two or more disk arrays and the remainder from a fourth disk array among them.
Optionally, the determining module includes: a creating unit, configured to create two or more new disk arrays; or an upgrading unit, configured to upgrade an existing predetermined disk array to an active-active disk array, where the active-active disk array consists of the predetermined disk array and a replica disk array obtained by copying it.
The above solution determines two or more disk arrays for data reading and writing and reads and writes data by using them. This solves the problems of resource waste and low data read/write efficiency in the related art, thereby avoiding resource waste and improving read/write efficiency.
An embodiment of the present invention further provides an active-active storage system, including two data centers that are interconnected and provide services simultaneously. Each data center includes at least one disk array and one user server; the user server of each data center is connected to the disk arrays of both data centers, and the disk arrays of the two data centers are connected through a back-end network, where:
the user server is configured to distribute a user's write request to the disk arrays of both data centers simultaneously and to return write success to the user when both the primary LUN and the secondary LUN are written successfully; and
the disk array is configured to perform the write operation after receiving the write request.
Optionally, the two data centers are located at different sites and are interconnected through optical fiber and fibre switches.
Optionally, the active-active storage system stores two real logical unit numbers (LUNs): a primary LUN on the disk array of one data center and a secondary LUN on the disk array of the other data center. The primary LUN is presented to the user.
Optionally, each disk array includes a dual-controller storage controller and disks. The dual-controller storage controllers in the disk arrays of the two data centers form a cluster that serves as the control body of the active-active storage system, and the two data centers exchange messages through a cluster channel.
Optionally, the active-active storage system further includes one or more of the following modules:
a synchronous mirroring module, configured to mirror the data of the primary LUN to the secondary LUN block by block in the background until all data has been mirrored;
a write processing module, configured to record, when the write succeeds on one of the primary and secondary LUNs and fails on the other, information about the storage blocks written successfully, and, after the failed LUN recovers, to synchronize the data of those blocks to the failed LUN through the synchronous mirroring module;
a cluster decision module, configured to mark the failed LUN as unavailable when the write succeeds on one of the primary and secondary LUNs and fails on the other, and to restore service on the failed LUN after it recovers and data synchronization mirroring is complete; and/or, when multiple cluster members encounter errors at the same time, to keep operation changes consistent across the cluster by means of cluster transactions;
an arbitration device, configured so that, after an abnormality in cluster communication, it either acts as an arbitration server that votes to decide which cluster members continue to provide service, or acts as an IP SAN device for which the members compete, with whoever holds the majority of resources continuing to provide service.
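The arbitration-server option can be sketched as a 2-of-3 majority vote. Each site holds its own vote, and the third-site arbiter grants its single vote to whichever site asks first after the cluster channel breaks. This is a minimal Python sketch under stated simplifications: there is no lease expiry, no reset after the cluster heals, and no IP SAN contention variant. `Arbiter` and `keeps_serving` are hypothetical names.

```python
# Hypothetical 2-of-3 majority sketch; omits leases, resets, and the IP SAN variant.
import threading
from typing import Optional


class Arbiter:
    """Third-site arbitration server: its single vote goes to the first site that asks."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.winner: Optional[str] = None

    def request_vote(self, site: str) -> bool:
        with self._lock:  # serialize competing requests from the two sites
            if self.winner is None:
                self.winner = site
            return self.winner == site


def keeps_serving(site: str, arbiter: Arbiter, votes_needed: int = 2) -> bool:
    """A site counts its own vote plus the arbiter's; with a 2-of-3 majority it
    keeps serving, otherwise it stops to avoid split-brain."""
    votes = 1 + (1 if arbiter.request_vote(site) else 0)
    return votes >= votes_needed
```

Because the arbiter grants its vote only once, at most one site can reach a majority, so the two sites can never both keep accepting writes after the cluster channel fails.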
An embodiment of the present invention further provides an implementation method of an active-active storage system, including:
building two data centers that are interconnected and provide services simultaneously, each including at least one disk array and one user server, where the user server of each data center is connected to the disk arrays of both data centers and the disk arrays of the two data centers are connected through a back-end network;
after receiving a user's write request, distributing, by the user server, the write request to the disk arrays of both data centers simultaneously, and returning write success to the user when the writes on both disk arrays succeed; and
performing, by the disk array, the write operation after receiving the write request.
Optionally, the method further includes:
creating a primary logical unit number (LUN) on the disk array of one data center and a secondary LUN on the disk array of the other data center; or upgrading an existing LUN of one data center to the primary LUN, creating a secondary LUN on the disk array of the other data center, and letting the secondary LUN serve external requests only after the data of the primary LUN has been successfully mirrored to it.
The active-active storage system presents the primary LUN to the user.
Optionally, the method further includes: after the primary and secondary LUNs receive a write request and perform the write, if the write succeeds on one and fails on the other, recording information about the storage blocks written successfully and marking the failed LUN as unavailable; after the failed LUN recovers, synchronizing the data of those blocks to it, and restoring its service once data synchronization mirroring is complete.
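The availability handling in this step can be sketched as a small state machine. A LUN moves from available to unavailable when its write fails while the peer's succeeds, then to resyncing once it recovers, and back to available when mirroring completes; only available LUNs may serve reads. This single-process Python sketch is an illustration only: `LunState` and `ClusterDecision` are hypothetical names, and it does not model the cluster transactions that would keep this state consistent across nodes.

```python
# Hypothetical single-process sketch; cluster transactions are not modeled.
from enum import Enum
from typing import Dict, List


class LunState(Enum):
    AVAILABLE = "available"
    UNAVAILABLE = "unavailable"  # a write to this LUN failed; its data may be stale
    RESYNCING = "resyncing"      # recovered; recorded blocks are being mirrored back


class ClusterDecision:
    """Tracks the availability state of each LUN in an active-active pair."""

    def __init__(self, luns: List[str]) -> None:
        self.state: Dict[str, LunState] = {name: LunState.AVAILABLE for name in luns}

    def on_write_result(self, results: Dict[str, bool]) -> None:
        # Only a partial failure (one side succeeded) fences off the failed side.
        if any(results.values()) and not all(results.values()):
            for name, ok in results.items():
                if not ok:
                    self.state[name] = LunState.UNAVAILABLE

    def on_recovered(self, name: str) -> None:
        if self.state[name] is LunState.UNAVAILABLE:
            self.state[name] = LunState.RESYNCING

    def on_resync_done(self, name: str) -> None:
        if self.state[name] is LunState.RESYNCING:
            self.state[name] = LunState.AVAILABLE

    def readable(self) -> List[str]:
        """Only fully synchronized LUNs may serve reads."""
        return [n for n, s in self.state.items() if s is LunState.AVAILABLE]
```

Keeping a separate resyncing state prevents a recovered LUN from serving reads before the recorded blocks have been copied back to it.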
Optionally, each disk array includes a dual-controller storage controller and disks. The dual-controller storage controllers in the disk arrays of the two data centers form a cluster that serves as the control body of the active-active storage system, and the two data centers exchange messages through a cluster channel.
The method further includes one or more of the following:
setting up a cluster decision module that, when multiple cluster members encounter errors at the same time, keeps operation changes consistent across the cluster by means of cluster transactions;
setting up an arbitration module that, after an abnormality in cluster communication, decides by voting which cluster members continue to provide service, or lets the member that wins the majority of resources through contention continue to provide service.
In the above active-active storage system and its implementation method, both data centers can provide services simultaneously, which greatly improves efficiency.
Other features and advantages of the embodiments of the present invention will be set forth in the description that follows, and in part will become apparent from the description or be learned by practicing the embodiments. The objectives and other advantages of the embodiments can be realized and obtained by the structures particularly pointed out in the description, claims, and drawings.
Other aspects will become apparent upon reading and understanding the drawings and detailed description.
Brief description of the drawings
FIG. 1 is a flowchart of a data reading and writing method according to Embodiment 1 of the present invention;
FIG. 2 is a structural block diagram of a data reading and writing device according to Embodiment 1 of the present invention;
FIG. 3 is a first structural block diagram of the read/write module 24 in the data reading and writing device according to Embodiment 1 of the present invention;
FIG. 4 is a structural block diagram of the writing unit 34 in the data reading and writing device according to Embodiment 1 of the present invention;
FIG. 5 is a block diagram of a preferred structure of the writing unit 34 in the data reading and writing device according to Embodiment 1 of the present invention;
FIG. 6 is a second structural block diagram of the read/write module 24 in the data reading and writing device according to Embodiment 1 of the present invention;
FIG. 7 is a structural block diagram of the determining module 22 in the data reading and writing device according to Embodiment 1 of the present invention;
FIG. 8 is a structural block diagram of an active-active storage system according to Embodiment 2 of the present invention;
FIG. 9 is a flowchart of LUN creation, active-active data distribution, and exception handling in the active-active storage system according to Embodiment 2 of the present invention.
Preferred Embodiments of the Invention
Embodiments of the present invention are described below with reference to the accompanying drawings. Where no conflict arises, the embodiments of the present application and the features of those embodiments may be combined with one another.
The terms "first", "second", and the like in the description, claims, and drawings of the present invention are used to distinguish similar objects. They do not necessarily describe a particular order or sequence.
Embodiment 1
This embodiment provides a data reading and writing method. FIG. 1 is a flowchart of the method according to an embodiment of the present invention. As shown in FIG. 1, the process includes the following steps:
Step S102: determine two or more storage arrays for data reading and writing;
Step S104: read and write data using the two or more storage arrays.
With this method, data is read and written using two or more storage arrays. Each storage array still serves as a backup of the others, but its resources are used more efficiently. Serving reads and writes from multiple storage arrays also improves read/write efficiency. This addresses the resource waste and low read/write efficiency of the related art.
In an optional embodiment, where the two or more storage arrays are two storage arrays, reading and writing data using them includes: receiving a data write request that requests writing of first predetermined data; and writing the first predetermined data to both storage arrays according to the data write request.
In an optional embodiment, writing the first predetermined data to the two storage arrays includes three steps. First, it is determined that the write to the first storage array succeeded and the write to the second storage array failed. Second, the storage blocks that store the first predetermined data, which are identical in both storage arrays, are recorded. Third, once the second storage array has recovered, the data successfully written to the first storage array is mirrored into the second storage array based on the recorded storage blocks. This keeps the data in the two storage arrays consistent. If one storage array fails, the other can continue to provide read and write services, so the service runs without interruption.
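The partial-failure handling described above can be sketched in Python: the first array succeeds, the second fails, the block is recorded, and the second array is resynchronized after recovery. This is a minimal illustration under stated assumptions, not the patented implementation. `StorageArray` and `MirroredWriter` are hypothetical names. Each array is modeled as an in-memory dictionary from block number to bytes, and a failed array is simulated with a `healthy` flag. Both arrays use the same block addresses, so a single block number identifies the storage block on either side.

```python
from dataclasses import dataclass, field


@dataclass
class StorageArray:
    """In-memory stand-in for one storage array (hypothetical)."""
    name: str
    blocks: dict = field(default_factory=dict)  # block number -> bytes
    healthy: bool = True

    def write(self, block: int, data: bytes) -> bool:
        # A failed array reports failure instead of storing the data.
        if not self.healthy:
            return False
        self.blocks[block] = data
        return True


class MirroredWriter:
    """Writes every block to two arrays and remembers what the second one missed."""

    def __init__(self, first: StorageArray, second: StorageArray):
        self.first = first
        self.second = second
        self.dirty = set()  # block numbers to resynchronize later

    def write(self, block: int, data: bytes) -> tuple:
        ok_first = self.first.write(block, data)
        ok_second = self.second.write(block, data)
        if ok_first and not ok_second:
            # Both arrays use the same block address, so one number identifies it.
            self.dirty.add(block)
        return ok_first, ok_second

    def resync(self) -> int:
        """Mirror recorded blocks from the first array once the second recovers."""
        if not self.second.healthy:
            return 0
        for block in sorted(self.dirty):
            self.second.write(block, self.first.blocks[block])
        copied = len(self.dirty)
        self.dirty.clear()
        return copied
```

The dirty set is kept separate from the data, so recovery copies only the blocks that changed during the outage, not the whole array.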
In an optional embodiment, the method further includes a step performed after it is determined that the write to the first storage array succeeded and the write to the second storage array failed: stopping writing the first predetermined data to the second storage array. A failed write to the second storage array suggests that the array itself, or the disk currently used for the write, may have failed, so the second storage array cannot provide normal read and write services. The second storage array can then be placed in an unavailable state and used for read and write services again only after it has recovered. This ensures the correctness of data reads and writes.
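Stopping writes to the failed array, and reusing it only after recovery, amounts to a small availability state machine. The sketch below assumes three states and the transitions between them. The state names and the `RESYNCING` stage are illustrative assumptions, because the text only distinguishes an unavailable state from normal operation.

```python
from enum import Enum


class ArrayState(Enum):
    AVAILABLE = "available"       # consistent; receives user reads and writes
    UNAVAILABLE = "unavailable"   # writes stopped after a failed write
    RESYNCING = "resyncing"       # recovered, catching up on missed blocks


# Allowed transitions (an assumed lifecycle, not taken from the patent text).
TRANSITIONS = {
    ArrayState.AVAILABLE: {ArrayState.UNAVAILABLE},
    ArrayState.UNAVAILABLE: {ArrayState.RESYNCING},
    ArrayState.RESYNCING: {ArrayState.AVAILABLE, ArrayState.UNAVAILABLE},
}


class ArrayStatus:
    """Tracks whether one storage array may serve user IO (hypothetical helper)."""

    def __init__(self):
        self.state = ArrayState.AVAILABLE

    def move_to(self, new_state: ArrayState) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state

    def accepts_writes(self) -> bool:
        # Only a fully consistent array receives user writes directly;
        # writes missed in other states are left to the dirty-block record.
        return self.state is ArrayState.AVAILABLE
```

Rejecting a direct jump from `UNAVAILABLE` back to `AVAILABLE` means an array cannot rejoin without first catching up on the blocks it missed.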
In an optional embodiment, the storage blocks that store the first predetermined data, which are identical in both storage arrays, are recorded with a bitmap, in a log, or both. These two recording methods are only examples. Other methods may also be used and are not enumerated here.
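For the bitmap option, one bit per storage block is enough to remember which blocks must be resynchronized. The following hypothetical sketch (`BlockBitmap` does not come from the source) uses a `bytearray`. Marking the same block twice costs nothing extra, and memory depends on the array size rather than on the number of writes. For example, at an assumed 1 MiB block size, a 1 TiB array has 2^20 blocks and needs 128 KiB of bitmap.

```python
class BlockBitmap:
    """One bit per block: a set bit means the block changed while the mirror was down."""

    def __init__(self, num_blocks: int):
        self.num_blocks = num_blocks
        self.bits = bytearray((num_blocks + 7) // 8)  # round up to whole bytes

    def _check(self, block: int) -> None:
        if not 0 <= block < self.num_blocks:
            raise IndexError(block)

    def mark(self, block: int) -> None:
        self._check(block)
        self.bits[block // 8] |= 1 << (block % 8)

    def is_marked(self, block: int) -> bool:
        self._check(block)
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def clear(self, block: int) -> None:
        # Called after the block has been copied to the recovered array.
        self._check(block)
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF

    def marked_blocks(self) -> list:
        return [b for b in range(self.num_blocks) if self.is_marked(b)]
```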
In an optional embodiment, reading and writing data using the two or more storage arrays includes two steps. First, a data read request that requests reading of second predetermined data is received. Second, part of the second predetermined data is received from a third storage array among the two or more storage arrays, and the remainder is received from a fourth storage array among them. The third storage array may be one or more storage arrays, and so may the fourth. The third and fourth storage arrays serve the read simultaneously, which improves read efficiency and reduces read time.
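The split read can be illustrated by serving one request from two consistent replicas in parallel and concatenating the results. This sketch makes three assumptions. Each array is modeled as a dictionary from block number to bytes. The range is split in half, because the text does not specify how the split is chosen. Python's `concurrent.futures.ThreadPoolExecutor` issues both reads at once. `read_block_range` and `split_read` are hypothetical helpers.

```python
from concurrent.futures import ThreadPoolExecutor


def read_block_range(array: dict, start: int, count: int) -> list:
    """Hypothetical per-array read: return `count` blocks beginning at `start`."""
    return [array[b] for b in range(start, start + count)]


def split_read(third: dict, fourth: dict, start: int, count: int) -> bytes:
    """Serve one read request from two replicas in parallel.

    The first half of the range comes from `third`, the remainder from `fourth`.
    """
    first_count = count // 2
    with ThreadPoolExecutor(max_workers=2) as pool:
        part_a = pool.submit(read_block_range, third, start, first_count)
        part_b = pool.submit(read_block_range, fourth, start + first_count, count - first_count)
        blocks = part_a.result() + part_b.result()
    return b"".join(blocks)
```

Splitting is only safe while both replicas hold identical data, for example when neither has been marked unavailable.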
The storage arrays can be determined in several ways. In an optional embodiment, determining two or more storage arrays for data reading and writing includes either of the following. One option is to create two or more new storage arrays. The other is to upgrade an existing predetermined storage array to an active-active storage array, which includes the predetermined storage array and a replica storage array obtained by replicating it.
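The two ways of determining the arrays differ mainly in whether the replica starts out consistent. In the hypothetical sketch below, `Lun`, `ActiveActivePair`, and the `-replica` naming convention are assumptions. A newly created pair is empty on both sides and therefore consistent immediately. An upgraded LUN may already hold data, which must be copied before the replica can serve IO.

```python
from dataclasses import dataclass, field


@dataclass
class Lun:
    name: str
    site: str
    blocks: dict = field(default_factory=dict)  # block number -> bytes


@dataclass
class ActiveActivePair:
    primary: Lun
    replica: Lun
    replica_in_sync: bool  # may the replica serve IO yet?


def create_new_pair(name: str, site_a: str, site_b: str) -> ActiveActivePair:
    # Both sides start empty, so they are trivially consistent.
    return ActiveActivePair(Lun(name, site_a), Lun(name + "-replica", site_b), True)


def upgrade_existing(existing: Lun, other_site: str) -> ActiveActivePair:
    # The existing LUN keeps its data; a non-empty LUN must be copied first.
    replica = Lun(existing.name + "-replica", other_site)
    return ActiveActivePair(existing, replica, replica_in_sync=not existing.blocks)
```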
From the description of the above implementations, those skilled in the art will understand that the method of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform. It may also be implemented in hardware, although software is the better implementation in many cases. On this basis, the technical solutions of the embodiments, or the part that contributes over the prior art, may be embodied as a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc). It includes instructions that cause a terminal device (such as a mobile phone, a computer, a server, or a network device) to perform the methods described in the embodiments.
This embodiment further provides a data reading and writing apparatus that implements the above embodiments and preferred implementations. Details already described are not repeated. As used below, the term "module" refers to a combination of software and/or hardware that implements a predetermined function. The apparatus described in the following embodiments is preferably implemented in software, but implementations in hardware, or in a combination of software and hardware, are also possible and contemplated.
FIG. 2 is a structural block diagram of a data reading and writing apparatus according to an embodiment of the present invention. As shown in FIG. 2, the apparatus includes a determining module 22 and a read/write module 24, described below.
The determining module 22 is configured to determine two or more storage arrays for data reading and writing;
The read/write module 24 is connected to the determining module 22 and is configured to read and write data using the two or more storage arrays.
FIG. 3 is a first structural block diagram of the read/write module 24 in the data reading and writing apparatus according to an embodiment of the present invention. As shown in FIG. 3, the read/write module 24 includes a first receiving unit 32 and a writing unit 34, described below.
The first receiving unit 32 is configured to receive a data write request that requests writing of first predetermined data, where the two or more storage arrays are two storage arrays;
The writing unit 34 is connected to the first receiving unit 32 and is configured to write the first predetermined data to the two storage arrays according to the data write request.
FIG. 4 is a structural block diagram of the writing unit 34 in the data reading and writing apparatus according to an embodiment of the present invention. As shown in FIG. 4, the writing unit 34 includes a determining subunit 42, a recording subunit 44, and a synchronization subunit 46, described below.
The determining subunit 42 is configured to determine that, of the two storage arrays, the write to the first storage array succeeded and the write to the second storage array failed;
The recording subunit 44 is connected to the determining subunit 42 and is configured to record the storage blocks that store the first predetermined data, which are identical in both storage arrays;
The synchronization subunit 46 is connected to the recording subunit 44. Once the second storage array has recovered, it is configured to mirror the data successfully written to the first storage array into the second storage array based on the recorded storage blocks.
FIG. 5 is a block diagram of a preferred structure of the writing unit 34 in the data reading and writing apparatus according to an embodiment of the present invention. As shown in FIG. 5, in addition to the units shown in FIG. 4, the writing unit 34 includes a stopping subunit 52, described below.
The stopping subunit 52 is connected to the determining subunit 42. After it is determined that the write to the first storage array succeeded and the write to the second storage array failed, it is configured to stop writing the first predetermined data to the second storage array.
In an optional embodiment, the recording subunit 44 records the storage blocks that store the first predetermined data, which are identical in both storage arrays, with a bitmap, in a log, or both.
FIG. 6 is a second structural block diagram of the read/write module 24 in the data reading and writing apparatus according to an embodiment of the present invention. As shown in FIG. 6, the read/write module 24 includes a second receiving unit 62 and a third receiving unit 64, described below.
The second receiving unit 62 is configured to receive a data read request that requests reading of second predetermined data;
The third receiving unit 64 is connected to the second receiving unit 62. It is configured to receive part of the second predetermined data from a third storage array among the two or more storage arrays and the remainder from a fourth storage array among them.
FIG. 7 is a structural block diagram of the determining module 22 in the data reading and writing apparatus according to an embodiment of the present invention. As shown in FIG. 7, the determining module 22 includes a creating unit 72 or an upgrading unit 74, described below.
The creating unit 72 is configured to create two or more new storage arrays;
The upgrading unit 74 is configured to upgrade an existing predetermined storage array to an active-active storage array, which includes the predetermined storage array and a replica storage array obtained by replicating it.
Embodiment 2
This embodiment of the present invention provides an active-active storage system with two data centers, each containing one storage array. Both data centers can serve workloads at the same time. Because both can be accessed concurrently, a load-balancing mechanism substantially improves efficiency. Synchronous mirroring keeps the data in the two data centers consistent in real time, and each site can take over the other's resources. As a result, neither a single array-controller failure nor a disk failure interrupts user services.
To this end, an embodiment of the present invention provides a two-site active-active storage system that includes two data centers. Each data center includes one storage array and several fiber switches. The storage array includes a dual-controller storage controller and disks. The dual-controller storage controller consists of two highly available boards that can take over for each other, and each board is one node. The two data centers may be connected by optical fiber. The dual-controller storage controllers of the two storage arrays form a cluster, which constitutes the active-active controller body. To keep the system reliable, the cluster needs an arbitration device. This device prevents the resource contention caused by split-brain when cluster communication fails. The parts of the system are described below:
Storage array: the storage array holds user data in the active-active storage system. It provides users with logical unit number (LUN) access over the Internet Small Computer System Interface (iSCSI) or Fibre Channel (FC). The dual-controller storage controllers in the storage arrays at the two sites form one cluster. The storage controller is the core active-active control device and is responsible for active-active cluster management, volume management, and synchronous mirroring.
Fiber switch: optical fiber connects the user-access front end of the dual-controller storage controllers at both sites. It also carries the back-end network used for internal data transmission, which keeps latency low.
Arbitration device: this is usually placed at a third site. It may be an arbitration server that votes to decide which cluster members may continue to provide services. It may also be an IP SAN (Storage Area Network) device built on an IP network. In that case the members contend for its resources, and the side that obtains the majority of them continues to provide services.
The embodiments of the present invention provide a two-site active-active storage system and its implementation method. The system stores two real LUNs, while the user sees only one active-active LUN. One of the two real LUNs is the primary LUN and the other is the secondary LUN; the storage array at each site provides one of them. When a user converts an existing LUN into an active-active LUN, that LUN becomes the primary LUN and the secondary LUN is created in the other data center. The user sees only the active-active LUN, that is, the primary LUN. Alternatively, the user can create a brand-new active-active LUN. A primary LUN is then automatically created as the active-active LUN, and a secondary LUN is automatically created in the other data center. The secondary LUN is only part of the active-active LUN and does not provide services on its own.
Once an active-active LUN consisting of the primary and secondary LUNs exists, the synchronous mirroring module copies the primary LUN's data to the secondary LUN in the background, disk block by disk block. The secondary LUN cannot provide services during this copy. After all data has been mirrored, the system marks the secondary LUN as accessible as well. Subsequent user write operations are distributed to the primary and secondary LUNs simultaneously. The result is returned to the user only after both writes succeed, which keeps the data at both ends consistent. In particular, the storage arrays of both data centers can serve user access to the same LUN, which improves system utilization and increases IOPS. When the user server receives a write request from a user, it distributes the request to the primary and secondary LUNs simultaneously, that is, to the storage arrays of both data centers. Each storage array performs the write when it receives the request. The user server returns success to the user only if the writes on both storage arrays succeed, which keeps the primary and secondary LUN data consistent in real time.
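The following single-threaded sketch shows background block-by-block mirroring, with the secondary LUN serving no IO until the copy completes. It rests on stated assumptions. LUNs are modeled as dictionaries, and `mirror_step` copies a fixed batch of blocks per call. User writes made during the copy go to both LUNs. Because the copy always reads the primary's current contents, this ordering stays consistent here. A real implementation must also handle a background copy and a user write racing on the same block. `ActiveActiveLun` and its methods are hypothetical names.

```python
class ActiveActiveLun:
    """Primary/secondary pair behind one user-visible LUN (hypothetical sketch)."""

    def __init__(self, primary: dict, num_blocks: int):
        self.primary = primary          # block number -> bytes
        self.secondary = {}
        self.num_blocks = num_blocks
        self.secondary_ready = False    # may the secondary serve IO yet?
        self._cursor = 0                # next block the background copy handles

    def mirror_step(self, batch: int = 4) -> None:
        """One round of the background copy from primary to secondary."""
        end = min(self._cursor + batch, self.num_blocks)
        for block in range(self._cursor, end):
            if block in self.primary:
                self.secondary[block] = self.primary[block]
        self._cursor = end
        if self._cursor == self.num_blocks:
            self.secondary_ready = True

    def write(self, block: int, data: bytes) -> bool:
        # Distribute the write to both LUNs; report success only after both hold it.
        self.primary[block] = data
        self.secondary[block] = data
        return True

    def read(self, block: int, from_secondary: bool = False) -> bytes:
        # Until the initial copy finishes, every read is served by the primary.
        use_secondary = from_secondary and self.secondary_ready
        return (self.secondary if use_secondary else self.primary)[block]
```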
A user write to the primary and secondary LUNs can fail, for example when the write to the secondary LUN fails. The current write operation is then recorded: a bitmap can mark which block was written, or the operation can be recorded in a log. The secondary LUN is also marked unavailable so that reads from it cannot return wrong data. After the fault is cleared, the system triggers recovery of the secondary LUN data. The synchronous mirroring module replays the bitmap: it reads the changed blocks marked in the bitmap from the primary LUN and writes them to the secondary LUN. When the primary and secondary LUNs are consistent, the secondary LUN starts providing services again. If a log is used instead, the data likewise becomes consistent once the log has been replayed, and the secondary LUN starts providing services.
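One possible log design records the payload of each missed write and replays the writes in their original order. If the same block was written twice, the secondary then ends with the later value. This sketch is only one option and not the design in the text. `WriteLog` is hypothetical, and storing the data in the log, rather than re-reading the primary as in bitmap replay, is an assumption.

```python
class WriteLog:
    """Append-only record of writes the secondary LUN missed (hypothetical)."""

    def __init__(self):
        self.entries = []  # (sequence number, block number, data), in write order

    def record(self, block: int, data: bytes) -> None:
        self.entries.append((len(self.entries), block, data))

    def replay(self, secondary: dict) -> int:
        """Apply the logged writes to the secondary in their original order."""
        for _seq, block, data in self.entries:
            secondary[block] = data
        applied = len(self.entries)
        self.entries.clear()
        return applied
```

Compared with a bitmap, this log grows with every write during the outage but does not need to read the primary during recovery.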
When a user write operation fails, a cluster decision module (corresponding to the writing unit 34 described above) decides the result of that write. It also changes the availability state of the primary and secondary LUNs. This addresses the problem that information obtained by different cluster nodes cannot be kept strictly consistent in real time. The cluster decision module can use cluster transactions to keep operation changes consistent across the cluster. If several cluster members encounter errors at the same time, this mechanism makes the cluster decision module process the error notifications in order. Each error is therefore handled with correct, up-to-date information.
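A small example shows the benefit of routing failure reports through one decision point. In this sketch, a `threading.Lock` stands in for the cluster transaction mechanism, which is an assumption. The rule that the last available LUN is never disabled is also an assumption added for illustration. Suppose two nodes report at the same moment that the primary and the secondary failed, respectively. Serialization ensures the second decision sees the first, so both LUNs are not taken offline. `ClusterDecisionModule` and its methods are hypothetical names.

```python
import threading


class ClusterDecisionModule:
    """Serializes LUN-failure reports so each decision sees the latest state."""

    def __init__(self, luns):
        self.available = {lun: True for lun in luns}
        self._lock = threading.Lock()  # stand-in for a cluster transaction
        self.decisions = []

    def report_write_failure(self, failed_lun: str) -> str:
        with self._lock:
            others_up = [l for l, up in self.available.items() if up and l != failed_lun]
            if not self.available[failed_lun]:
                decision = "already-unavailable"
            elif others_up:
                self.available[failed_lun] = False
                decision = "disable"
            else:
                # Assumed rule: never disable the last available copy; fail the IO instead.
                decision = "fail-io"
            self.decisions.append((failed_lun, decision))
            return decision
```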
A third-party arbitration device is also needed to avoid split-brain when the cluster-channel network fails. It may be an IP SAN device, in which case the nodes contend for the device to decide which of them continue providing services. It may also be an arbitration server that votes to decide which nodes continue providing services.
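The arbitration-server variant can be sketched as a three-vote majority: one vote for each site and one for the arbitrator. The arbitrator gives its vote only to the first site that asks after the cluster channel fails. `Arbitrator` and `may_keep_serving` are hypothetical names. An IP SAN reservation behaves similarly: the first site to claim the shared resource wins.

```python
import threading


class Arbitrator:
    """Third-site tie-breaker: grants its single vote to the first site that asks."""

    def __init__(self):
        self._lock = threading.Lock()
        self.winner = None

    def request_vote(self, site: str) -> bool:
        with self._lock:
            if self.winner is None:
                self.winner = site
            return self.winner == site


def may_keep_serving(site: str, peer_reachable: bool, arbitrator: Arbitrator) -> bool:
    """Majority rule over three votes: two sites plus the arbitrator."""
    votes = 1 + (1 if peer_reachable else 0)  # own vote plus the peer's, if reachable
    # The arbitrator is consulted only when the cluster channel to the peer is down.
    if not peer_reachable and arbitrator.request_vote(site):
        votes += 1
    return votes >= 2
```

Because only one site can collect the arbitrator's vote, at most one side of a broken cluster channel keeps serving, which is what prevents split-brain.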
FIG. 8 is a block diagram of the overall structure of the active-active system according to an embodiment of the present invention. As shown in FIG. 8, the system is deployed in two data centers, data center A and data center B. Each data center contains several entities. For example, data center A contains user server A, on which user A works (corresponding to the first receiving unit 32, the second receiving unit 62, and the third receiving unit 64 described above). User server A is connected to storage array A through optical fiber and an FC switch. Storage array A includes dual-controller storage controller A (corresponding to the writing unit 34 described above), which is connected to disk A. Disk A denotes the set of one or more disks.
As shown in FIG. 8, user server A and user server B in the two data centers are interconnected through FC switches. The dual-controller storage controllers of data centers A and B and the users' access networks are also interconnected through FC switches. FC switches likewise connect the back-end network between the two dual-controller storage controllers, which carries internal data such as the synchronous mirroring channel in this application. The two data centers need a cluster channel for cluster messaging and heartbeat detection. In addition, the cluster needs third-party arbitration to prevent split-brain after a cluster-channel failure.
FIG. 9 is a flowchart of creating an active-active LUN, distributing active-active data, and handling exceptions according to an embodiment of the present invention. As shown in FIG. 9, the process includes the following steps:
Step S901: The user initiates creation of an active-active LUN or upgrades a LUN to an active-active LUN.
In this embodiment, an active-active LUN includes two objects: a primary LUN and a secondary LUN. The active-active LUN presented to, and seen by, the user is the primary LUN. The secondary LUN is not presented externally.
Step S902: To create an active-active LUN, requests to create a new primary LUN and a new secondary LUN are sent to the storage arrays at the two sites. To upgrade a LUN to active-active, that LUN becomes the primary LUN, and a request to create a secondary LUN is sent to the storage array of the other data center. The primary LUN is presented to the user.
Step S903: In the background, the synchronous mirroring module copies all data of the primary LUN to the secondary LUN. Once the primary and secondary LUN data is fully consistent, the secondary LUN starts providing access.
After the active-active LUN is created, the next step depends on how it was created. A newly created pair contains no data, so both the primary and secondary LUNs can provide access immediately. An upgraded LUN may already contain data, which must first be mirrored to the secondary LUN in the background. The secondary LUN can provide access only after the data is fully consistent.
Step S904: Every node of the active-active storage system can provide services, and write IO is distributed to the primary and secondary LUNs simultaneously.
After step S903, the IO of each user write request is distributed to the primary and secondary LUNs simultaneously, which keeps their data fully consistent.
Step S905: Return write success to the user and continue waiting for the user's read and write requests.
In step S904, the write IO is distributed to both LUNs. If both LUNs return write success, success is returned to the user, and the system continues waiting for read and write requests. This keeps the primary and secondary LUN data consistent in real time. The next user IO request is handled as in step S904. As long as writes to both LUNs succeed, the whole active-active system remains healthy.
Step S906: If both writes fail, failure is returned to the user. If one write succeeds, the cluster decision module is notified and sets the LUN whose write failed to unavailable. For example, if the write to the secondary LUN fails, the secondary LUN is set to unavailable.
If the write to the primary LUN fails, the primary LUN can be set to unavailable and the roles of the primary and secondary LUNs swapped, so that a primary LUN continues to provide services.
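Steps S904 to S906 and the primary/secondary role swap can be condensed into one decision function. This is a hedged sketch, and `LunPair` and `settle_write` are hypothetical. Acknowledging the write when exactly one LUN succeeds is an assumption, because the text only states that the failed LUN is marked unavailable. Recording the changed block in a bitmap or log (step S907) is omitted here.

```python
from dataclasses import dataclass


@dataclass
class LunPair:
    primary: str
    secondary: str
    available: dict  # LUN name -> bool


def settle_write(pair: LunPair, primary_ok: bool, secondary_ok: bool) -> bool:
    """Decide the outcome of one distributed write; True means acknowledge the user."""
    if primary_ok and secondary_ok:
        return True                       # step S905: both copies written
    if not primary_ok and not secondary_ok:
        return False                      # step S906: both failed, report failure
    failed = pair.secondary if primary_ok else pair.primary
    pair.available[failed] = False        # stop using the LUN whose write failed
    if failed == pair.primary:
        # Promote the surviving secondary so a primary keeps serving users.
        pair.primary, pair.secondary = pair.secondary, pair.primary
    return True  # assumption: the surviving copy holds the data, so acknowledge
```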
When a user write operation fails, the cluster decision module of the active-active system must be notified, and it sets the LUN whose write failed to unavailable. The cluster decision module can use cluster transactions to keep operation changes consistent across the cluster. If several cluster members encounter errors at the same time, this mechanism makes the cluster decision module process the error notifications in order, so each error is handled with correct, up-to-date information.
Step S907: The system uses a bitmap to mark the data blocks changed by this write, or records the change in a log.
The current user write operation is recorded. For example, if the write to the secondary LUN fails, a bitmap can mark which block was written, or the operation can be recorded in a log. This record is the basis for later recovery of the secondary LUN data.
步骤S908:后续副LUN的写入可以恢复后,***更加失败时的记录信息进行回放,将主LUN中的数据镜像到副LUN,直至全部完成。副LUN可以继续提供服务。 Step S908: After the write of the secondary LUN is restored, the recorded information of the system fails to be played back, and the data in the primary LUN is mirrored to the secondary LUN until all is completed. The secondary LUN can continue to provide services.
经过S908步骤后,双活***的主副LUN又可以继续提供服务,下次用户的写请求IO下发时,就会跟步骤S904一样,无论是S904步骤正常写入还是异常,都会得到正确处理,并最终回到主副LUN都可以提供服务的状态,保证整个***中出现单点故障时,不影响用户的访问。After the S908 step, the primary and secondary LUNs of the active-active system can continue to provide services. The next time the user writes the request IO, it will be the same as step S904. Whether the S904 step is normally written or abnormal, it will be correctly processed. And finally return to the state where the primary and secondary LUNs can provide services, ensuring that a single point of failure in the entire system does not affect the user's access.
同步镜像模块可以设置在主LUN的双控存储控制器中,在创建双活LUN时进行自动进行设置。集群决策模块可以设置在集群各个用户服务器中,构成一个分布式***,在创建双活LUN时自动设置。The synchronous mirroring module can be set in the dual-control storage controller of the primary LUN and automatically set when a dual-active LUN is created. The cluster decision module can be set in each user server of the cluster to form a distributed system, which is automatically set when a dual active LUN is created.
It should be noted that each of the above modules may be implemented in software or in hardware. In the latter case, this may be done in, but is not limited to, the following ways: all of the above modules are located in the same processor; or the modules are distributed across multiple processors.
An embodiment of the present invention further provides a storage medium. Optionally, in this embodiment, the storage medium may be configured to store program code for performing the following steps:
S1: determining two or more storage arrays for reading and writing data;
S2: reading and writing data using the two or more storage arrays.
Optionally, in this embodiment, the storage medium may include, but is not limited to, any medium capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
Optionally, in this embodiment, a processor performs steps S1 and S2 according to the program code stored in the storage medium.
Optionally, for specific examples of this embodiment, refer to the examples described in the foregoing embodiments and optional implementations; they are not repeated here.
Those of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be implemented by a computer program flow. The computer program may be stored in a computer-readable storage medium and executed on a corresponding hardware platform (such as a system, equipment, apparatus, or device); when executed, it performs one of the steps of the method embodiments or a combination of them.
Alternatively, all or some of the steps of the above embodiments may be implemented with integrated circuits. These steps may each be fabricated as a separate integrated circuit module, or several of the modules or steps may be fabricated as a single integrated circuit module.
The apparatuses, functional modules, and functional units in the above embodiments may be implemented by general-purpose computing devices. They may be concentrated on a single computing device or distributed over a network formed by multiple computing devices.
When the apparatuses, functional modules, and functional units in the above embodiments are implemented as software functional modules and sold or used as stand-alone products, they may be stored in a computer-readable storage medium. That computer-readable storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
Industrial Applicability
With the solution of the embodiments of the present invention, two or more storage arrays are determined for reading and writing data, and data is read and written using those storage arrays. This solves the problems of wasted resources and low read/write efficiency in the related art, thereby avoiding resource waste and improving read/write efficiency.
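Claims 6 and 13 describe serving one read from two arrays, each returning part of the requested data, which is one way both copies can contribute to read efficiency. The following Python sketch assumes an even split of the byte range and in-memory mirrors holding identical data; real systems might instead split by stripe, block, or current load, and `split_read` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor


def split_read(arrays, offset, length):
    """Read [offset, offset + length) with each mirror serving one contiguous part.

    `arrays` are bytes-like mirrors assumed to hold identical data. The range is
    split evenly (an assumption), the parts are read concurrently, and
    pool.map returns them in order so they can be concatenated directly.
    """
    n = len(arrays)
    bounds = [offset + length * i // n for i in range(n + 1)]

    def read(i):
        return bytes(arrays[i][bounds[i]:bounds[i + 1]])

    with ThreadPoolExecutor(max_workers=n) as pool:
        parts = list(pool.map(read, range(n)))
    return b"".join(parts)
```

With two arrays, the first serves the lower half of the range and the second serves the rest, matching the "part of the data" and "the remainder" wording of the claims.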

Claims (23)

  1. A data reading and writing method, comprising:
    determining two or more storage arrays for reading and writing data; and
    reading and writing data using the two or more storage arrays.
  2. The method according to claim 1, wherein, when the two or more storage arrays are two storage arrays, reading and writing data using the two or more storage arrays comprises:
    receiving a data write request for requesting to write first predetermined data; and
    writing the first predetermined data into the two storage arrays according to the data write request.
  3. The method according to claim 2, wherein writing the first predetermined data into the two storage arrays comprises:
    determining that, of the two storage arrays, a first storage array has written the data successfully and a second storage array has failed to write the data;
    recording the same storage block, in the two storage arrays, used for storing the first predetermined data; and
    when the second storage array returns to normal, synchronizing, according to the same storage block, the data successfully written to the first storage array into the second storage array by mirroring.
  4. The method according to claim 3, further comprising, after determining that the first storage array of the two storage arrays has written the data successfully and the second storage array has failed to write the data:
    stopping writing the first predetermined data to the second storage array.
  5. The method according to claim 3 or 4, wherein recording the same storage block, in the two storage arrays, used for storing the first predetermined data comprises:
    recording the storage block by marking it in a bitmap, and/or recording the storage block in a log.
  6. The method according to claim 1, wherein reading and writing data using the two or more storage arrays comprises:
    receiving a data read request for requesting to read second predetermined data; and
    receiving part of the second predetermined data returned by a third storage array of the two or more storage arrays, and the remainder of the second predetermined data returned by a fourth storage array of the two or more storage arrays.
  7. The method according to claim 1, wherein determining two or more storage arrays for reading and writing data comprises:
    creating two or more new storage arrays; or
    upgrading an existing predetermined storage array to an active-active storage array, wherein the active-active storage array comprises the predetermined storage array and a replica storage array obtained by replicating the predetermined storage array.
  8. A data reading and writing apparatus, comprising:
    a determining module, configured to determine two or more storage arrays for reading and writing data; and
    a read/write module, configured to read and write data using the two or more storage arrays.
  9. The apparatus according to claim 8, wherein the read/write module comprises:
    a first receiving unit, configured to receive, when the two or more storage arrays are two storage arrays, a data write request for requesting to write first predetermined data; and
    a writing unit, configured to write the first predetermined data into the two storage arrays according to the data write request.
  10. The apparatus according to claim 9, wherein the writing unit comprises:
    a determining subunit, configured to determine that, of the two storage arrays, a first storage array has written the data successfully and a second storage array has failed to write the data;
    a recording subunit, configured to record the same storage block, in the two storage arrays, used for storing the first predetermined data; and
    a synchronization subunit, configured to synchronize, when the second storage array returns to normal and according to the same storage block, the data successfully written to the first storage array into the second storage array by mirroring.
  11. The apparatus according to claim 10, wherein the writing unit further comprises:
    a stopping subunit, configured to stop writing the first predetermined data to the second storage array after it is determined that the first storage array of the two storage arrays has written the data successfully and the second storage array has failed to write the data.
  12. The apparatus according to claim 10 or 11, wherein the recording subunit records the same storage block, in the two storage arrays, used for storing the first predetermined data in the following manner:
    recording the storage block by marking it in a bitmap, and/or recording the storage block in a log.
  13. The apparatus according to claim 8, wherein the read/write module comprises:
    a second receiving unit, configured to receive a data read request for requesting to read second predetermined data; and
    a third receiving unit, configured to receive part of the second predetermined data returned by a third storage array of the two or more storage arrays, and the remainder of the second predetermined data returned by a fourth storage array of the two or more storage arrays.
  14. The apparatus according to claim 8, wherein the determining module comprises:
    a creating unit, configured to create two or more new storage arrays; or
    an upgrading unit, configured to upgrade an existing predetermined storage array to an active-active storage array, wherein the active-active storage array comprises the predetermined storage array and a replica storage array obtained by replicating the predetermined storage array.
  15. An active-active storage system, comprising two data centers that are interconnected and provide business services simultaneously, wherein each data center comprises at least one storage array and one user server, the user server of a data center is connected to the storage arrays of both data centers, and the storage arrays of the two data centers are connected to each other through a back-end network; wherein:
    the user server is configured to distribute a user's write request to the storage arrays of both data centers simultaneously, and to return write success to the user when the storage arrays of both data centers have written successfully; and
    the storage array is configured to perform a data write operation after receiving the write request.
  16. The storage system according to claim 15, wherein:
    the two data centers are located at different sites and are interconnected through optical fiber and fiber switches.
  17. The storage system according to claim 15, wherein:
    the active-active storage system stores two real logical unit numbers (LUNs): a primary LUN corresponding to the storage array of one data center and a secondary LUN corresponding to the storage array of the other data center, and the primary LUN is presented to the user.
  18. The storage system according to claim 17, wherein:
    the storage array comprises a dual-controller storage controller and disks; the dual-controller storage controllers in the storage arrays of the two data centers form a cluster that constitutes the control body of the active-active storage system; and the two data centers exchange messages through a cluster channel.
  19. The storage system according to claim 18, wherein:
    the active-active storage system further comprises one or more of the following modules:
    a synchronous mirroring module, configured to mirror the data of the primary LUN block by block in the background until all of the data has been mirrored to the secondary LUN;
    a write processing module, configured to, when a write succeeds on one of the primary LUN and the secondary LUN and fails on the other, record information about the storage block successfully written this time, and, after the LUN whose write failed recovers from the fault, synchronize the data of the storage block to that LUN through the synchronous mirroring module;
    a cluster decision module, configured to, when a write succeeds on one of the primary LUN and the secondary LUN and fails on the other, mark the LUN whose write failed as unavailable, and restore the service of that LUN after it recovers from the fault and data synchronization mirroring is completed; and/or, when multiple cluster members fail at the same time, keep operation changes consistent across the cluster by means of cluster transactions;
    an arbitration device, configured to, after cluster communication becomes abnormal, act as an arbitration server that decides by voting which cluster members continue to provide service; or act as an IP storage area network (IP SAN) device on which resources are contended for, so that the member that obtains the majority of the resources continues to provide service.
  20. A method for implementing an active-active storage system, comprising:
    building two data centers that are interconnected and provide business services simultaneously, wherein each data center comprises at least one storage array and one user server, the user server of a data center is connected to the storage arrays of both data centers, and the storage arrays of the two data centers are connected to each other through a back-end network;
    after receiving a user's write request, distributing, by the user server, the write request to the storage arrays of both data centers simultaneously, and returning write success to the user when the storage arrays of both data centers have written successfully; and
    after receiving the write request, performing, by the storage array, a data write operation.
  21. The method according to claim 20, wherein:
    the method further comprises: creating a primary logical unit number (LUN) on the storage array of one data center and creating a secondary LUN on the storage array of the other data center; or upgrading an existing LUN of one data center to the primary LUN, creating a secondary LUN on the storage array of the other data center, and having the secondary LUN provide service externally only after the data of the primary LUN has been successfully mirrored to it; and
    the active-active storage system presents the primary LUN to the user.
  22. The method according to claim 21, wherein:
    the method further comprises: after the primary LUN and the secondary LUN receive a write request and perform the data write operation, if the write succeeds on one and fails on the other, recording information about the storage block successfully written this time and marking the LUN whose write failed as unavailable; after that LUN recovers from the fault, synchronizing the data of the storage block to it; and restoring its service after the data synchronization mirroring is completed.
  23. The method according to claim 20, wherein:
    the storage array comprises a dual-controller storage controller and disks; the dual-controller storage controllers in the storage arrays of the two data centers form a cluster that constitutes the control body of the active-active storage system; and the two data centers exchange messages through a cluster channel; and
    the method further comprises one or more of the following:
    providing a cluster decision module, which, when multiple cluster members fail at the same time, keeps operation changes consistent across the cluster by means of cluster transactions;
    providing an arbitration module, which, after cluster communication becomes abnormal, decides by voting which cluster members continue to provide service, or lets the member that obtains the majority of the resources through resource contention continue to provide service.
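The voting variant of the arbitration in claims 19 and 23 can be sketched as a three-voter quorum: each data center has one vote and a third-site arbiter has one. This Python sketch is an illustration under assumptions, not the patented method: the `Arbiter` and `keeps_serving` names are hypothetical, the arbiter grants its single vote to whichever site asks first, and a site keeps serving only with a strict majority, so at most one site survives a broken inter-site link.

```python
import threading


class Arbiter:
    """Hypothetical third-site arbitration server.

    After the inter-site link fails, each data center asks for the arbiter's
    vote; the single vote goes to the first requester and is refused to all
    later ones (first-come granting is an assumption of this sketch).
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._winner = None

    def request_vote(self, site):
        with self._lock:
            if self._winner is None:
                self._winner = site
            return self._winner == site


def keeps_serving(site, arbiter, total_voters=3):
    """A site counts its own vote plus the arbiter's; it needs a strict majority."""
    votes = 1 + (1 if arbiter.request_vote(site) else 0)
    return votes * 2 > total_voters
```

With two sites and one arbiter, the winning site holds 2 of 3 votes and the other holds 1 of 3, so the split-brain case where both sites keep writing independently cannot arise.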
PCT/CN2016/095865 2015-09-08 2016-08-18 Data reading and writing method and device, double active storage system and realization method thereof WO2017041616A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510567735.7A CN106502822A (en) 2015-09-08 2015-09-08 Data read-write method and device
CN201510567735.7 2015-09-08

Publications (1)

Publication Number Publication Date
WO2017041616A1 true WO2017041616A1 (en) 2017-03-16

Family

ID=58240579

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/095865 WO2017041616A1 (en) 2015-09-08 2016-08-18 Data reading and writing method and device, double active storage system and realization method thereof

Country Status (2)

Country Link
CN (2) CN107153514A (en)
WO (1) WO2017041616A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107797770A (en) * 2017-11-07 2018-03-13 深圳神州数码云科数据技术有限公司 A kind of synchronous method and device of Disk State information
CN109150986A (en) * 2018-07-27 2019-01-04 郑州云海信息技术有限公司 Store access method, device and the storage medium of data under dual-active mode
CN111752758A (en) * 2020-07-01 2020-10-09 浪潮云信息技术股份公司 Bifurcate-architecture InfluxDB high-availability system

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107203339B (en) * 2017-05-10 2020-04-21 杭州宏杉科技股份有限公司 Data storage method and device
CN107329701A (en) * 2017-06-29 2017-11-07 郑州云海信息技术有限公司 Creation method, the apparatus and system of dual-active volume in a kind of storage system
CN108647117A (en) * 2018-04-26 2018-10-12 郑州云海信息技术有限公司 A kind of method of data backup, main system, equipment and computer readable storage medium
CN109246202A (en) * 2018-08-21 2019-01-18 郑州云海信息技术有限公司 A kind of method and system for realizing storage dual-active using optical fiber switch
CN109445992A (en) * 2018-11-01 2019-03-08 郑州云海信息技术有限公司 A kind of dual-active System data management method and relevant apparatus

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7930499B2 (en) * 2007-08-15 2011-04-19 Digi-Data Corporation Method to accelerate block level snapshots in archiving storage systems
CN103049225A (en) * 2013-01-05 2013-04-17 浪潮电子信息产业股份有限公司 Double-controller active-active storage system
CN104331254A (en) * 2014-11-05 2015-02-04 浪潮电子信息产业股份有限公司 Dual-active storage system design method based on dual-active logical volumes
CN104407814A (en) * 2014-11-21 2015-03-11 华为技术有限公司 Method and device for data double writing

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20000002792A (en) * 1998-06-23 2000-01-15 김영환 Data double method of double base station manager
WO2012083693A1 (en) * 2011-07-26 2012-06-28 华为技术有限公司 Voting arbitration method and apparatus for cluster computer system
US9037921B1 (en) * 2012-03-29 2015-05-19 Amazon Technologies, Inc. Variable drive health determination and data placement
CN102761615A (en) * 2012-06-29 2012-10-31 浪潮(北京)电子信息产业有限公司 Method and device for realizing data synchronism of long-distance duplication system
CN103827843B (en) * 2013-11-28 2016-03-09 华为技术有限公司 A kind of data writing method, device and system
CN104486438B (en) * 2014-12-22 2019-02-19 华为技术有限公司 The disaster recovery method and device of distributed memory system

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107797770A (en) * 2017-11-07 2018-03-13 深圳神州数码云科数据技术有限公司 A kind of synchronous method and device of Disk State information
CN107797770B (en) * 2017-11-07 2020-08-21 深圳神州数码云科数据技术有限公司 Method and device for synchronizing disk state information
CN109150986A (en) * 2018-07-27 2019-01-04 郑州云海信息技术有限公司 Store access method, device and the storage medium of data under dual-active mode
CN111752758A (en) * 2020-07-01 2020-10-09 浪潮云信息技术股份公司 Bifurcate-architecture InfluxDB high-availability system
CN111752758B (en) * 2020-07-01 2022-05-31 浪潮云信息技术股份公司 Bifocal-architecture InfluxDB high-availability system

Also Published As

Publication number Publication date
CN106502822A (en) 2017-03-15
CN107153514A (en) 2017-09-12

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16843554

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16843554

Country of ref document: EP

Kind code of ref document: A1