WO2024087692A1 - Device management method, device, system and storage medium - Google Patents

Device management method, device, system and storage medium

Info

Publication number
WO2024087692A1
WO2024087692A1 PCT/CN2023/103133 CN2023103133W WO2024087692A1 WO 2024087692 A1 WO2024087692 A1 WO 2024087692A1 CN 2023103133 W CN2023103133 W CN 2023103133W WO 2024087692 A1 WO2024087692 A1 WO 2024087692A1
Authority
WO
WIPO (PCT)
Prior art keywords
network
message
abnormality
response
edge device
Prior art date
Application number
PCT/CN2023/103133
Other languages
English (en)
French (fr)
Inventor
曾强
李泽宇
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2024087692A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/24 Multipath
    • H04L45/247 Multipath using M:N active or standby paths
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/28 Routing or path finding of packets in data switching networks using route fault recovery
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network

Definitions

  • the present application relates to the field of communication technology, and in particular to a device management method, device, system and storage medium.
  • the current disaster recovery system usually includes two data centers.
  • the business data generated by the main data center of the two data centers will be copied to the other data center, so that the other data center can take over the business after the main data center fails.
  • because the two data centers are often tens of kilometers apart in physical space, they are connected by optical cables. Due to external uncontrollable factors such as infrastructure construction, geological changes, and network transformation, the connection between the two data centers is unstable, and network anomalies readily disrupt communication between the two data centers and, in turn, the upper-layer business functions. Therefore, abnormal networks must be discovered and isolated promptly to ensure business stability.
  • an embodiment of the present application provides a device management method, in which a first device and a second device transmit business messages through a first network, and the first network includes a network edge device connected to the first device.
  • the method includes: the first device receives a network abnormality message from the network edge device, and the network abnormality message indicates that an abnormality has occurred in the network status of the first network; in response to receiving the network abnormality message, the first device switches to using a second network to transmit a backup message to the second device; wherein the second network is a transmission network with a normal network status between the first device and the second device, and the backup message includes a backup message corresponding to the business message transmitted on the first network when an abnormality occurs in the first network.
  • the network status of the first network can be detected quickly and accurately through the network edge device, and the network abnormality message indicating that the network status is abnormal can be reported to the first device.
  • the first device can respond to the network abnormality message provided by the network edge device and quickly switch to using the second network with normal network status to send a backup message to the second device. It can achieve millisecond-level network switching to transmit messages without waiting for the network response to time out, thereby reducing the duration of the impact of network abnormalities on business functions.
  • the first device and the second device are storage devices, and the first network is an optical transmission network.
  • the method further includes: in response to detecting that the network status of the first network is abnormal, the network edge device sends a network abnormality message to the first device.
  • the network anomaly detection time can be shortened and the abnormal network switching speed can be increased.
  • in response to detecting that the network status of the first network is abnormal, sending a network abnormality message to the first device includes: the network edge device performs periodic detection of preset indicators on the first network to obtain a detection result corresponding to each preset indicator, the preset indicators including at least one of the following: port power, signal strength, bit error rate, latency, and packet loss rate; in response to detecting that the detection result corresponding to a preset indicator reaches a preset threshold, the network edge device determines that the network status of the first network is abnormal and sends a network abnormality message to the first device.
  • the hardware-based network status sensing capability of network edge devices can be utilized to improve the accuracy and speed of network status detection.
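The periodic threshold check performed by the network edge device could be sketched as follows. This is only an illustration: the source names the indicators but gives no threshold values, message format, or polling interface, so `THRESHOLDS`, `NetworkAbnormalityMessage`, and `check_indicators` are hypothetical names and the numbers are placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical thresholds (assumption: the source gives no concrete values).
# "min": abnormal when the reading falls to or below the limit.
# "max": abnormal when the reading rises to or above the limit.
THRESHOLDS = {
    "port_power_dbm": ("min", -20.0),
    "bit_error_rate": ("max", 1e-6),
    "latency_ms": ("max", 5.0),
    "packet_loss_rate": ("max", 0.01),
}


@dataclass
class NetworkAbnormalityMessage:
    """Hypothetical message the edge device reports to the first device."""
    edge_device_id: str
    breached: List[str]


def check_indicators(edge_device_id: str,
                     readings: Dict[str, float]) -> Optional[NetworkAbnormalityMessage]:
    """Return an abnormality message if any reading reaches its threshold."""
    breached = []
    for name, value in readings.items():
        if name not in THRESHOLDS:
            continue  # indicator without a configured threshold is ignored
        kind, limit = THRESHOLDS[name]
        if (kind == "min" and value <= limit) or (kind == "max" and value >= limit):
            breached.append(name)
    if not breached:
        return None  # network status is considered normal
    return NetworkAbnormalityMessage(edge_device_id, sorted(breached))
```

In a real deployment this function would be called on a timer with readings taken from the edge device's hardware counters; here the readings are passed in as a plain dictionary.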
  • a unique serial number is added to each message, and the second device verifies whether a message is valid based on its serial number and stores the valid messages; in response to receiving the network abnormality message, switching to using the second network to transmit the backup message to the second device includes: the first device, in response to receiving the network abnormality message, suspends use of the first network for transmitting business messages and notifies the second device through the second network that the serial numbers of the business messages transmitted on the first network have expired; the first device adds a valid serial number to the backup message and uses the second network to send the backup message carrying the valid serial number to the second device.
  • in this way, the first network is isolated and degraded, so that backup messages and business messages sent subsequently no longer use the abnormal first network. After notifying the second device of the expired serial numbers, the first device can switch to the second network to send the backup message without waiting for a network timeout before resending. Based on that notification, the second device can directly ignore the expired business messages transmitted over the first network, which avoids storing the same message twice, maintains data consistency, and reduces the duration of the impact of network anomalies on business functions.
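The serial-number handling during a switchover can be illustrated with a small in-memory simulation. It is a sketch under stated assumptions, not the claimed implementation: the class and method names are hypothetical, acknowledgement is modeled as the return value of `receive`, and the `delivered` flag merely simulates whether the first network actually carried a message.

```python
import itertools


class SecondDevice:
    """Stores only messages whose serial number is still valid."""

    def __init__(self):
        self.expired = set()   # serial numbers declared expired by the first device
        self.stored = {}       # serial number -> payload

    def expire(self, serials):
        self.expired.update(serials)

    def receive(self, serial, payload):
        # Ignore expired serial numbers and duplicates.
        if serial in self.expired or serial in self.stored:
            return False
        self.stored[serial] = payload
        return True


class FirstDevice:
    def __init__(self, peer):
        self.peer = peer
        self._serials = itertools.count(1)
        self.in_flight = {}            # serial -> backup copy of unconfirmed messages
        self.active_network = "first"

    def send(self, payload, delivered=True):
        serial = next(self._serials)
        self.in_flight[serial] = payload          # back up before sending
        if delivered and self.peer.receive(serial, payload):
            del self.in_flight[serial]            # treated as acknowledged
        return serial

    def on_network_abnormality(self):
        """Switch networks, expire old serials, resend backups with new serials."""
        self.active_network = "second"
        pending = dict(self.in_flight)
        self.peer.expire(pending.keys())          # notification sent over the second network
        self.in_flight.clear()
        for payload in pending.values():
            self.send(payload)                    # backup message with a fresh serial number
```

A message that was lost on the first network is resent under a new serial number, and a late copy carrying the old serial number is rejected by the second device, so the same payload is never stored twice.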
  • the method before transmitting business messages through the first network, the method also includes: the first device determines network information corresponding to the first network, and the network information includes device identifications of devices included in the first network; wherein the network abnormality message sent by the network edge device to the first device includes the first device identification of the network edge device itself, and in response to receiving the network abnormality message, suspending the use of the first network for transmission of business messages includes: the first device determines the first network including the network edge device based on the first device identification in the network abnormality message and the network information, and deletes the routing information corresponding to the first network passing through the network edge device from the routing table of the first device.
  • the first network including the network edge device can be quickly determined, and by deleting the network information corresponding to the first network from the routing table, rapid network isolation can be achieved, thereby reducing the impact of network abnormalities on the business.
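As a rough illustration of this isolation step, the lookup and route deletion might look like the sketch below. The dictionary-based routing table, the `network_info` layout, and the function name `isolate_network` are assumptions made for the example; the source does not describe the data structures.

```python
def isolate_network(routing_table, network_info, abnormal_device_id):
    """Remove routes for every network that passes through the reporting device.

    routing_table: dict mapping network name -> next hop (hypothetical layout)
    network_info:  dict mapping network name -> set of device identifiers
    Returns the names of the networks that were identified as affected.
    """
    affected = [name for name, device_ids in network_info.items()
                if abnormal_device_id in device_ids]
    for name in affected:
        routing_table.pop(name, None)   # delete the route; ignore if already gone
    return affected
```

After the call, subsequent messages can only be routed over the networks that remain in the table.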
  • the second device is further used to send a detection message to the network edge device through the first network, and the method further includes: the network edge device, in response to receiving the detection message, fills its own first device identifier into the detection message and sends the detection message filled with the first device identifier to the first device; wherein determining the network information corresponding to the first network includes: the first device, in response to receiving the detection message filled with the first device identifier, extracts the first device identifier from that detection message and records the extracted first device identifier in the network information.
  • the network edge device passed by the currently used first network can be known, so as to quickly determine the network where the abnormality occurs based on the first device identifier and achieve rapid network isolation.
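The identifier-recording exchange described above can be sketched in a few lines. The message layout (a dictionary with `network` and `device_ids` keys) and both function names are hypothetical; the source specifies only that the edge device fills in its identifier and the first device records it.

```python
def fill_device_id(detection_message, edge_device_id):
    """Edge device side: return a copy of the message with its own identifier added."""
    filled = dict(detection_message)
    filled["device_ids"] = list(detection_message.get("device_ids", [])) + [edge_device_id]
    return filled


def record_network_info(network_info, filled_message):
    """First device side: record the identifiers carried by the detection message."""
    device_ids = network_info.setdefault(filled_message["network"], set())
    device_ids.update(filled_message["device_ids"])
    return network_info
```

The resulting `network_info` mapping is what lets the first device later match an abnormality report to the network that passes through the reporting edge device.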
  • when the same network device is connected between the first device and the network edge device, determining the network information corresponding to the first network includes: the first device obtains the second device identifier of the network device and records the second device identifier in the network information; wherein the network abnormality message sent by the network edge device to the first device includes the first device identifier of the network edge device itself and the second device identifier of the network device, and in response to receiving the network abnormality message, suspending the use of the first network for transmitting business messages includes: the first device determines the first network passing through the network edge device and the network device based on the first device identifier and the second device identifier in the network abnormality message, and the network information, and deletes the routing information corresponding to the first network passing through the network edge device and the network device from the routing table of the first device.
  • the first network including the network edge device and the network device can be determined more quickly and accurately, and by deleting the network information corresponding to the first network from the routing table, rapid network isolation can be achieved, thereby reducing the impact of network abnormalities on the business.
  • the transmission process of the business message between the first device and the second device includes: the first device obtains the business message to be transmitted, backs it up to obtain and store the backup message, adds a serial number to the business message to be transmitted, and uses the first network to send the business message carrying the serial number to the second device.
  • the method also includes: the network edge device predicts a time point when a network anomaly will occur in the first network in the future based on historical network anomaly messages of the first network, and the historical network anomaly messages include network anomaly messages counted when network anomalies occur in the first network within a historical time; the network edge device sends a network anomaly message to the first device based on the time point.
  • the detection time of the network status of the first network by the network edge device can be shortened, and the network anomaly message can be sent to the first device in advance, thereby significantly reducing the impact of the network anomaly on the business function.
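The source does not say how the time point is predicted. Purely as an assumed stand-in, the sketch below extrapolates from the mean interval between past anomaly timestamps and decides when to warn the first device ahead of time; `predict_next_anomaly`, `should_warn`, and the lead-time parameter are all hypothetical.

```python
from statistics import mean


def predict_next_anomaly(history):
    """Estimate the next anomaly time from historical anomaly timestamps (seconds).

    Assumption: anomalies recur at roughly regular intervals, so the next one is
    placed one mean interval after the latest. Returns None with too little history.
    """
    if len(history) < 2:
        return None
    timestamps = sorted(history)
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return timestamps[-1] + mean(gaps)


def should_warn(now, predicted, lead_time):
    """True once the current time is within lead_time of the predicted anomaly."""
    return predicted is not None and now >= predicted - lead_time
```

Any forecasting method could be substituted for the mean-interval rule; the point is only that the edge device sends the abnormality message before, rather than after, the expected degradation.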
  • messages are transmitted between the first device and the network edge device through an internal network, a network status detection module is provided in the internal network, and the network status detection module is used to detect the network status of the internal network.
  • the method also includes: the network status detection module sends an internal network abnormality message to the first device in response to detecting that the network status of the internal network is abnormal; the first device switches to using a backup internal network to transmit messages with the network edge device in response to the internal network abnormality message.
  • in this way, the first device switches to the backup internal network to transmit messages with the network edge device, thereby improving the disaster recovery capability within the first data center, reducing the waiting time for network timeout, ensuring data consistency, and reducing the impact of internal network abnormalities on the business.
  • an embodiment of the present application provides a device, wherein the device serves as a first device, and the first device and the second device transmit business messages through a first network, and the first network includes a network edge device connected to the first device; the first device includes a receiving unit and a switching unit; wherein the receiving unit is used to receive a network abnormality message from the network edge device, and the network abnormality message indicates that an abnormality has occurred in the network status of the first network; the switching unit is used to switch to using a second network to transmit a backup message to the second device in response to receiving the network abnormality message; wherein the second network is a transmission network with a normal network status between the first device and the second device, and the backup message includes a backup message corresponding to the business message transmitted on the first network when an abnormality occurs in the first network.
  • the first device and the second device are storage devices, and the first network is an optical transmission network.
  • the network edge device is configured to send a network abnormality message to the first device in response to detecting that an abnormality occurs in a network status of the first network.
  • in response to detecting that the network status of the first network is abnormal, sending a network abnormality message to the first device includes: performing periodic detection of preset indicators on the first network to obtain a detection result corresponding to each preset indicator, the preset indicators including at least one of the following: port power, signal strength, bit error rate, latency, and packet loss rate; in response to detecting that the detection result corresponding to a preset indicator reaches a preset threshold, determining that the network status of the first network is abnormal, and sending a network abnormality message to the first device.
  • a unique serial number is added to each message, and the second device is used to verify whether the message is valid based on the serial number and store the valid message.
  • the switching unit includes: an abnormal response module, which is used to suspend the use of the first network for transmitting business messages in response to receiving the network abnormal message, and notify the second device through the second network that the serial number of the business message transmitted on the first network has expired; a message sending module, which is used to add a valid serial number to the backup message, and use the second network to send the backup message with the valid serial number added to the second device.
  • the first device further includes: an information determination unit, for determining network information corresponding to the first network before transmitting the service message through the first network, the network information including device identifications of devices included in the first network; wherein the network abnormality message sent by the network edge device to the first device includes the first device identification of the network edge device itself, and in response to receiving the network abnormality message, suspending the use of the first network for transmission of service messages includes: determining the first network including the network edge device based on the first device identification in the network abnormality message and the network information, and deleting the routing information corresponding to the first network passing through the network edge device from the routing table of the first device.
  • the second device is also used to send a detection message to the network edge device through the first network; the network edge device is also used to fill its own first device identifier into the detection message in response to receiving the detection message, and to send the detection message filled with the first device identifier to the first device; the information determination unit includes: a first recording module, which is used to extract the first device identifier from the detection message filled with the first device identifier in response to receiving that detection message, and to record the extracted first device identifier in the network information.
  • when the same network device is connected between the first device and the network edge device, the information determination unit includes: a second recording module, used to obtain a second device identifier of the network device and record the second device identifier in the network information; wherein the network abnormality message sent by the network edge device to the first device includes the first device identifier of the network edge device itself and the second device identifier of the network device, and in response to receiving the network abnormality message, suspending the use of the first network for transmitting business messages includes: determining the first network passing through the network edge device and the network device based on the first device identifier and the second device identifier in the network abnormality message, and the network information, and deleting the routing information corresponding to the first network passing through the network edge device and the network device from the routing table of the first device.
  • the first device also includes: a message transmission unit, which is used to obtain the business message to be transmitted when the network status of the first network is normal, back up the business message to be transmitted, obtain the backup message and store it, and add a serial number to the business message to be transmitted, and use the first network to send the business message with the added serial number to the second device.
  • the network edge device is further used to: predict a time point when a network anomaly will occur in the first network in the future based on historical network anomaly messages of the first network, the historical network anomaly messages including network anomaly messages counted when network anomalies occurred in the first network within a historical time; and send a network anomaly message to the first device based on the time point.
  • messages are transmitted between the first device and the network edge device through an internal network.
  • a network status detection module is provided in the internal network. The network status detection module is used to detect the network status of the internal network, and in response to detecting that the network status of the internal network is abnormal, an internal network abnormal message is sent to the first device; the first device also includes: an internal network switching unit, which is used to switch to use a backup internal network to transmit messages with the network edge device in response to the internal network abnormal message.
  • an embodiment of the present application provides a device, wherein the device serves as a first device, and the first device and the second device transmit business messages through a first network, and the first network includes a network edge device connected to the first device; the first device includes an interface and a processor; the interface communicates with the processor; wherein the processor is used to: receive a network abnormality message from the network edge device, the network abnormality message indicating that an abnormality has occurred in the network status of the first network; in response to receiving the network abnormality message, switch to using a second network to transmit a backup message to the second device; wherein the second network is a transmission network with a normal network status between the first device and the second device, and the backup message includes a backup message corresponding to the business message transmitted on the first network when an abnormality occurs in the first network.
  • the first device and the second device are storage devices, and the first network is an optical transmission network.
  • the network edge device is configured to send a network abnormality message to the first device in response to detecting that an abnormality occurs in a network status of the first network.
  • in response to detecting that the network status of the first network is abnormal, sending a network abnormality message to the first device includes: performing periodic detection of preset indicators on the first network to obtain a detection result corresponding to each preset indicator, the preset indicators including at least one of the following: port power, signal strength, bit error rate, latency, and packet loss rate; in response to detecting that the detection result corresponding to a preset indicator reaches a preset threshold, determining that the network status of the first network is abnormal, and sending a network abnormality message to the first device.
  • a unique serial number is added to each message, and the second device is used to verify whether the message is valid based on the serial number and store the valid message, and the response to receiving the network abnormality message, switching to using the second network to transmit the backup message to the second device, includes: in response to receiving the network abnormality message, suspending the use of the first network for transmission of business messages, and notifying the second device through the second network that the serial number of the business message transmitted on the first network has expired; adding a valid serial number to the backup message, and using the second network to send the backup message to which the valid serial number has been added to the second device.
  • the processor is also used to determine network information corresponding to the first network before transmitting the service message through the first network, and the network information includes the device identification of the device included in the first network; wherein the network abnormality message sent by the network edge device to the first device includes the first device identification of the network edge device itself, and the response to receiving the network abnormality message, suspending the use of the first network for transmission of service messages includes: determining the first network including the network edge device based on the first device identification in the network abnormality message and the network information, and deleting the routing information corresponding to the first network passing through the network edge device from the routing table of the first device.
  • the second device is also used to send a detection message to the network edge device through the first network; the network edge device is also used to fill its own first device identifier into the detection message in response to receiving the detection message, and to send the detection message filled with the first device identifier to the first device; determining the network information corresponding to the first network includes: in response to receiving the detection message filled with the first device identifier, extracting the first device identifier from that detection message, and recording the extracted first device identifier in the network information.
  • when the same network device is connected between the first device and the network edge device, determining the network information corresponding to the first network includes: obtaining a second device identifier of the network device and recording the second device identifier in the network information; wherein the network abnormality message sent by the network edge device to the first device includes the first device identifier of the network edge device itself and the second device identifier of the network device, and in response to receiving the network abnormality message, suspending the use of the first network for transmitting business messages includes: determining the first network passing through the network edge device and the network device based on the first device identifier and the second device identifier in the network abnormality message, and the network information, and deleting the routing information corresponding to the first network passing through the network edge device and the network device from the routing table of the first device.
  • the processor is also used to: when the network status of the first network is normal, obtain the business message to be transmitted, back up the business message to be transmitted, obtain the backup message and store it, and add a serial number to the business message to be transmitted, and use the first network to send the business message with the added serial number to the second device.
  • the network edge device is further used to: predict a time point when a network anomaly will occur in the first network in the future based on historical network anomaly messages of the first network, the historical network anomaly messages including network anomaly messages counted when a network anomaly occurred in the first network within a historical time; and send a network anomaly message to the first device based on the time point.
  • the first device and the network edge device transmit messages through an internal network, and a network status detection module is provided in the internal network; the network status detection module is used to detect the network status of the internal network and, in response to detecting that the network status of the internal network is abnormal, to send an internal network abnormality message to the first device; the processor is also used for: in response to the internal network abnormality message, switching to use the backup internal network to transmit messages with the network edge device.
  • an embodiment of the present application provides a device management system, comprising a first device and a network edge device, wherein the first device is used to execute the operations performed by the first device in the above-mentioned device management method, and the network edge device is used to execute the operations performed by the network edge device in the above-mentioned device management method.
  • an embodiment of the present application provides a non-volatile computer-readable storage medium having computer program instructions stored thereon; when the computer program instructions are executed by a processor, the device management method of the above-mentioned first aspect, or of one or several of the multiple possible implementations of the first aspect, is implemented.
  • an embodiment of the present application provides a computer program product, including computer-readable code, or a non-volatile computer-readable storage medium carrying computer-readable code; when the computer-readable code runs in an electronic device, the processor in the electronic device executes the device management method of the above-mentioned first aspect, or of one or several of the multiple possible implementations of the first aspect.
  • FIG. 1 shows a schematic diagram of a disaster recovery system provided according to the related art.
  • FIG. 2 shows a schematic diagram of a disaster recovery system provided according to an embodiment of the present application.
  • FIG. 3 is a schematic diagram showing a message transmission process according to an embodiment of the present application.
  • FIG. 4 shows a flow chart of a device management method according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram showing a process of device registration and network anomaly reporting according to an embodiment of the present application.
  • FIG. 6 shows a schematic diagram of a disaster recovery system according to an embodiment of the present application.
  • FIG. 7 is a schematic diagram showing a network switching process according to an embodiment of the present application.
  • FIG. 8 is a schematic diagram showing a process of determining network information according to an embodiment of the present application.
  • FIG. 9 shows a schematic structural diagram of a first device according to an embodiment of the present application.
  • FIG. 10 shows a structural diagram of an electronic device 1300 according to an embodiment of the present application.
  • Network refers to a communication network built through Fibre Channel (FC), Internet Protocol (IP) and Remote Direct Memory Access (RDMA) protocols; data can be replicated between two storage devices in two data centers through the network so that the two storage devices have the same data copy.
  • Link refers to a logical link (or logical channel, etc.) established over the network from a network card port of a storage device in a data center (DC) to a network card port of a storage device in another data center.
  • This link can specifically implement message transmission when data is copied between two storage devices.
  • Active-active means that the business data generated by the host is written to the storage devices of two data centers at the same time. If the storage device of any data center fails, the storage device of the other data center will immediately and automatically take over the business. This ensures that there is no data loss and no business interruption when a device fails.
  • Synchronous replication means that the business data generated by the host is written to the storage devices of the two data centers at the same time.
  • if the primary storage device of the primary data center fails, the user can manually configure the disaster recovery storage device of the disaster recovery data center to take over the business. This ensures that no data is lost when the device fails and that the business can be restored within minutes.
  • Asynchronous replication means that the business data generated by the host is only written to the primary storage device of the primary data center.
  • the business data in the primary storage device is synchronized in the background to the disaster recovery storage device in the disaster recovery data center, either periodically or when triggered manually. In this way, when the primary storage device fails, the disaster recovery storage device has the most recent complete data.
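The difference between the synchronous and asynchronous replication modes defined above can be made concrete with a toy in-memory model. All names here (`StorageDevice`, `synchronous_write`, `AsyncReplicator`) are hypothetical, and real storage replication involves acknowledgements, ordering, and failure handling that this sketch leaves out.

```python
class StorageDevice:
    """Toy storage device holding key/value business data."""

    def __init__(self):
        self.data = {}

    def write(self, key, value):
        self.data[key] = value


def synchronous_write(primary, disaster_recovery, key, value):
    """Synchronous replication: the write lands on both devices together."""
    primary.write(key, value)
    disaster_recovery.write(key, value)


class AsyncReplicator:
    """Asynchronous replication: write to the primary, copy to the DR site later."""

    def __init__(self, primary, disaster_recovery):
        self.primary = primary
        self.disaster_recovery = disaster_recovery
        self.pending = {}          # changes not yet copied to the DR device

    def write(self, key, value):
        self.primary.write(key, value)
        self.pending[key] = value

    def sync(self):
        """Periodic or manually triggered background synchronization."""
        copied = len(self.pending)
        for key, value in self.pending.items():
            self.disaster_recovery.write(key, value)
        self.pending.clear()
        return copied
```

Until `sync` runs, the disaster recovery device lags the primary by the contents of `pending`, which is the data that could be lost if the primary failed at that moment.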
  • Network anomaly detection refers to the storage device periodically testing network status indicators such as bit error rate, latency, and packet loss rate. When the indicator exceeds the corresponding threshold, the network is judged to be in an abnormal state (i.e., sub-healthy state).
  • Network isolation means that, when there are multiple networks between two storage devices that can be used for message transmission and the status of one of them becomes abnormal, the storage device stops using the abnormal network for message transmission and switches to a network whose status is normal.
  • Network equipment refers to the hardware equipment used to connect various servers, computers, application terminals and other nodes to form an information communication network.
  • Common network equipment includes switches, routers, firewalls, bridges, hubs, gateways, network interface controllers (NIC, also known as network interface cards), fiber optic transceivers, etc.
  • Network edge devices refer to network devices located between the intranet (also known as the internal network) and the external network (also known as the public network), or network devices that connect the intranet and the external network, usually including switches, wavelength division multiplexers, network interface cards, etc.
  • the current disaster recovery system usually includes two data centers, and the business data generated by the main data center in the two data centers will be copied to the other data center, so that after the main data center fails, the other data center can take over the business.
  • Because the two data centers are often tens of kilometers apart in practice, they are usually connected through optical cables leased from an operator. Due to uncontrollable external factors such as infrastructure construction, geological changes, and network transformation, the connection between the two data centers is unstable, and network anomalies readily affect communication between the two data centers and therefore the upper-level business functions. It is therefore necessary to promptly discover and isolate abnormal networks to keep the business stable.
  • FIG1 shows a schematic diagram of a disaster recovery system provided according to the relevant technology.
  • the main data center and the same-city data center each include a web server cluster, an application server cluster, a database server cluster, and a storage device, wherein the web server cluster, the application server cluster, and the database server cluster can be understood as a host cluster for producing business data.
  • These host clusters will send the produced business data to the storage device for storage, and the storage devices of the two data centers can replicate data through a replication network.
  • Related technologies detect network anomalies in two ways. In the first, the storage devices of the two data centers regularly send detection messages to each other and count the delay and packet loss rate of the detection messages within a certain period.
  • In the second, the storage devices in the two data centers each count indicators such as the latency and packet loss rate of business messages on the network.
  • In both cases, when these indicators reach a certain threshold, the network is reported as abnormal and network isolation is required.
  • the network anomaly detection method in the above-mentioned related technologies relies on the statistics of the indicators of the detection message or the abnormal business message by two storage devices.
  • When a message is transmitted in the replication network, it usually passes through network devices such as interface cards, switches, and wavelength division multiplexers.
  • It is usually necessary to count the delay and packet loss rate within a certain period, and the network is determined to be abnormal and in need of isolation only when the statistical results reach the threshold corresponding to each indicator. The current statistical period is generally at the minute level, and the indicator thresholds are empirical values derived from historical data, so network anomaly detection takes minutes to detect an abnormal network.
  • the network anomaly detection method in the above-mentioned related technologies has a long detection time and low accuracy.
  • the network isolation method in the related technology is mainly to trigger network isolation after a storage device detects a network abnormality.
  • the network with normal network status is given priority.
  • the network routing table can be updated to give priority to the network with normal status; for business messages that have entered the replication network for transmission, wait for the network to time out and then switch the network to try again.
  • Since the storage device does not know the specific sending status of the business message, it can only rely on the timeout mechanism of the network protocol: it waits for the network timeout to be reported and then resends the business message.
  • Timeouts in the timeout mechanism are often measured in units of 10 seconds, which lengthens the time needed to switch networks and resend business messages during network isolation. This matters especially in database scenarios: if the transmitted business message is database log data, the database must first receive the log confirming successful transmission before it processes the data consistency task, so if one business message goes unconfirmed for a long time, subsequent business messages are not processed in time.
  • the current network anomaly detection and isolation methods require minutes to eliminate the impact of abnormal networks on user services after a network anomaly occurs.
  • the impact of network anomalies on services is often required to be less than a few seconds. Therefore, it is very important to study how to quickly detect and isolate abnormal networks.
  • an embodiment of the present application provides a device management method.
  • the network status is sensed by a network edge device connected to the storage device.
  • the network edge device can perform chip-level monitoring of port power, signal strength, bit error rate, packet loss rate, etc.
  • When an abnormality is detected, the network edge device immediately reports a network abnormality message to the storage device of the data center.
  • The device that reports the network failure can be a network edge device such as a switch or a wavelength division multiplexer. After receiving the network abnormality message reported by the network edge device, the storage device can immediately isolate (that is, downgrade) the abnormal network so that subsequently sent business messages no longer use it, and can use a backup network with normal network status to notify the storage device of the other data center of the serial numbers of the failed business messages.
  • The failed business messages include the messages that were being transmitted on the abnormal network when the network abnormality occurred. The backup messages corresponding to the failed business messages are retried immediately over a different network, that is, the storage device switches to the backup network to send the backup messages to the storage device of the other data center. This reduces the time spent waiting for a network timeout, ensures data consistency, and reduces the impact of network abnormalities on business.
  • the device management method of the embodiment of the present application can detect network anomalies from the hardware level by the network edge device.
  • the network edge device bypasses the intermediate network device and directly sends the network anomaly message to the storage device.
  • The storage device can then quickly isolate the abnormal network and immediately switch networks to retry, using backup messages, the business messages that were being transmitted in the abnormal network.
  • the device management method of the present application reduces the waiting time of network timeout and reduces the impact of network anomalies on the service.
  • the device management method of the embodiment of the present application can be applied to various replication services between two data centers under various disaster recovery systems, including but not limited to dual-active, synchronous replication, asynchronous replication and other replication services; specifically, it can be applied to disaster recovery scenarios where the transmission network between data centers has occasional network anomalies and the service has relatively high requirements for the stability of network performance. It can quickly detect network anomalies and achieve millisecond-level network switching without waiting for network response timeout, thereby reducing the impact of network anomalies on business functions.
  • the party that initiates cross-device data transmission and sends data may be referred to as the source end, and the party that receives data may be referred to as the sink end.
  • a device that is a source end in one pair of relationships may also be a sink end in another pair of relationships, that is, for a device, it may be both a source end and a sink end of another device.
  • FIG2 shows a schematic diagram of a disaster recovery system provided according to an embodiment of the present application.
  • the following introduces the device management method of the embodiment of the present application through the disaster recovery system shown in FIG2.
  • the disaster recovery system includes a first data center and a second data center.
  • the first data center includes a host cluster 201a, a network edge device 202a, a network device (such as a switch) 203a and a first device 204a;
  • the second data center includes a host cluster 201b, a network edge device 202b, a network device 203b and a second device 204b; wherein there is a dual-active data replication service between the first device 204a and the second device 204b;
  • The first device 204a and the second device 204b can be storage devices, and can also be any other electronic device such as a computing device (for example, a computer cluster); this is not limited in the embodiments of the present application.
  • The host clusters 201a and 201b each produce business data and transmit it to the first device 204a and the second device 204b through the network devices 203a and 203b, respectively, for storage;
  • The device acting as the source end (the first device 204a or the second device 204b) can transmit the business data, in the form of business messages, through the first network to the device acting as the sink end. That is, the first device 204a of the first data center and the second device 204b of the second data center transmit business messages through the first network to realize the above-mentioned replication business;
  • The first network can be an optical transmission network or an electrical transmission network; this is not limited in the embodiments of the present application.
  • the first network includes a network edge device 202a connected to the first device 204a and a network edge device 202b connected to the second device 204b.
  • The first device 204a and the network edge device 202a may communicate in the following two modes: (1) the first device is connected to the network edge device through an internal network via a network device (such as a switch), for two-way transmission of service messages; (2) the first device and the network edge device are directly connected via a network cable, so that the first device directly receives network abnormality messages reported by the network edge device.
  • The transmission process of a service message between the first device 204a and the second device 204b includes the following: the first device 204a obtains the service message to be transmitted, backs it up to obtain a backup message and stores the backup message, adds a sequence number to the service message to be transmitted, and uses the first network to send the service message with the added sequence number to the second device 204b.
  • the service message can be sent to the second device 204b using the available transmission link in the first network.
  • FIG. 3 is a schematic diagram showing a message transmission process according to an embodiment of the present application.
  • The host cluster 201a sends a business message to the first device 204a through the network device 203a.
  • the business message with the added serial number is transmitted to the second device 204b after passing through the network edge device 202a and the network edge device 202b of the second data center.
  • the second device 204b verifies the serial number of the business message and locally caches the business message after the verification is successful.
  • the cached business message can then be parsed to obtain the parsed business data and store it.
  • the host cluster 201a can convert the service data into a service message and send it to the first device 204a. In response to receiving the service message sent by the host cluster 201a, the first device 204a backs up the service message to obtain a backup message. It should be understood that the embodiment of the present application does not limit the conversion method of the service data and the backup method of the service message.
  • After receiving the business message sent by the host cluster 201a, the first device 204a can also add a unique serial number to the business message and send the business message with the added serial number to the second device 204b. In this way, after the second device 204b receives the business message with the added serial number, it can verify whether the business message is valid according to the serial number and store only valid business messages, which ensures data consistency between the two storage devices.
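  • The source-end bookkeeping described above (back up the message, tag it with a unique serial number, send it) can be sketched as follows. This is a minimal illustration only: the class `SourceDevice`, its methods, and the choice to drop a backup once reception is acknowledged are assumptions made for the example, since the application does not limit how backup or numbering is implemented.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class SourceDevice:
    """Hypothetical model of a source-end device such as the first device 204a."""
    _next_seq: count = field(default_factory=lambda: count(1))
    backups: dict = field(default_factory=dict)  # serial number -> backup copy

    def prepare(self, payload: bytes) -> tuple[int, bytes]:
        """Store a backup copy, assign a unique serial number, return the tagged message."""
        seq = next(self._next_seq)
        self.backups[seq] = bytes(payload)
        return seq, payload

    def on_ack(self, seq: int) -> None:
        """Assumption: the backup is released once the sink end confirms reception."""
        self.backups.pop(seq, None)


src = SourceDevice()
first = src.prepare(b"log-record-1")
second = src.prepare(b"log-record-2")
src.on_ack(first[0])  # only the unacknowledged message keeps its backup
```

  • Keeping the backups keyed by serial number means the source end always knows which messages are still unconfirmed and therefore candidates for resending.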
  • the device management method includes:
  • Step S401: the first device 204a receives a network abnormality message from the network edge device 202a, where the network abnormality message indicates that a network state of the first network is abnormal.
  • the network edge device 202a itself can detect preset indicators such as port power, signal strength, bit error rate, packet loss rate (also known as frame loss rate) corresponding to the first network (or the transmission link under the first network), that is, the network edge device itself has the hardware perception capability of the network status, and can more accurately perceive the network status of the first network passing through the network edge device.
  • the network abnormality message can be directly reported to the first device 204a, thereby improving the detection accuracy and speed of the network status and reducing the impact of network abnormalities on business functions.
  • the first device 204a can decode the network anomaly message to obtain readable information contained in the network anomaly message, such as the first device identifier of the network edge device, the network anomaly type of the first network, etc.
  • The first device identifier of the network edge device 202a may be, for example, a globally unique identifier (World Wide Name, WWN) of the network edge device 202a. Based on the first device identifier in the network abnormality message, the first device 204a may perform security verification and authority verification on the network edge device that reports the message, to ensure that the source of the network abnormality message is credible. It should be understood that the embodiment of the present application does not limit the specific method used for the security verification and the authority verification.
  • The security verification may be implemented by checking, through the certificate authority (Certificate Authority, CA), whether the certificate of the network edge device 202a corresponding to the first device identifier is within its validity period; the authority verification may be implemented by checking whether the first device identifier is in the local authorization whitelist of the first device 204a, and so on.
  • The network edge device 202a may first register with the first device 204a so that it is allowed to report network abnormality messages to the first device 204a.
  • FIG5 shows a schematic diagram of the device registration and network abnormality reporting process according to an embodiment of the present application. As shown in FIG5, the network edge device 202a may send device registration information to the first device 204a; the registration information carries a certificate and the first device identifier of the network edge device 202a.
  • the fault perception management module in the first device 204a can send the certificate in the device information to the CA center to verify the validity of the certificate; the fault perception management module can simultaneously send the first device identifier in the device information to the network management module of the first device 204a, and the network management module can perform authority verification on the network edge device 202a based on the first device identifier; after the certificate verification and the authority verification are passed, it is determined that the network edge device 202a is successfully registered.
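  • The registration checks described above can be sketched as follows. This is an illustration under stated assumptions, not the verification method of the application: the names `Certificate`, `RegistrationInfo` and `register` are hypothetical, the certificate check only tests the validity period mentioned in the text (a real CA check would also validate signatures and revocation), and the whitelist is modelled as a plain set of device identifiers.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Certificate:
    not_before: datetime  # start of the validity period
    not_after: datetime   # end of the validity period


@dataclass(frozen=True)
class RegistrationInfo:
    device_id: str            # first device identifier, e.g. a WWN
    certificate: Certificate


def register(info, whitelist, registered, now):
    """Register the edge device only if both the certificate and authority checks pass."""
    cert_ok = info.certificate.not_before <= now <= info.certificate.not_after
    authorized = info.device_id in whitelist
    if cert_ok and authorized:
        registered.add(info.device_id)
        return True
    return False


whitelist = {"wwn-202a"}
registered = set()
cert = Certificate(datetime(2023, 1, 1), datetime(2024, 1, 1))
now = datetime(2023, 6, 1)
ok = register(RegistrationInfo("wwn-202a", cert), whitelist, registered, now)
rejected = register(RegistrationInfo("wwn-unknown", cert), whitelist, registered, now)
```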
  • When the network edge device 202a has been successfully registered and detects that the network status of the first network is abnormal, it can send a network abnormality message to the first device 204a. After receiving the network abnormality message, the fault perception management module in the first device 204a extracts the first device identifier from the message, generates a network abnormality event representing the abnormality of the first network, and reports the event to the network management module.
  • After receiving the network abnormality event, the network management module switches networks to send the backup message and recalls the business message transmitted in the first network, that is, it executes step S402 and switches to using the second network to transmit the backup message to the second device 204b. Recalling the business message transmitted in the first network avoids sending the second device 204b a business message that duplicates the backup message, which helps maintain data consistency.
  • Step S402: in response to receiving a network abnormality message, the first device 204a switches to using the second network to transmit a backup message to the second device 204b; wherein the second network is a transmission network with a normal network status between the first device 204a and the second device 204b, and the backup message includes a backup message corresponding to the business message transmitted on the first network when an abnormality occurs in the first network.
  • the first device 204a can switch to use other transmission networks with normal network status, that is, switch to use the second network to transmit backup messages to the second device 204b.
  • the second network can also be called a backup network, and the transmission link under the second network can be called a backup link.
  • When the network status of the first network is normal, during the transmission of business messages between the first device 204a and the second device 204b, the first device 204a backs up the business messages to be transmitted and stores the resulting backup messages. After successfully receiving a business message, the second device 204b usually returns a message of successful reception to the first device 204a, so that the first device 204a knows which business messages have been successfully sent to the second device 204b, that is, which serial numbers have been successfully copied to the second device 204b.
  • the first device 204a can directly switch to using the second network to transmit backup messages to the second device 204b, that is, it can switch to using a transmission link under the second network to transmit backup messages to the second device 204b.
  • the backup message may include a backup message corresponding to the business message transmitted on the first network when the abnormality occurs in the first network.
  • FIG6 shows a schematic diagram of a disaster recovery system according to an embodiment of the present application.
  • 1 and 2 on the host, 0 to 11 on the switch, and P0 to P3 on the storage devices “0A, 0B, 0C, 0D” all represent ports, and “a, b, c, d” represent network edge devices; wherein, the lines between the ports can represent specific transmission links under the network; in FIG6 , there are two networks between the first data center and the second data center, including: a network passing through network edge devices a and b and a network passing through network edge devices c and d; if the network edge device c detects that a network anomaly occurs in the network passing through the network edge devices c and d, then the network passing through the network edge devices c and d can be the first network, and then the network passing through the network edge devices a and b can be the second network, that is, it can be switched to use the network passing through the network edge devices a and b to
  • the network status of the first network can be detected quickly and with high precision through the network edge device, and a network abnormality message indicating that the network status is abnormal can be reported to the first device.
  • the first device can respond to the network abnormality message provided by the network edge device and quickly switch to using the second network with a normal network status to send a backup message to the second device. This can achieve millisecond-level network switching to transmit messages without waiting for the network response to time out, thereby reducing the duration of the impact of network abnormalities on business functions.
  • the network edge device 202a can detect the network state of the first network, and when detecting that the network state of the first network is abnormal, send a network abnormality message to the first device 204a.
  • Before step S401, the method further includes:
  • the network edge device 202a In response to detecting that the network state of the first network is abnormal, the network edge device 202a sends a network abnormality message to the first device 204a. In this way, the network edge device directly sends the network abnormality message to the first device, which can shorten the network abnormality detection time and improve the abnormal network switching speed.
  • Sending a network abnormality message to the first device in response to detecting that the network state of the first network is abnormal includes: the network edge device 202a performs timed detection of preset indicators on the first network to obtain a detection result corresponding to each preset indicator, where the preset indicators include at least one of the following: port power, signal strength, bit error rate, delay, and packet loss rate; in response to detecting that the detection result corresponding to a preset indicator reaches a preset threshold, the network edge device 202a determines that the network status of the first network is abnormal and sends a network abnormality message to the first device 204a.
  • The timed detection of a preset indicator on the first network can be understood as detecting the preset indicator at a fixed interval, for example, detecting the port power once every 2 minutes. It should be understood that the network edge device 202a itself is capable of detecting the above preset indicators, and the embodiment of the present application does not limit the detection method used by the network edge device.
  • The preset threshold can be an empirical value set according to historical experience, and each preset indicator can have its own preset threshold.
  • If the network edge device 202a detects only one preset indicator, it can determine that the network state of the first network is abnormal when the detection result of that indicator reaches the preset threshold. If the network edge device 202a detects multiple preset indicators, it can determine that the network state of the first network is abnormal when the detection results of all of the indicators reach their preset thresholds, or when the detection result of any one of the indicators reaches its preset threshold; this is not limited in the embodiments of the present application.
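  • The "any indicator" and "all indicators" decision rules can be sketched as below. The indicator names, the threshold values, and the assumption that port power and signal strength are abnormal when they fall to or below their limit (while the other indicators are abnormal when they rise to or above it) are illustrative choices, not values taken from the application.

```python
# Assumption: for these indicators a LOW reading is the abnormal direction.
LOWER_IS_WORSE = {"port_power_dbm", "signal_strength_dbm"}


def reaches_threshold(name, value, limit):
    """Return True when one indicator's detection result reaches its preset threshold."""
    return value <= limit if name in LOWER_IS_WORSE else value >= limit


def network_is_abnormal(results, thresholds, require_all=False):
    """Combine per-indicator checks with either the 'any' or the 'all' rule."""
    checks = [
        reaches_threshold(name, results[name], limit)
        for name, limit in thresholds.items()
        if name in results
    ]
    if not checks:
        return False  # nothing was measured, so no abnormality can be declared
    return all(checks) if require_all else any(checks)


thresholds = {"bit_error_rate": 1e-5, "packet_loss_rate": 0.01, "port_power_dbm": -20.0}
results = {"bit_error_rate": 1e-6, "packet_loss_rate": 0.02, "port_power_dbm": -5.0}
any_rule = network_is_abnormal(results, thresholds)
all_rule = network_is_abnormal(results, thresholds, require_all=True)
```

  • With these sample values only the packet loss rate reaches its threshold, so the "any" rule reports an abnormal network while the "all" rule does not.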
  • the network edge device 202a Before sending the network abnormality message to the first device 204a, the network edge device 202a may write its own first device identification into the network abnormality message, so that the first device 204a knows the source of the network abnormality message.
  • the hardware-based network status sensing capability of network edge devices can be utilized to improve the accuracy and speed of network status detection.
  • some historical network anomaly messages can be statistically analyzed, that is, the law of anomalies in the first network can be analyzed, so that the time point when the network anomaly may occur in the future can be predicted based on the analysis results, and the network anomaly message can be sent to the first device 204a in advance based on the predicted time point.
  • the method also includes:
  • the network edge device 202a predicts the time point at which a network anomaly will occur in the first network in the future based on the historical network anomaly messages of the first network.
  • The historical network anomaly messages include the network anomaly messages collected when network anomalies occurred in the first network within a historical period;
  • the network edge device 202a sends a network anomaly message to the first device 204a based on the time point.
  • a deep learning model or a machine learning model can be used to predict the time point when a network anomaly will occur in the first network in the future based on the historical network anomaly messages of the first network; of course, the historical network anomaly messages of the first network can also be modeled based on a statistical model, and the time point when a network anomaly will occur in the first network in the future can be predicted based on the constructed model, and the embodiments of the present application are not limited to this.
  • the network edge device 202a sends a network abnormality message to the first device 204a based on the time point, which may include: sending a network abnormality message to the first device 204a when the current time reaches a certain time point before the time point, for example, if the predicted time point is 7 o'clock on September 30, the network abnormality message can be sent to the first device 204a when the current time is 6:59 on September 30; of course, it is also possible to send a network abnormality message to the first device 204a when the current time reaches the time point, for example, if the predicted time point is 7 o'clock on September 30, the network abnormality message can be sent to the first device 204a when the current time is 7 o'clock on September 30, etc.
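  • As one very simple instance of the statistical-model option, the next anomaly time can be estimated from the mean interval between past anomalies, and the message can then be sent a fixed lead time earlier. The functions `predict_next_anomaly` and `report_time`, the mean-interval model, the one-minute lead, and the year used in the sample data are all assumptions for illustration; the application does not prescribe a particular model.

```python
from datetime import datetime, timedelta


def predict_next_anomaly(history):
    """Estimate the next anomaly as the last occurrence plus the mean past interval."""
    if len(history) < 2:
        raise ValueError("at least two historical anomalies are needed")
    ordered = sorted(history)
    mean_gap = (ordered[-1] - ordered[0]) / (len(ordered) - 1)
    return ordered[-1] + mean_gap


def report_time(predicted, lead=timedelta(minutes=1)):
    """Time at which to send the network abnormality message ahead of the prediction."""
    return predicted - lead


history = [
    datetime(2023, 9, 30, 4, 0),
    datetime(2023, 9, 30, 5, 0),
    datetime(2023, 9, 30, 6, 0),
]
predicted = predict_next_anomaly(history)
send_at = report_time(predicted)
```

  • With anomalies recorded hourly at 4:00, 5:00 and 6:00, the sketch predicts 7:00 and schedules the message for 6:59, matching the example in the text; passing `lead=timedelta(0)` corresponds to sending exactly at the predicted time point.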
  • the detection time of the network status of the first network by the network edge device can be shortened, and the network anomaly message can be sent to the first device in advance, thereby significantly reducing the impact of the network anomaly on the business function.
  • Step S402, in response to receiving the network abnormality message, switching to use the second network to transmit the backup message to the second device, includes:
  • Step S4021: in response to receiving the network abnormality message, the first device 204a suspends the use of the first network for transmission of service messages, and notifies the second device 204b through the second network that the sequence number of the service message transmitted on the first network has expired;
  • Step S4022: the first device 204a adds a valid sequence number to the backup message, and uses the second network to send the backup message to which the valid sequence number has been added to the second device.
  • In step S4021, suspending the use of the first network for transmitting business messages may include, for example, deleting the routing information corresponding to the first network from the routing table of the first device.
  • the routing information may include, for example, the IP address, subnet mask, gateway and other information of the network device on the first network.
  • the first device 204a can no longer use the first network for message transmission, thereby achieving network isolation (or degradation) of the first network.
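  • Deleting the routing information of the first network can be sketched as below. The routing table is modelled here as a list of dictionaries with hypothetical `network`, `destination`, `netmask` and `gateway` fields, and the addresses are documentation-range placeholders; a real device would manipulate its actual routing table through its own management interface.

```python
def isolate_network(routing_table, network_id):
    """Remove every route that belongs to the abnormal network; return how many were removed."""
    kept = [route for route in routing_table if route["network"] != network_id]
    removed = len(routing_table) - len(kept)
    routing_table[:] = kept  # update the table in place so all holders see the change
    return removed


routing_table = [
    {"network": "first", "destination": "192.0.2.0", "netmask": "255.255.255.0", "gateway": "192.0.2.1"},
    {"network": "second", "destination": "198.51.100.0", "netmask": "255.255.255.0", "gateway": "198.51.100.1"},
]
removed = isolate_network(routing_table, "first")
remaining = [route["network"] for route in routing_table]
```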
  • In step S4021, the second device 204b is also notified through the second network that the sequence number of the service message transmitted on the first network has expired.
  • For example, the sequence numbers of the service messages transmitted on the first network can be sent to the second device 204b to notify it that the service messages with these sequence numbers are invalid.
  • Alternatively, a sequence number threshold greater than the sequence numbers of the service messages transmitted on the first network can be sent to the second device 204b to notify it that any sequence number below the threshold is invalid, so that the second device 204b can perform a validity check based on the sequence number threshold.
  • the sequence number added by the first device 204a to the backup message should be different from the sequence number of the service message transmitted on the first network, for example, it can be a sequence number greater than the above sequence number threshold, so that the second device 204b can store the backup message based on the valid sequence number and ignore the service message with an invalid sequence number.
  • In this way, if the second device 204b has already stored the backup message and the service message transmitted on the first network reaches the second device 204b after the network returns to normal, the second device 204b does not store again a service message whose content duplicates the backup message, which helps ensure data consistency between the two storage devices.
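  • The sink-end validity check based on a sequence number threshold can be sketched as follows. The class `SinkDevice` and its method names are hypothetical, and rejecting an already stored sequence number is an added assumption; the text itself only states that sequence numbers below the threshold are invalid.

```python
class SinkDevice:
    """Hypothetical model of the sink-end device such as the second device 204b."""

    def __init__(self):
        self.min_valid_seq = 0  # sequence numbers below this value are invalid
        self.store = {}         # sequence number -> stored payload

    def on_invalidate(self, seq_threshold):
        """Handle the notification that sequence numbers below the threshold have expired."""
        self.min_valid_seq = max(self.min_valid_seq, seq_threshold)

    def on_message(self, seq, payload):
        """Store the message only if its sequence number passes the validity check."""
        if seq < self.min_valid_seq or seq in self.store:
            return False
        self.store[seq] = payload
        return True


sink = SinkDevice()
stored_first = sink.on_message(1, b"a")          # arrives before the anomaly
sink.on_invalidate(4)                            # messages 2 and 3 were in flight
stored_backup = sink.on_message(5, b"b-backup")  # backup resent with a valid number
late_arrival = sink.on_message(2, b"b")          # original arrives after recovery
```

  • The late original is rejected because its sequence number is below the threshold, so the content carried by the backup is stored exactly once.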
  • Figure 7 shows a schematic diagram of a network switching process according to an embodiment of the present application.
  • in response to detecting that the network status of the first network is abnormal, the network edge device 202a sends a network abnormality message to the first device 204a; the first device 204a deletes the routing information corresponding to the first network passing through the network edge device 202a from the routing table according to the first device identifier in the network abnormality message and the network information; the first device 204a notifies the second device 204b through the second network that the sequence number of the service message transmitted on the first network has expired; the first device 204a adds a valid sequence number to the backup message and uses the second network to send it to the second device 204b; if the second device 204b receives a service message that was transmitted on the abnormal first network, it verifies the sequence number of the service message and, if the verification fails, returns a verification failure message to the first device 204a to ensure data consistency.
  • In this way, the first network is isolated and degraded, which ensures that backup messages or service messages sent subsequently no longer use the abnormal first network. After notifying the second device of the invalid sequence number, the first device can switch to the second network to send the backup message, so there is no need to wait for a network timeout before resending the message. Based on the notification of the invalid sequence number, the second device can directly ignore the invalid service message transmitted on the first network, which avoids storing the same message twice, maintains data consistency, and shortens the time during which the network abnormality affects business functions.
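  • The sender-side part of the switching process can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: the class `Sender`, the `log` list that stands in for actual transmission, and the choice of the next unused sequence number as the threshold are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Sender:
    """Hypothetical stand-in for the first device 204a."""
    active: str = "first"                            # network currently in use
    next_seq: int = 1
    unconfirmed: dict = field(default_factory=dict)  # seq -> locally stored backup copy
    log: list = field(default_factory=list)          # (network, kind, number) records

    def _send(self, kind: str, payload: bytes) -> int:
        seq = self.next_seq
        self.next_seq += 1
        self.unconfirmed[seq] = payload              # keep a backup copy of what was sent
        self.log.append((self.active, kind, seq))    # stand-in for real transmission
        return seq

    def send_service(self, payload: bytes) -> int:
        return self._send("service", payload)

    def on_network_abnormal(self) -> None:
        """React to a network abnormality message from the network edge device."""
        self.active = "second"                       # suspend use of the first network
        threshold = self.next_seq                    # greater than every number already used
        self.log.append(("second", "invalidate", threshold))
        pending = list(self.unconfirmed.values())
        self.unconfirmed.clear()
        self.next_seq = threshold + 1                # backup numbers exceed the threshold
        for payload in pending:
            self._send("backup", payload)


sender = Sender()
sender.send_service(b"a")
sender.send_service(b"b")
sender.on_network_abnormal()
```

After the call to `on_network_abnormal`, the log in this sketch contains the invalidation notice followed by the two backup messages, all on the second network, without waiting for any timeout.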
  • before transmitting the service message through the first network, the method further includes:
  • the first device 204a sends a link establishment request to the second device 204b, and the link establishment request is used to establish a transmission link between the first device 204a and the second device 204b based on the first network, and the transmission link is used to transmit messages between the first device 204a and the second device 204b; the first device 204a determines the network information corresponding to the first network, and the network information includes the device identification of the device included in the first network.
  • the first device 204a can send a link establishment request to the second device 204b through the first network.
  • each network device will automatically allocate an input-output (IO) port number based on the link establishment request, thereby establishing a transmission link between the first device 204a and the second device 204b to specifically perform message transmission.
  • determining the network information corresponding to the first network may, for example, include: during the above-mentioned transmission link establishment process, recording the first device identification of the network edge device 202a connected to the first device 204a on the first network, and/or recording the second device identification of the network device 203a between the first device 204a on the first network and the network edge device 202a.
  • by recording the network information corresponding to the first network, the network information can later be used to quickly determine the network in which a network abnormality occurs and to isolate and degrade the abnormal network.
  • a network device 203a is connected between the first device 204a and the network edge device 202a.
  • determining the network information corresponding to the first network includes:
  • the first device 204a obtains the second device identification of the network device and records the second device identification in the network information. In this way, the second device identification of the same network device passed between the first device and the network edge device can be effectively recorded, so as to determine the abnormal network based on the second device identification.
  • the first device 204a establishes a transmission link by sending a link establishment request to the first network.
  • when the link establishment request passes through a network device (such as a switch) between the first device 204a and the network edge device 202a, the first device 204a can send a query request for a device identifier to the network device to obtain the second device identifier of the network device, and then record the second device identifier obtained based on the query request in the above network information; alternatively, the user can manually query the second device identifier of the network device and import it into the first device 204a, so that it is recorded in the above network information. This is not limited in this embodiment of the present application.
  • when multiple network devices are passed between the first device 204a and the network edge device 202a, the second device 204b is further used to send a detection message to the network edge device through the first network.
  • the second device 204b may send the detection message to the network edge device through the first network.
  • the method further includes:
  • the network edge device 202a fills the first device identifier of the network edge device 202a itself into the detection message, and sends the detection message filled with the first device identifier to the first device 204a;
  • Determining the network information corresponding to the first network includes: the first device 204a extracts the first device identifier from the detection message filled with the first device identifier in response to receiving the detection message filled with the first device identifier, and records the extracted first device identifier in the network information.
  • the process of querying and recording the second device identification of the network device may be cumbersome, so only the first device identification of the network edge device connected to the first device may be recorded; the first device identification of the network edge device may be obtained by, after the above-mentioned transmission link is successfully established, the two storage devices send a detection message to the other end, and after the network edge device at the other end receives the detection message, it can fill in its own device identification into the detection message, and send the detection message filled with its own device identification to the connected storage device, so that the storage device can extract the device identification of the network edge device and store it in the network information.
  • FIG8 is a schematic diagram of a network information determination process according to an embodiment of the present application.
  • a first device 204a in a first data center may send a link establishment request to a second device 204b in a second data center based on a first network to establish a transmission link between the first device 204a and the second device 204b, wherein the transmission link passes through a network edge device 202a connected to the first device 204a and a network edge device 202b connected to the second device 204b; after the transmission link is established, the first device 204a may send a detection message to the second device 204b, wherein the detection message is used to detect the device identification of the opposite network edge device 202b; after receiving the detection message, the network edge device 202b may fill its own WWN into the detection message and send the detection message filled with the WWN to the second device 204b.
  • the second device 204b can extract the WWN of the network edge device 202b from the detection message and record the WWN in its local network information; correspondingly, the second device 204b can send a detection message to the first device 204a, wherein the detection message is used to detect the device identification of the opposite network edge device 202a; after receiving the detection message, the network edge device 202a can fill its own WWN into the detection message and send the detection message filled with the WWN to the first device 204a; the first device 204a can extract the WWN of the network edge device 202a from the detection message and record the WWN in its local network information.
  • the network edge device passed by the currently used first network can be known, so as to quickly determine the network where the abnormality occurs based on the first device identifier and achieve rapid network isolation.
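  • The exchange of the detection message can be sketched as follows. This is only a sketch under stated assumptions: the dictionary representation of the detection message, the field name `edge_wwn`, the function names, and the WWN value are hypothetical and are not defined by the embodiment.

```python
def fill_identifier(detection_msg: dict, edge_wwn: str) -> dict:
    """Network edge device: return a copy of the detection message carrying its own identifier."""
    return {**detection_msg, "edge_wwn": edge_wwn}


def record_identifier(network_info: dict, network: str, detection_msg: dict) -> None:
    """Storage device: extract the identifier and record it for the given network."""
    network_info.setdefault(network, set()).add(detection_msg["edge_wwn"])


# The second device 204b sends a detection message over the first network.
probe = {"type": "detect", "from": "204b"}

# The network edge device 202a fills in its own (made-up) WWN and forwards the message.
filled = fill_identifier(probe, "50:01:43:80:12:34:56:78")

# The first device 204a records the identifier in its local network information.
network_info = {}
record_identifier(network_info, "first", filled)
```

The original `probe` dictionary is left unchanged; only the forwarded copy carries the identifier.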
  • the network abnormality message sent by the network edge device to the first device may include the first device identifier of the network edge device itself.
  • suspending the use of the first network for transmission of service messages includes:
  • the first device 204a determines the first network including the network edge device 202a according to the first device identifier and network information in the network abnormality message, and deletes the routing information corresponding to the first network passing through the network edge device 202a from the routing table of the first device.
  • the network information includes the device identification of the device included in the first network, that is, it may include the first device identification of the network edge device 202a passed by the first network. Therefore, according to the first device identifier in the network abnormality message, the network information can be traversed to determine the first network including the network edge device 202a.
  • after the routing information corresponding to the first network is deleted from the routing table of the first device, since the routing information may include, for example, the IP address, subnet mask, gateway and other information of the network device on the first network, the first device 204a can no longer use the first network for message transmission, thereby achieving network isolation (or degradation) of the first network.
  • the first network including the network edge device can be quickly determined, and by deleting the routing information corresponding to the first network from the routing table, rapid network isolation can be achieved, thereby reducing the impact of network abnormalities on the business.
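  • The traversal of the network information and the deletion of routing information can be sketched as follows. This is an illustrative sketch only: the function `isolate_network`, the dictionary layout of the network information and routing table, the identifier strings, and the documentation-range IP addresses are all assumptions. A network is treated as affected when it contains every identifier reported in the network abnormality message, which also covers the case where both a first and a second device identifier are reported.

```python
def isolate_network(reported_ids, network_info, routing_table):
    """Return (affected networks, routing table without their routing information)."""
    if not reported_ids:
        return set(), list(routing_table)          # nothing reported: keep all routes
    # Traverse the network information to find networks containing the reported devices.
    affected = {name for name, ids in network_info.items() if reported_ids <= ids}
    remaining = [route for route in routing_table if route["network"] not in affected]
    return affected, remaining


network_info = {
    "first": {"edge-202a", "switch-203a"},
    "second": {"edge-202c"},
}
routing_table = [
    {"network": "first", "dest": "192.0.2.0/24", "gateway": "192.0.2.1"},
    {"network": "second", "dest": "198.51.100.0/24", "gateway": "198.51.100.1"},
]

# Network abnormality message carrying the first device identifier of the edge device.
affected, routing_table = isolate_network({"edge-202a"}, network_info, routing_table)
```

After the call, only the routing information of the second network remains in this sketch, so later messages cannot be sent over the first network.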
  • the network information may also record a second device identifier of the same network device connected between the first device 204a and the network edge device 202a.
  • the network abnormality message sent by the network edge device 202a to the first device 204a includes the first device identifier of the network edge device 202a itself and the second device identifier of the network device.
  • suspending the use of the first network for transmission of service messages includes:
  • the first device 204a determines the first network passing through the network edge device 202a and the network device based on the first device identifier and the second device identifier in the network abnormality message and the network information, and deletes the routing information corresponding to the first network passing through the network edge device 202a and the network device from the routing table of the first device 204a.
  • the network information includes the device identification of the device included in the first network, that is, it may include the first device identification of the network edge device 202a passing through the first network and the second device identification of the network device. Therefore, the network information can be traversed according to the first device identification and the second device identification in the network abnormality message to determine the first network including the network edge device 202a and the network device.
  • the first network including the network edge device and the network device can be determined more quickly and accurately, and by deleting the routing information corresponding to the first network from the routing table, rapid network isolation can be achieved, thereby reducing the impact of network abnormalities on the business.
  • the first device 204a can be connected to the network edge device 202a through the internal network via a network device (such as a switch), that is, the first device 204a and the network edge device 202a can transmit messages through the internal network of the first data center.
  • a network status detection module can be provided in the internal network, and the network status detection module can be used to detect the network status of the internal network. The method also includes:
  • the network status detection module sends an internal network abnormality message to the first device 204a in response to detecting that the network status of the internal network is abnormal;
  • the first device 204a switches to use the backup internal network and the network edge device to transmit the message.
  • the network status detection module can realize the status detection of the internal network by performing periodic detection on at least one preset indicator of the internal network's port power, signal strength, bit error rate, delay, and packet loss rate.
  • when at least one preset indicator reaches its preset threshold, it is determined that the network status of the internal network is abnormal, and an internal network abnormality message is sent to the first device 204a.
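  • The periodic threshold check can be sketched as follows. This is a sketch under stated assumptions rather than the detection logic of the embodiment: the indicator names, the threshold values, and the split into upper limits (for error rate, delay, and loss) and lower limits (for port power) are hypothetical choices made for illustration.

```python
# Hypothetical thresholds: ("max", x) is reached when the value rises to x or above,
# ("min", x) is reached when the value falls to x or below.
THRESHOLDS = {
    "bit_error_rate": ("max", 1e-9),
    "latency_ms": ("max", 20.0),
    "packet_loss_rate": ("max", 0.01),
    "port_power_dbm": ("min", -20.0),
}


def abnormal_indicators(sample, thresholds=THRESHOLDS):
    """Return the names of the indicators in one sample that reached their threshold."""
    hits = []
    for name, (kind, limit) in thresholds.items():
        value = sample.get(name)
        if value is None:
            continue                               # indicator not measured in this period
        if (kind == "max" and value >= limit) or (kind == "min" and value <= limit):
            hits.append(name)
    return hits


sample = {"latency_ms": 35.0, "port_power_dbm": -5.0, "packet_loss_rate": 0.0}
hits = abnormal_indicators(sample)
network_abnormal = bool(hits)      # one indicator reaching its threshold is enough
```

A detection module would run such a check once per detection period and send the abnormality message when `network_abnormal` is true.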
  • the first device 204a, in response to the internal network abnormality message, switches to using the backup internal network to transmit messages with the network edge device. For example, this may include: the first device 204a suspends the use of the internal network for the transmission of service messages, and notifies the network edge device 202a through the backup internal network that the sequence number of the service message transmitted on the internal network has expired. After receiving the notification, the network edge device 202a forwards the notification to the second device 204b; and the first device 204a sends a backup message with a valid sequence number added to the network edge device 202a through the backup internal network.
  • the backup message may be a backup message corresponding to the service message transmitted on the internal network where the network abnormality occurs.
  • the network edge device 202a may forward the backup message to the second device 204b.
  • the second device 204b can verify whether the backup message is valid based on the sequence number of the backup message, and effectively store the valid backup message, ignoring the invalid service message.
  • the backup internal network and the network edge device are switched to transmit messages, thereby improving the disaster recovery capability within the first data center, reducing the waiting time for network timeouts, ensuring data consistency, and reducing the impact of internal network abnormalities on the business.
  • the above-mentioned embodiments of the present application are mainly described with the first data center as the main body.
  • the device management method of the embodiments of the present application is also applicable to the second data center, that is, the second data center can refer to the specific implementation method of the device management method of the above-mentioned embodiments of the present application to reduce the impact of network anomalies on the business of the second data center and improve the disaster recovery capability of the second data center.
  • the disaster recovery system shown in Figures 2 and 6 above are some possible implementation methods provided by the embodiments of the present application, and do not represent all the implementation methods of the embodiments of the application. It should be understood that the disaster recovery system at least includes storage devices and network edge devices in a single-side data center.
  • An embodiment of the present application also provides a device, which can be used as a first device, and the first device and the second device transmit business messages through a first network, and the first network includes a network edge device connected to the first device; as shown in Figure 9, the first device includes a receiving unit 901 and a switching unit 902; wherein the receiving unit 901 is used to receive a network abnormality message from the network edge device, and the network abnormality message indicates that the network status of the first network is abnormal; the switching unit 902 is used to switch to using a second network to transmit a backup message to the second device in response to receiving the network abnormality message; wherein the second network is a transmission network with a normal network status between the first device and the second device, and the backup message includes a backup message corresponding to the business message transmitted on the first network when an abnormality occurs in the first network.
  • the first device and the second device are storage devices, and the first network is an optical transmission network.
  • the network edge device is configured to send a network abnormality message to the first device in response to detecting that a network state of the first network is abnormal.
  • sending a network abnormality message to the first device includes: performing periodic detection of preset indicators on the first network to obtain detection results corresponding to the preset indicators, and the preset indicators include at least one of the following: port power, signal strength, bit error rate, latency, and packet loss rate; in response to detecting that the detection result corresponding to the preset indicator reaches a preset threshold, determining that the network status of the first network is abnormal, and sending a network abnormality message to the first device.
  • a unique serial number is added to each message, and the second device is used to verify whether the message is valid based on the serial number and store the valid message.
  • the switching unit 902 includes: an abnormal response module, which is used to suspend the use of the first network for transmitting business messages in response to receiving the network abnormality message, and notify the second device through the second network that the serial number of the business message transmitted on the first network has expired; a message sending module, which is used to add a valid serial number to the backup message, and use the second network to send the backup message with the valid serial number added to the second device.
  • the first device further includes: an information determination unit, configured to determine network information corresponding to the first network before transmitting the service message through the first network, the network information including the device identification of the device included in the first network; wherein the network abnormality message sent by the network edge device to the first device includes the first device identification of the network edge device itself, and the suspending, in response to receiving the network abnormality message, of the use of the first network for transmitting business messages includes: determining the first network including the network edge device based on the first device identification in the network abnormality message and the network information, and deleting the routing information corresponding to the first network passing through the network edge device from the routing table of the first device.
  • the second device is also used to send a detection message to the network edge device through the first network
  • the network edge device is also used to fill the first device identifier of the network edge device itself into the detection message in response to receiving the detection message, and send the detection message filled with the first device identifier to the first device
  • the information determination unit includes: a first recording module, which is used to extract the first device identifier from the detection message filled with the first device identifier in response to receiving the detection message filled with the first device identifier, and record the extracted first device identifier in the network information.
  • when the same network device is connected between the first device and the network edge device, the information determination unit includes: a second recording module, used to obtain a second device identifier of the network device and record the second device identifier in the network information; wherein the network abnormality message sent by the network edge device to the first device includes the first device identifier of the network edge device itself and the second device identifier of the network device, and the suspending, in response to receiving the network abnormality message, of the use of the first network for transmitting business messages includes: determining a first network passing through the network edge device and the network device based on the first device identifier and the second device identifier in the network abnormality message and the network information, and deleting the routing information corresponding to the first network passing through the network edge device and the network device from the routing table of the first device.
  • the first device also includes: a message transmission unit, which is used to obtain the business message to be transmitted when the network status of the first network is normal, back up the business message to be transmitted, obtain the backup message and store it, and add a serial number to the business message to be transmitted, and use the first network to send the business message with the added serial number to the second device.
  • the network edge device is also used to: predict a time point when a network anomaly will occur in the first network in the future based on historical network anomaly messages of the first network, the historical network anomaly messages including network anomaly messages counted when network anomalies occurred in the first network within a historical time; and send a network anomaly message to the first device based on the time point.
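  • The text does not specify how the time point is predicted from the historical network anomaly messages. Purely as an illustration, and not as the method of the embodiment, the sketch below assumes a very simple predictor that adds the mean interval between past anomalies to the time of the most recent one; the function name and the use of numeric timestamps are hypothetical.

```python
from statistics import mean


def predict_next_anomaly(timestamps):
    """Predict the next anomaly time from past anomaly times (mean-interval assumption)."""
    if len(timestamps) < 2:
        return None                                # not enough history to form an interval
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return ordered[-1] + mean(gaps)


# Hypothetical anomaly times, in seconds since some reference point.
history = [0, 100, 200, 300]
predicted = predict_next_anomaly(history)
```

A network edge device following this assumption could send the network abnormality message shortly before `predicted`; any real deployment would likely need a more robust model.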
  • messages are transmitted between the first device and the network edge device via an internal network.
  • a network status detection module is provided in the internal network. The network status detection module is used to detect the network status of the internal network, and in response to detecting that the network status of the internal network is abnormal, an internal network abnormal message is sent to the first device; the first device also includes: an internal network switching unit, which is used to switch to use a backup internal network to transmit messages with the network edge device in response to the internal network abnormal message.
  • An embodiment of the present application also provides a device, which can be used as a first device, and the first device and the second device transmit business messages through a first network
  • the first network includes a network edge device connected to the first device
  • the first device includes an interface and a processor
  • the interface communicates with the processor
  • the processor is used to: receive a network abnormality message from the network edge device, wherein the network abnormality message indicates that an abnormality has occurred in the network status of the first network; in response to receiving the network abnormality message, switch to using a second network to transmit a backup message to the second device;
  • the second network is a transmission network with a normal network status between the first device and the second device, and the backup message includes a backup message corresponding to the business message transmitted on the first network when an abnormality occurs in the first network.
  • the first device and the second device are storage devices, and the first network is an optical transmission network.
  • the network edge device is configured to send a network abnormality message to the first device in response to detecting that a network state of the first network is abnormal.
  • sending a network abnormality message to the first device includes: performing periodic detection of preset indicators on the first network to obtain detection results corresponding to the preset indicators, and the preset indicators include at least one of the following: port power, signal strength, bit error rate, latency, and packet loss rate; in response to detecting that the detection result corresponding to the preset indicator reaches a preset threshold, determining that the network status of the first network is abnormal, and sending a network abnormality message to the first device.
  • a unique serial number is added to each message, and the second device is used to verify whether the message is valid based on the serial number and store valid messages.
  • switching to using the second network to transmit backup messages to the second device includes: in response to receiving the network abnormality message, suspending the use of the first network for transmitting business messages, and notifying the second device through the second network that the serial number of the business message transmitted on the first network has expired; adding a valid serial number to the backup message, and using the second network to send the backup message to which the valid serial number has been added to the second device.
  • the processor is also used to determine network information corresponding to the first network before transmitting business messages through the first network, and the network information includes device identifications of devices included in the first network; wherein the network abnormality message sent by the network edge device to the first device includes the first device identification of the network edge device itself, and the response to receiving the network abnormality message, suspending the use of the first network for transmission of business messages includes: determining the first network including the network edge device based on the first device identification in the network abnormality message and the network information, and deleting the routing information corresponding to the first network passing through the network edge device from the routing table of the first device.
  • the second device is also used to send a detection message to the network edge device through the first network
  • the network edge device is also used to fill the first device identifier of the network edge device itself into the detection message in response to receiving the detection message, and send the detection message filled with the first device identifier to the first device; wherein, determining the network information corresponding to the first network includes: in response to receiving the detection message filled with the first device identifier, extracting the first device identifier from the detection message filled with the first device identifier, and recording the extracted first device identifier in the network information.
  • determining the network information corresponding to the first network includes: obtaining a second device identifier of the network device and recording the second device identifier in the network information; wherein the network abnormality message sent by the network edge device to the first device includes the first device identifier of the network edge device itself and the second device identifier of the network device, and in response to receiving the network abnormality message, suspending the use of the first network for transmitting business messages includes: determining a first network passing through the network edge device and the network device based on the first device identifier and the second device identifier in the network abnormality message and the network information, and deleting the routing information corresponding to the first network passing through the network edge device and the network device from the routing table of the first device.
  • the processor is further configured to, when the network status of the first network is normal, obtain the service message to be transmitted, back up the service message to be transmitted to obtain and store the backup message, add a sequence number to the service message to be transmitted, and use the first network to send the service message with the added sequence number to the second device.
  • the network edge device is also used to: predict a time point when a network anomaly will occur in the first network in the future based on historical network anomaly messages of the first network, the historical network anomaly messages including network anomaly messages counted when network anomalies occurred in the first network within a historical time; and send a network anomaly message to the first device based on the time point.
  • messages are transmitted between the first device and the network edge device via an internal network.
  • a network status detection module is provided in the internal network. The network status detection module is used to detect the network status of the internal network, and in response to detecting that the network status of the internal network is abnormal, an internal network abnormal message is sent to the first device; the processor is also used to switch to using a backup internal network to transmit messages with the network edge device in response to the internal network abnormal message.
  • the embodiment of the present application provides a device management system, including a first device and a network edge device, wherein the first device is used to execute the operation performed by the first device in the above-mentioned device management method, and the network edge device is used to execute the operation performed by the network edge device in the above-mentioned device management method.
  • the device management system can be applied to any data center (such as the first data center in Figure 2 above), and the above-mentioned disaster recovery system including two data centers can include the device management system.
  • An embodiment of the present application provides a non-volatile computer-readable storage medium on which computer program instructions are stored. When the computer program instructions are executed by a processor, the above method is implemented.
  • An embodiment of the present application provides a computer program product, including computer-readable code, or a non-volatile computer-readable storage medium carrying the computer-readable code. When the computer-readable code runs in a processor of an electronic device, the processor in the electronic device executes the above method.
  • Figure 10 shows a structural diagram of an electronic device 1300 according to an embodiment of the present application.
  • the electronic device 1300 may be a first device or a network edge device, and performs the functions required to be performed in the above-mentioned device management method.
  • the electronic device 1300 includes at least one processor 1801, at least one memory 1802, and at least one communication interface 1803.
  • the electronic device may also include common components such as an antenna, which will not be described in detail here.
  • Processor 1801 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the above program.
  • Processor 1801 may include one or more processing units, for example: processor 1801 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
  • different processing units may be independent devices or integrated in one or more processors.
  • Communication interface 1803 is used to communicate with other electronic devices or communication networks, such as Ethernet, radio access network (RAN), core network, wireless local area network (WLAN), etc.
  • the memory 1802 may be a read-only memory (ROM) or other types of static storage devices that can store static information and instructions, a random access memory (RAM) or other types of dynamic storage devices that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store the desired program code in the form of instructions or data structures and can be accessed by a computer, but is not limited thereto.
  • the memory may be independent and connected to the processor through a bus.
  • the memory may also be integrated with the processor.
  • the memory 1802 is used to store application code for executing the above solution, and the execution is controlled by the processor 1801.
  • the processor 1801 is used to execute the application code stored in the memory 1802.
  • a computer readable storage medium may be a tangible device that can hold and store instructions used by an instruction execution device.
  • a computer readable storage medium may be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • Computer readable storage media include: a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanical encoding device, such as a punch card or a raised structure in a groove on which instructions are stored, and any suitable combination of the foregoing.
  • the computer-readable program instructions or codes described herein can be downloaded from a computer-readable storage medium to each computing/processing device, or downloaded to an external computer or external storage device via a network, such as the Internet, a local area network, a wide area network, and/or a wireless network.
  • the network can include copper transmission cables, optical fiber transmissions, wireless transmissions, routers, firewalls, switches, gateway computers, and/or edge servers.
  • the network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in the computer-readable storage medium in each computing/processing device.
  • the computer program instructions for performing the operations of the present application may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk, C++, etc., and conventional procedural programming languages such as "C" language or similar programming languages.
  • the computer-readable program instructions may be executed entirely on the user's computer, partially on the user's computer, as a separate software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server.
  • the remote computer may be connected to the user's computer via any type of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., via the Internet using an Internet service provider).
  • an electronic circuit such as a programmable logic circuit, a field programmable gate array (FPGA) or a programmable logic array (PLA)
  • These computer-readable program instructions can be provided to a general-purpose computer, a special-purpose computer, or other programmable data processing device.
  • the instructions are then stored in a computer-readable storage medium, and the instructions cause the computer, programmable data processing device, and/or other device to work in a specific manner, so that the computer-readable medium storing the instructions includes an article of manufacture, which includes instructions for implementing various aspects of the functions/actions specified in one or more boxes in the flowchart and/or block diagram.
  • Computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device so that a series of operating steps are performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, thereby causing the instructions executed on the computer, other programmable data processing apparatus, or other device to implement the functions/actions specified in one or more boxes in the flowchart and/or block diagram.
  • each box in the flowchart or block diagram can represent a module, a program segment, or a part of an instruction, and the module, program segment, or part of an instruction includes one or more executable instructions for implementing the specified logical function.
  • the functions marked in the boxes can also occur in an order different from that marked in the accompanying drawings. For example, two consecutive boxes can actually be executed substantially in parallel, and they can sometimes be executed in the reverse order, depending on the functions involved.
  • each box in the block diagram and/or flowchart, and the combination of boxes in the block diagram and/or flowchart can be implemented by hardware (such as circuits or ASICs (Application Specific Integrated Circuit)) that performs the corresponding function or action, or can be implemented by a combination of hardware and software, such as firmware.


Abstract

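The present application relates to a device management method, a device, a system, and a storage medium. The method includes: a first device receives a network anomaly message from a network edge device, the network anomaly message indicating that the network status of a first network is abnormal; in response to receiving the network anomaly message, the first device switches to using a second network to transmit a backup message to a second device; wherein the second network is a transmission network between the first device and the second device whose network status is normal, and the backup message includes the backup message corresponding to the service message being transmitted on the first network when the first network became abnormal. According to the embodiments of the present application, the network used to transmit messages can be switched within milliseconds, without waiting for a network response timeout, which shortens the time during which a network anomaly affects service functions.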

Description

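Device management method, device, system, and storage medium
This application claims priority to Chinese patent application No. 202211328073.4, filed with the China Patent Office on October 27, 2022 and entitled "Device management method, device, system, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of communication technology, and in particular to a device management method, a device, a system, and a storage medium.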
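Background
Current disaster recovery systems usually contain two data centers. Service data produced by the primary data center is replicated to the other data center, so that the other data center can take over services after the primary data center fails. However, the two data centers are often tens of kilometers apart and are connected by optical cable, and uncontrollable external factors such as infrastructure construction, geological change, and network reconstruction make the connection between them unstable. Network anomalies readily occur, disrupt communication between the two data centers, and in turn affect upper-layer service functions. Anomalous networks therefore need to be detected and isolated promptly to keep services stable.
The related art mainly uses probe messages, or statistics on indicators such as latency and packet loss rate of abnormal messages, and detects network anomalies from the relationship between the statistics and statistical thresholds. Because the communication network between two data centers usually passes through many network devices (such as switches and wavelength division multiplexers), the statistical thresholds are often set high to avoid unnecessary network switching triggered by misjudgment. To some extent this leads to insufficient detection accuracy, long detection times, and a long period during which service functions are affected.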
Summary
In view of this, a device management method, a device, a system, and a storage medium are proposed.
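In a first aspect, an embodiment of the present application provides a device management method. Service messages are transmitted between a first device and a second device over a first network, and the first network includes a network edge device connected to the first device. The method includes: the first device receives a network anomaly message from the network edge device, the network anomaly message indicating that the network status of the first network is abnormal; in response to receiving the network anomaly message, the first device switches to using a second network to transmit a backup message to the second device; wherein the second network is a transmission network between the first device and the second device whose network status is normal, and the backup message includes the backup message corresponding to the service message being transmitted on the first network when the first network became abnormal.
In the embodiments of the present application, the network edge device can detect the network status of the first network quickly and with high accuracy, and reports to the first device a network anomaly message indicating that the network status is abnormal. In response to the network anomaly message provided by the network edge device, the first device can quickly switch to using the second network, whose network status is normal, to send the backup message to the second device. The network used to transmit messages can thus be switched within milliseconds, without waiting for a network response timeout, which shortens the time during which a network anomaly affects service functions.
Optionally, the first device and the second device are storage devices, and the first network is an optical transmission network.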
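According to the first aspect, in a first possible implementation of the device management method, the method further includes: in response to detecting that the network status of the first network is abnormal, the network edge device sends a network anomaly message to the first device.
According to the embodiments of the present application, having the network edge device send the network anomaly message directly to the first device shortens network anomaly detection and speeds up switching away from the anomalous network.
According to the first possible implementation of the first aspect, sending a network anomaly message to the first device in response to detecting that the network status of the first network is abnormal includes: the network edge device performs timed detection of preset indicators on the first network to obtain detection results corresponding to the preset indicators, the preset indicators including at least one of the following: port power, signal strength, bit error rate, latency, and packet loss rate; in response to detecting that the detection result corresponding to a preset indicator reaches a preset threshold, the network edge device determines that the network status of the first network is abnormal and sends a network anomaly message to the first device.
According to the embodiments of the present application, the hardware-level ability of the network edge device to sense network status can be used to improve the accuracy and speed of network status detection.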
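According to the first aspect, in a second possible implementation of the device management method, a unique sequence number is added to each message, and the second device is configured to verify from the sequence number whether a message is valid and to store valid messages. Switching to using the second network to transmit the backup message to the second device in response to receiving the network anomaly message includes: in response to receiving the network anomaly message, the first device suspends use of the first network for transmitting service messages and notifies the second device, over the second network, that the sequence numbers of the service messages transmitted on the first network are invalid; the first device adds a valid sequence number to the backup message and uses the second network to send the backup message with the valid sequence number to the second device.
According to the embodiments of the present application, suspending use of the first network for transmitting service messages isolates and downgrades the first network, ensuring that subsequently sent backup messages or service messages no longer use the anomalous first network. Once the second device has been notified of the invalid sequence numbers, the first device can switch to the second network to send the backup message, so messages need not wait for a network timeout before being resent. Based on the notification of invalid sequence numbers, the second device can directly ignore invalid service messages transmitted over the first network, which avoids storing the same message twice, preserves data consistency, and shortens the time during which a network anomaly affects service functions.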
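According to the first aspect, in a third possible implementation of the device management method, before service messages are transmitted over the first network, the method further includes: the first device determines network information corresponding to the first network, the network information including the device identifiers of the devices contained in the first network; wherein the network anomaly message sent by the network edge device to the first device includes a first device identifier of the network edge device itself, and suspending use of the first network for transmitting service messages in response to receiving the network anomaly message includes: the first device determines, from the first device identifier in the network anomaly message and the network information, the first network that contains the network edge device, and deletes from the routing table of the first device the routing information corresponding to the first network that passes through the network edge device.
According to the embodiments of the present application, the first network containing the network edge device can be determined quickly from the first device identifier in the network anomaly message and the network information, and deleting the routing information corresponding to the first network from the routing table achieves fast network isolation and reduces the impact of a network anomaly on services.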
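According to the first aspect or the third possible implementation of the first aspect, the second device is further configured to send a probe message to the network edge device over the first network, and the method further includes: in response to receiving the probe message, the network edge device fills its own first device identifier into the probe message and sends the probe message carrying the first device identifier to the first device; wherein determining the network information corresponding to the first network includes: in response to receiving the probe message carrying the first device identifier, the first device extracts the first device identifier from the probe message and records the extracted first device identifier in the network information.
According to the embodiments of the present application, recording the first device identifier of the network edge device makes known which network edge device the first network currently in use passes through, so that the anomalous network can later be determined quickly from the first device identifier and isolated quickly.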
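According to the first aspect or the third possible implementation of the first aspect, in the case where the same network device is connected between the first device and the network edge device, determining the network information corresponding to the first network includes: the first device obtains a second device identifier of the network device and records the second device identifier in the network information; wherein the network anomaly message sent by the network edge device to the first device includes the first device identifier of the network edge device itself and the second device identifier of the network device, and suspending use of the first network for transmitting service messages in response to receiving the network anomaly message includes: the first device determines, from the first device identifier and the second device identifier in the network anomaly message and the network information, the first network that passes through the network edge device and the network device, and deletes from the routing table of the first device the routing information corresponding to that first network.
According to the embodiments of the present application, the first network containing the network edge device and the network device can be determined more quickly and accurately from the first device identifier and second device identifier in the network anomaly message and the network information, and deleting the routing information corresponding to the first network from the routing table achieves fast network isolation and reduces the impact of a network anomaly on services.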
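According to the first aspect, in a fourth possible implementation of the device management method, when the network status of the first network is normal, the transmission of service messages between the first device and the second device includes: the first device obtains a service message to be transmitted, backs up the service message to obtain and store a backup message, adds a sequence number to the service message, and uses the first network to send the service message with the added sequence number to the second device.
According to the embodiments of the present application, because a backup message is obtained by backing up each service message, the backup message can be sent directly over another normal network when a network anomaly occurs, without waiting for a network response timeout, so the replication service between the two storage devices is maintained; adding sequence numbers to service messages ensures data consistency in the replication service.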
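According to the first aspect, in a fifth possible implementation of the device management method, the method further includes: the network edge device predicts, based on historical network anomaly messages of the first network, a future time point at which a network anomaly will occur in the first network, the historical network anomaly messages including the network anomaly messages collected when network anomalies occurred in the first network during a historical period; the network edge device sends a network anomaly message to the first device based on the time point.
According to the embodiments of the present application, using a predicted time point at which a network anomaly may occur shortens the time the network edge device spends detecting the network status of the first network and allows the network anomaly message to be sent to the first device in advance, significantly reducing the impact of a network anomaly on service functions.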
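According to the first aspect, in a sixth possible implementation of the device management method, messages are transmitted between the first device and the network edge device over an internal network, a network status detection module is provided in the internal network, and the network status detection module is used to detect the network status of the internal network. The method further includes: in response to detecting that the network status of the internal network is abnormal, the network status detection module sends an internal network anomaly message to the first device; in response to the internal network anomaly message, the first device switches to using a backup internal network to transmit messages with the network edge device.
According to the embodiments of the present application, switching to a backup internal network for message transmission with the network edge device when the network status detection module detects an anomaly in the internal network improves disaster recovery capability inside the first data center, reduces the time spent waiting for network timeouts, preserves data consistency, and reduces the impact of internal network anomalies on services.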
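In a second aspect, an embodiment of the present application provides a device that serves as a first device. Service messages are transmitted between the first device and a second device over a first network, and the first network includes a network edge device connected to the first device. The first device includes a receiving unit and a switching unit; wherein the receiving unit is configured to receive a network anomaly message from the network edge device, the network anomaly message indicating that the network status of the first network is abnormal; the switching unit is configured to switch, in response to receiving the network anomaly message, to using a second network to transmit a backup message to the second device; wherein the second network is a transmission network between the first device and the second device whose network status is normal, and the backup message includes the backup message corresponding to the service message being transmitted on the first network when the first network became abnormal.
Optionally, the first device and the second device are storage devices, and the first network is an optical transmission network.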
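According to the second aspect, in a first possible implementation of the device, the network edge device is configured to send a network anomaly message to the first device in response to detecting that the network status of the first network is abnormal.
According to the first possible implementation of the second aspect, sending a network anomaly message to the first device in response to detecting that the network status of the first network is abnormal includes: performing timed detection of preset indicators on the first network to obtain detection results corresponding to the preset indicators, the preset indicators including at least one of the following: port power, signal strength, bit error rate, latency, and packet loss rate; in response to detecting that the detection result corresponding to a preset indicator reaches a preset threshold, determining that the network status of the first network is abnormal and sending a network anomaly message to the first device.
According to the second aspect, in a second possible implementation of the device, a unique sequence number is added to each message, and the second device is configured to verify from the sequence number whether a message is valid and to store valid messages. The switching unit includes: an anomaly response module, configured to, in response to receiving the network anomaly message, suspend use of the first network for transmitting service messages and notify the second device, over the second network, that the sequence numbers of the service messages transmitted on the first network are invalid; and a message sending module, configured to add a valid sequence number to the backup message and use the second network to send the backup message with the valid sequence number to the second device.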
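According to the second aspect, in a third possible implementation of the device, the first device further includes: an information determination unit, configured to determine, before service messages are transmitted over the first network, network information corresponding to the first network, the network information including the device identifiers of the devices contained in the first network; wherein the network anomaly message sent by the network edge device to the first device includes a first device identifier of the network edge device itself, and suspending use of the first network for transmitting service messages in response to receiving the network anomaly message includes: determining, from the first device identifier in the network anomaly message and the network information, the first network that contains the network edge device, and deleting from the routing table of the first device the routing information corresponding to the first network that passes through the network edge device.
According to the second aspect or the third possible implementation of the second aspect, the second device is further configured to send a probe message to the network edge device over the first network, and the network edge device is further configured to, in response to receiving the probe message, fill its own first device identifier into the probe message and send the probe message carrying the first device identifier to the first device; wherein the information determination unit includes: a first recording module, configured to, in response to receiving the probe message carrying the first device identifier, extract the first device identifier from the probe message and record the extracted first device identifier in the network information.
According to the second aspect or the third possible implementation of the second aspect, in the case where the same network device is connected between the first device and the network edge device, the information determination unit includes: a second recording module, configured to obtain a second device identifier of the network device and record the second device identifier in the network information; wherein the network anomaly message sent by the network edge device to the first device includes the first device identifier of the network edge device itself and the second device identifier of the network device, and suspending use of the first network for transmitting service messages in response to receiving the network anomaly message includes: determining, from the first device identifier and the second device identifier in the network anomaly message and the network information, the first network that passes through the network edge device and the network device, and deleting from the routing table of the first device the routing information corresponding to that first network.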
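According to the second aspect, in a fourth possible implementation of the device, the first device further includes: a message transmission unit, configured to, when the network status of the first network is normal, obtain a service message to be transmitted, back up the service message to obtain and store a backup message, add a sequence number to the service message, and use the first network to send the service message with the added sequence number to the second device.
According to the second aspect, in a fifth possible implementation of the device, the network edge device is further configured to: predict, based on historical network anomaly messages of the first network, a future time point at which a network anomaly will occur in the first network, the historical network anomaly messages including the network anomaly messages collected when network anomalies occurred in the first network during a historical period; and send a network anomaly message to the first device based on the time point.
According to the second aspect, in a sixth possible implementation of the device, messages are transmitted between the first device and the network edge device over an internal network, a network status detection module is provided in the internal network, and the network status detection module is used to detect the network status of the internal network and, in response to detecting that the network status of the internal network is abnormal, to send an internal network anomaly message to the first device; the first device further includes: an internal network switching unit, configured to switch, in response to the internal network anomaly message, to using a backup internal network to transmit messages with the network edge device.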
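In a third aspect, an embodiment of the present application provides a device that serves as a first device. Service messages are transmitted between the first device and a second device over a first network, and the first network includes a network edge device connected to the first device. The first device includes an interface and a processor, and the interface communicates with the processor; wherein the processor is configured to: receive a network anomaly message from the network edge device, the network anomaly message indicating that the network status of the first network is abnormal; and, in response to receiving the network anomaly message, switch to using a second network to transmit a backup message to the second device; wherein the second network is a transmission network between the first device and the second device whose network status is normal, and the backup message includes the backup message corresponding to the service message being transmitted on the first network when the first network became abnormal.
Optionally, the first device and the second device are storage devices, and the first network is an optical transmission network.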
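According to the third aspect, in a first possible implementation of the device, the network edge device is configured to send a network anomaly message to the first device in response to detecting that the network status of the first network is abnormal.
According to the first possible implementation of the third aspect, sending a network anomaly message to the first device in response to detecting that the network status of the first network is abnormal includes: performing timed detection of preset indicators on the first network to obtain detection results corresponding to the preset indicators, the preset indicators including at least one of the following: port power, signal strength, bit error rate, latency, and packet loss rate; in response to detecting that the detection result corresponding to a preset indicator reaches a preset threshold, determining that the network status of the first network is abnormal and sending a network anomaly message to the first device.
According to the third aspect, in a second possible implementation of the device, a unique sequence number is added to each message, and the second device is configured to verify from the sequence number whether a message is valid and to store valid messages. Switching to using the second network to transmit the backup message to the second device in response to receiving the network anomaly message includes: in response to receiving the network anomaly message, suspending use of the first network for transmitting service messages and notifying the second device, over the second network, that the sequence numbers of the service messages transmitted on the first network are invalid; adding a valid sequence number to the backup message and using the second network to send the backup message with the valid sequence number to the second device.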
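According to the third aspect, in a third possible implementation of the device, the processor is further configured to determine, before service messages are transmitted over the first network, network information corresponding to the first network, the network information including the device identifiers of the devices contained in the first network; wherein the network anomaly message sent by the network edge device to the first device includes a first device identifier of the network edge device itself, and suspending use of the first network for transmitting service messages in response to receiving the network anomaly message includes: determining, from the first device identifier in the network anomaly message and the network information, the first network that contains the network edge device, and deleting from the routing table of the first device the routing information corresponding to the first network that passes through the network edge device.
According to the third aspect or the third possible implementation of the third aspect, the second device is further configured to send a probe message to the network edge device over the first network, and the network edge device is further configured to, in response to receiving the probe message, fill its own first device identifier into the probe message and send the probe message carrying the first device identifier to the first device; wherein determining the network information corresponding to the first network includes: in response to receiving the probe message carrying the first device identifier, extracting the first device identifier from the probe message and recording the extracted first device identifier in the network information.
According to the third aspect or the third possible implementation of the third aspect, in the case where the same network device is connected between the first device and the network edge device, determining the network information corresponding to the first network includes: obtaining a second device identifier of the network device and recording the second device identifier in the network information; wherein the network anomaly message sent by the network edge device to the first device includes the first device identifier of the network edge device itself and the second device identifier of the network device, and suspending use of the first network for transmitting service messages in response to receiving the network anomaly message includes: determining, from the first device identifier and the second device identifier in the network anomaly message and the network information, the first network that passes through the network edge device and the network device, and deleting from the routing table of the first device the routing information corresponding to that first network.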
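According to the third aspect, in a fourth possible implementation of the device, the processor is further configured to: when the network status of the first network is normal, obtain a service message to be transmitted, back up the service message to obtain and store a backup message, add a sequence number to the service message, and use the first network to send the service message with the added sequence number to the second device.
According to the third aspect, in a fifth possible implementation of the device, the network edge device is further configured to: predict, based on historical network anomaly messages of the first network, a future time point at which a network anomaly will occur in the first network, the historical network anomaly messages including the network anomaly messages collected when network anomalies occurred in the first network during a historical period; and send a network anomaly message to the first device based on the time point.
According to the third aspect, in a sixth possible implementation of the device, messages are transmitted between the first device and the network edge device over an internal network, a network status detection module is provided in the internal network, and the network status detection module is used to detect the network status of the internal network and, in response to detecting that the network status of the internal network is abnormal, to send an internal network anomaly message to the first device; the processor is further configured to: in response to the internal network anomaly message, switch to using a backup internal network to transmit messages with the network edge device.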
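In a fourth aspect, an embodiment of the present application provides a device management system, including a first device and a network edge device, wherein the first device is used to perform the operations performed by the first device in the above device management method, and the network edge device is used to perform the operations performed by the network edge device in the above device management method.
In a fifth aspect, an embodiment of the present application provides a non-volatile computer-readable storage medium on which computer program instructions are stored; when executed by a processor, the computer program instructions implement the device management method of the first aspect or of one or more of the possible implementations of the first aspect.
In a sixth aspect, an embodiment of the present application provides a computer program product, including computer-readable code, or a non-volatile computer-readable storage medium carrying computer-readable code; when the computer-readable code runs in an electronic device, a processor in the electronic device executes the device management method of the first aspect or of one or more of the possible implementations of the first aspect.
These and other aspects of the present application will be clearer and easier to understand from the description of the embodiment(s) below.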
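Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the present application together with the specification, and serve to explain the principles of the present application.
Figure 1 shows a schematic diagram of a disaster recovery system according to the related art.
Figure 2 shows a schematic diagram of a disaster recovery system according to an embodiment of the present application.
Figure 3 shows a schematic diagram of a message transmission process according to an embodiment of the present application.
Figure 4 shows a flowchart of a device management method according to an embodiment of the present application.
Figure 5 shows a schematic diagram of a device registration and network anomaly reporting process according to an embodiment of the present application.
Figure 6 shows a schematic diagram of a disaster recovery system according to an embodiment of the present application.
Figure 7 shows a schematic diagram of a network switching process according to an embodiment of the present application.
Figure 8 shows a schematic diagram of a network information determination process according to an embodiment of the present application.
Figure 9 shows a schematic structural diagram of a first device according to an embodiment of the present application.
Figure 10 shows a structural diagram of an electronic device 1300 according to an embodiment of the present application.
Detailed Description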
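Various exemplary embodiments, features, and aspects of the present application are described in detail below with reference to the drawings. The same reference numerals in the drawings denote elements with the same or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless specifically noted.
The word "exemplary" as used here means "serving as an example, embodiment, or illustration". Any embodiment described here as "exemplary" is not necessarily to be construed as superior to or better than other embodiments.
In addition, numerous specific details are given in the detailed description below to better explain the present application. Those skilled in the art will understand that the present application can be practiced without certain specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art are not described in detail, so as to highlight the gist of the present application.
To aid understanding of the solutions in the embodiments of the present application, related terms and concepts that may be involved are introduced first.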
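(1) A network is a communication network built on Fibre Channel (FC), the Internet Protocol (IP), Remote Direct Memory Access (RDMA) protocols, and the like; two storage devices in two data centers can replicate data over the network so that both hold the same copy of the data.
(2) A link is a logical link (also called a logical channel) established over the network from one network card port of a storage device in one data center (DC) to one network card port of a storage device in another data center; the link carries the messages transmitted when data is replicated between the two storage devices.
(3) Active-active means that service data produced by hosts is written to the storage devices of both data centers at the same time; if the storage device of either data center fails, the storage device of the other data center immediately and automatically takes over services, so that no data is lost and services are not interrupted when a device fails.
(3) Synchronous replication means that service data produced by hosts is written to the storage devices of both data centers at the same time; after the primary storage device of the primary data center fails, the user can manually configure the disaster recovery storage device of the disaster recovery data center to take over services, so that no data is lost and services are restored within minutes when a device fails.
(3) Asynchronous replication means that service data produced by hosts is written only to the primary storage device of the primary data center, and a background task, triggered periodically or manually, synchronizes the service data in the primary storage device to the disaster recovery storage device of the disaster recovery data center, so that the disaster recovery storage device holds the most recent complete data when the primary storage device fails.
(4) Network anomaly detection means that a storage device periodically checks indicators of network status such as bit error rate, latency, and packet loss rate, and judges the network to be in an abnormal state (that is, a sub-healthy state) when an indicator exceeds its threshold.
(5) Network isolation means that, since two storage devices generally have multiple networks available for message transmission, when the network status of one of them becomes abnormal the storage device stops using that network and switches to a network whose status is normal.
(6) A network device is a hardware device used to interconnect nodes such as servers, computers, and application terminals to form an information communication network. Common network devices include switches, routers, firewalls, bridges, hubs, gateways, network interface controllers (NIC, also called network interface cards), and fiber-optic transceivers.
(7) A network edge device is a network device located between the intranet (the internal network) and the extranet (the public network), that is, a network device connecting the two; it typically includes switches, wavelength division multiplexers, network interface cards, and the like.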
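As described above, current disaster recovery systems usually contain two data centers. Service data produced by the primary data center is replicated to the other data center, so that the other data center can take over services after the primary data center fails. However, the two data centers are often tens of kilometers apart and are usually connected over optical cable leased from a carrier, and uncontrollable external factors such as infrastructure construction, geological change, and network reconstruction make the connection between them unstable. Network anomalies readily occur, disrupt communication between the two data centers, and in turn affect upper-layer service functions. Anomalous networks therefore need to be detected and isolated promptly to keep services stable. Figure 1 shows a schematic diagram of a disaster recovery system according to the related art. As shown in Figure 1, the primary data center and the same-city data center each contain a web server cluster, an application server cluster, a database server cluster, and a storage device. The web server cluster, application server cluster, and database server cluster can be understood as host clusters that produce service data; these host clusters send the service data they produce to the storage device for storage, and the storage devices of the two data centers can replicate data over a replication network.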
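Based on the disaster recovery system shown in Figure 1, the related art detects network anomalies in two main ways. In the first, the storage devices of the two data centers periodically and actively send probe messages to the peer and collect indicators such as latency and packet loss rate of the probe messages over a given period; when these indicators reach certain thresholds, the network is reported as anomalous and network isolation is required. In the second, the storage devices of the two data centers each collect indicators such as latency and packet loss rate of the service messages on the network; when these indicators reach certain thresholds, the network is reported as anomalous and network isolation is required.
However, these detection methods rely on the two storage devices collecting indicator statistics for probe messages or abnormal service messages, and a message transmitted over the replication network usually passes through network devices such as interface cards, switches, and wavelength division multiplexers. To avoid unnecessary network switching triggered by misjudgment, the usual practice is to collect latency, packet loss rate, and similar statistics over a given period, and to conclude that the network is anomalous and needs isolation only when the statistics reach the threshold for each indicator. The statistics period is generally on the order of minutes, and the thresholds are empirical values derived from historical data, so detecting an anomalous network takes minutes. Moreover, for some anomalous networks, factors such as service load may keep the measured latency and packet loss rate below the thresholds, so the anomaly goes undetected even though it may already be affecting the replication service. In short, network anomaly detection in the related art takes a long time and has low accuracy.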
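Based on the disaster recovery system shown in Figure 1, network isolation in the related art is mainly triggered after a storage device detects a network anomaly. Specifically, service messages that have not yet entered the replication network preferentially use a network whose status is normal when they are subsequently sent; for example, after the anomaly is detected, the routing table can be updated so that a normal network is preferred. Service messages already in transit on the replication network wait for a network timeout and are then retried on another network.
For service messages already in transit on the replication network, the storage device does not know their actual sending status and can only rely on the timeout mechanism of the network protocol, resending a service message after the timeout is reported. The timeout mechanism usually counts in units of 10 seconds, so switching networks and resending the service message takes a long time. This is especially true in database scenarios: if the service messages carry database log data, the database must first receive the log returned for a successful message transmission before processing data consistency tasks, so a service message whose success log is not returned for a long time prevents other subsequent service messages from being processed promptly.
In summary, with current network anomaly detection and network isolation methods, it takes minutes after a network anomaly occurs to eliminate its impact on user services, whereas service scenarios with high performance requirements often require that the impact last less than a few seconds. Studying how to detect and isolate anomalous networks quickly is therefore very important.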
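In view of this, an embodiment of the present application provides a device management method in which network status is sensed by the network edge device connected to the storage device. For example, the network edge device can monitor port power, signal strength, bit error rate, packet loss rate, and the like at chip level, and when the network becomes abnormal it immediately reports a network anomaly message to the storage device of its data center; the device reporting the fault may be a network edge device such as a switch or a wavelength division multiplexer. After receiving the network anomaly message reported by the network edge device, the storage device can immediately isolate (that is, downgrade) the anomalous network so that subsequently sent service messages no longer use it, and can use a backup network whose status is normal to notify the storage device of the other data center of the sequence numbers of invalid service messages, the invalid service messages including the messages in transit on the anomalous network when the anomaly occurred. The storage device then immediately retries over another network using the backup messages corresponding to the invalid service messages, that is, it switches to the backup network to send the backup messages to the storage device of the other data center. This reduces the time spent waiting for network timeouts, preserves data consistency, and reduces the impact of network anomalies on services.
In the device management method of the embodiments of the present application, the network edge device can detect network anomalies at the hardware level and send the network anomaly message directly to the storage device, bypassing intermediate network devices; the storage device can quickly isolate the anomalous network and immediately retry, over another network and using backup messages, the service messages that were in transit on the anomalous network.
For the device management method of the embodiments of the present application, the duration of the impact on services = the time the network edge device takes to detect the network status + the time the storage device takes to retry on another network, which is on the order of milliseconds. For the related art, the duration of the impact on services = the network anomaly detection time + the anomalous network switching time + the timeout response time of the anomalous network, which is on the order of minutes. The device management method of the present application thus removes the wait for a network timeout and reduces the impact of network anomalies on services.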
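The device management method of the embodiments of the present application can be applied to various replication services between two data centers in various disaster recovery systems, including but not limited to active-active, synchronous replication, and asynchronous replication. Specifically, it can be applied in disaster recovery scenarios where the transmission network between data centers suffers occasional anomalies or where services require highly stable network performance; it can detect network anomalies quickly and switch networks within milliseconds without waiting for a network response timeout, reducing the impact of network anomalies on service functions.
In the embodiments of the present application, the party that initiates cross-device data transmission and sends the data is called the source end, and the party that receives the data is called the sink end. Note that a device acting as the source end in one relationship may be the sink end in another; that is, a given device may be the source end for one device and the sink end for another.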
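Figure 2 shows a schematic diagram of a disaster recovery system according to an embodiment of the present application; the device management method of the embodiments is introduced below using this system. As shown in Figure 2, the disaster recovery system includes a first data center and a second data center. The first data center includes a host cluster 201a, a network edge device 202a, a network device (such as a switch) 203a, and a first device 204a; the second data center includes a host cluster 201b, a network edge device 202b, a network device 203b, and a second device 204b. An active-active data replication service exists between the first device 204a and the second device 204b. The first device 204a and the second device 204b may be storage devices, or any other electronic devices such as computing devices (for example, computer clusters); the embodiments of the present application impose no limitation on this.
The host clusters 201a and 201b each produce service data and transmit it through the network devices 203a and 203b to the first device 204a and the second device 204b, respectively, for storage. The device acting as the source end (the first device 204a or the second device 204b) can transmit the service data, in the form of service messages, over the first network to the device acting as the sink end; that is, service messages are transmitted over the first network between the first device 204a of the first data center and the second device 204b of the second data center to implement the replication service. The first network may be an optical transmission network or an electrical transmission network; the embodiments of the present application impose no limitation on this.
The first network includes the network edge device 202a connected to the first device 204a and the network edge device 202b connected to the second device 204b. In one possible implementation, the first device 204a and the network edge device 202a can communicate in the following two ways: the first device is connected to the network edge device over the internal network through a network device (such as a switch), for bidirectional transmission of service messages; and the first device is directly connected to the network edge device by a network cable, to directly receive the network anomaly messages reported by the network edge device. It should be understood that the second device 204b and the network edge device 202b can communicate in the same two ways.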
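Based on the disaster recovery system shown in Figure 2, in one possible implementation, when the network status of the first network is normal, the transmission of service messages between the first device 204a and the second device 204b includes: the first device 204a obtains a service message to be transmitted, backs it up to obtain and store a backup message, adds a sequence number to the service message, and uses the first network to send the service message with the added sequence number to the second device 204b. An available transmission link in the first network can be used to send the service message to the second device 204b.
Figure 3 shows a schematic diagram of a message transmission process according to an embodiment of the present application. As shown in Figure 3, the host cluster 201a sends a service message to the first device 204a through the network device 203a. The first device 204a backs up the service message and adds a unique sequence number, then transmits the service message with the sequence number through the network edge device 202a and the network edge device 202b of the second data center to the second device 204b. The second device 204b verifies the sequence number of the service message and, after successful verification, caches the service message locally; the cached service message can then be parsed to obtain the service data, which is stored.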
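The host cluster 201a can convert service data into service messages and send them to the first device 204a; in response to receiving a service message sent by the host cluster 201a, the first device 204a backs up the service message to obtain a backup message. It should be understood that the embodiments of the present application impose no limitation on how service data is converted or how service messages are backed up.
After receiving a service message sent by the host cluster 201a, the first device 204a can also add a unique sequence number to it and send the service message with the sequence number to the second device 204b. After receiving the service message with the sequence number, the second device 204b can verify from the sequence number whether the service message is valid and store valid service messages, which ensures data consistency between the two storage devices.
According to the embodiments of the present application, because a backup message is obtained by backing up each service message, the backup message can be sent directly over another normal network when a network anomaly occurs, without waiting for a network response timeout. This both maintains the replication service between the two storage devices and reduces the impact of network anomalies on service functions; adding sequence numbers to service messages ensures data consistency between the two storage devices.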
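The normal-path behavior described above (back up, number, send) can be sketched as follows. This is a minimal illustration, not the implementation of the application: the `Sender` class, its fields, and the in-memory `sent` list standing in for the first network are all hypothetical, and discarding a backup once receipt is acknowledged is an assumption that the text does not state.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Sender:
    """Hypothetical model of the first device's send path."""
    _seq: count = field(default_factory=lambda: count(1))
    backups: dict = field(default_factory=dict)  # sequence number -> backup copy
    sent: list = field(default_factory=list)     # stands in for the first network

    def send(self, payload: bytes) -> int:
        seq = next(self._seq)               # unique, increasing sequence number
        self.backups[seq] = bytes(payload)  # store the backup before sending
        self.sent.append((seq, payload))    # "transmit" over the first network
        return seq

    def acknowledge(self, seq: int) -> None:
        # Assumption: once the peer confirms receipt, the backup can be dropped.
        self.backups.pop(seq, None)
```

Whatever remains in `backups` at the moment an anomaly is reported corresponds to the messages that may still be in transit, which is the set that would be resent over the second network.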
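To aid understanding of exemplary implementations of the device management method of the embodiments of the present application, the method is described below based on the disaster recovery system shown in Figure 2, taking the first data center as the subject. As shown in Figure 4, the device management method includes:
Step S401: the first device 204a receives a network anomaly message from the network edge device 202a, the network anomaly message indicating that the network status of the first network is abnormal.
The network edge device 202a can itself detect preset indicators such as the port power, signal strength, bit error rate, and packet loss rate (also called frame loss rate) of the first network (or of a transmission link in the first network). In other words, the network edge device has a hardware-level ability to sense network status and can sense more precisely the status of the first network passing through it. When it detects that the network status of the first network is abnormal, it can report a network anomaly message directly to the first device 204a, which improves the accuracy and speed of network status detection and reduces the impact of network anomalies on service functions.
It should be understood that, after receiving the network anomaly message, the first device 204a can decode it to obtain the readable information it contains, such as the first device identifier of the network edge device and the type of network anomaly of the first network.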
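Optionally, the first device identifier of the network edge device 202a may be, for example, the World Wide Name (WWN) of the network edge device 202a. Based on the first device identifier in the network anomaly message, the first device 204a can perform a security check and a permission check on the network edge device that reported the message, to ensure that the source of the network anomaly message is trusted. It should be understood that the embodiments of the present application impose no limitation on how these checks are performed. For example, the security check can be implemented by having a certificate authority (CA) verify whether the certificate of the network edge device 202a corresponding to the first device identifier is within its validity period, and the permission check can be implemented by verifying whether the first device identifier is in the local authorization whitelist of the first device 204a.
Before reporting network anomaly messages to the first device 204a, the network edge device 202a can first register so that it is permitted to report them. Figure 5 shows a schematic diagram of a device registration and network anomaly reporting process according to an embodiment of the present application. As shown in Figure 5, the network edge device 202a can send registration device information to the first device 204a, carrying a certificate and the first device identifier of the network edge device 202a. After receiving the device information, the fault-awareness management module in the first device 204a can send the certificate to the CA to verify its validity; at the same time, it can send the first device identifier to the network management module of the first device 204a, which can perform the permission check on the network edge device 202a based on the first device identifier. Once both the certificate check and the permission check pass, the network edge device 202a is determined to be registered.
As shown in Figure 5, once the network edge device 202a is registered and detects that the network status of the first network is abnormal, it can send a network anomaly message to the first device 204a. After receiving the network anomaly message, the fault-awareness management module in the first device 204a extracts the first device identifier from it, generates a network anomaly event indicating that the first network is abnormal, and reports the event to the network management module. On receiving the event, the network management module switches networks to send the backup messages and recalls the service messages in transit on the first network, that is, it performs the switch to the second network for transmitting backup messages to the second device 204b in step S402. Recalling the service messages in transit on the first network prevents service messages that duplicate the backup messages from reaching the second device 204b, which helps preserve data consistency.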
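The two registration checks described above can be sketched as a single predicate. This is a simplified sketch under stated assumptions: the `Registration` record and `may_report` function are hypothetical names, and the certificate check is reduced to a validity-period comparison, whereas a real CA verification would also cover the signature chain and revocation status.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Registration:
    """Hypothetical registration record sent by a network edge device."""
    device_id: str             # e.g. the edge device's WWN
    cert_not_before: datetime  # start of certificate validity
    cert_not_after: datetime   # end of certificate validity


def may_report(reg: Registration, whitelist: set[str], now: datetime) -> bool:
    """Return True only if both the certificate and permission checks pass."""
    cert_ok = reg.cert_not_before <= now <= reg.cert_not_after  # security check
    authorized = reg.device_id in whitelist                     # permission check
    return cert_ok and authorized
```

Anomaly messages from a device for which `may_report` is false would simply be rejected, so that an untrusted source cannot trigger a network switch.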
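Step S402: in response to receiving the network anomaly message, the first device 204a switches to using a second network to transmit backup messages to the second device 204b; wherein the second network is a transmission network between the first device 204a and the second device 204b whose network status is normal, and the backup messages include the backup messages corresponding to the service messages being transmitted on the first network when the first network became abnormal.
It should be understood that multiple networks capable of carrying messages usually exist between the first device 204a and the second device 204b, to improve the disaster recovery capability of the two data centers. If the network status of the first network between them becomes abnormal, the first device 204a can switch to another transmission network whose status is normal, that is, switch to using the second network to transmit backup messages to the second device 204b. The second network may also be called a backup network, and a transmission link in the second network may be called a backup link.
As described above, when the network status of the first network is normal, the first device 204a backs up each service message to be transmitted and stores the resulting backup message. After successfully receiving a service message, the second device 204b usually returns a success message to the first device 204a, so the first device 204a knows which service messages have been successfully sent to the second device 204b, that is, which sequence numbers have been successfully replicated to the second device 204b.
If the network status of the first network suddenly becomes abnormal during transmission, the service messages in transit on the first network may fail to reach the second device 204b, or may reach it only in part. The first device 204a can then switch directly to using the second network, that is, to a transmission link in the second network, to transmit backup messages to the second device 204b; these backup messages can include the backup messages corresponding to the service messages being transmitted on the first network when it became abnormal.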
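Figure 6 shows a schematic diagram of a disaster recovery system according to an embodiment of the present application. As shown in Figure 6, 1 and 2 on the hosts, 0 to 11 on the switches, and P0 to P3 on the storage devices "0A, 0B, 0C, 0D" all denote ports, and "a, b, c, d" denote network edge devices; the lines between ports can represent specific transmission links in a network. In Figure 6, two networks exist between the first data center and the second data center: the network passing through network edge devices a and b, and the network passing through network edge devices c and d. If network edge device c detects a network anomaly in the network passing through c and d, that network can be the first network and the network passing through a and b can be the second network, that is, backup messages can be transmitted over the network passing through a and b. Correspondingly, if network edge device a detects a network anomaly in the network passing through a and b, that network can be the first network and the network passing through c and d can be the second network.
In the embodiments of the present application, the network edge device can detect the network status of the first network quickly and with high accuracy, and reports to the first device a network anomaly message indicating that the network status is abnormal. In response to the network anomaly message provided by the network edge device, the first device can quickly switch to using the second network, whose network status is normal, to send backup messages to the second device. The network used to transmit messages can thus be switched within milliseconds, without waiting for a network response timeout, which shortens the time during which a network anomaly affects service functions.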
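As described above, the network edge device 202a can detect the network status of the first network and, when it detects that the network status is abnormal, send a network anomaly message to the first device 204a. In one possible implementation, before step S401, the method further includes:
In response to detecting that the network status of the first network is abnormal, the network edge device 202a sends a network anomaly message to the first device 204a. In this approach, having the network edge device send the network anomaly message directly to the first device shortens network anomaly detection and speeds up switching away from the anomalous network.
Optionally, sending a network anomaly message to the first device in response to detecting that the network status of the first network is abnormal includes: the network edge device 202a performs timed detection of preset indicators on the first network to obtain detection results corresponding to the preset indicators, the preset indicators including at least one of the following: port power, signal strength, bit error rate, latency, and packet loss rate; in response to detecting that the detection result corresponding to a preset indicator reaches a preset threshold, the network edge device 202a determines that the network status of the first network is abnormal and sends a network anomaly message to the first device 204a.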
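Timed detection of preset indicators on the first network can be understood as detecting the preset indicators at a fixed period, for example checking port power once every 2 minutes. It should be understood that the network edge device 202a itself is capable of detecting the above preset indicators, and the embodiments of the present application impose no limitation on how the network edge device performs the detection.
The preset threshold may be an empirical value set from historical experience, and each preset indicator may have its own preset threshold. Optionally, if the network edge device 202a detects only one preset indicator, it can determine that the network status of the first network is abnormal when the detection result for that indicator reaches its preset threshold. If the network edge device 202a detects multiple preset indicators, it can determine that the network status is abnormal either when the detection results for all of the indicators reach their preset thresholds or when the detection result for any one of them reaches its preset threshold; the embodiments of the present application impose no limitation on this.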
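The "any indicator" and "all indicators" policies described above can be sketched as follows. The function names, indicator names, and numeric limits are hypothetical. The text only says a result "reaches" its threshold; the direction attached to each threshold here ("upper" for indicators such as bit error rate, "lower" for indicators such as port power) is an assumption added for illustration.

```python
def breached(value: float, limit: float, kind: str) -> bool:
    """Assumed semantics: 'upper' breaches at or above the limit, 'lower' at or below."""
    return value >= limit if kind == "upper" else value <= limit


def is_abnormal(results: dict, thresholds: dict, require_all: bool = False) -> bool:
    """Decide whether the network is abnormal from per-indicator detection results.

    results:    indicator name -> measured value
    thresholds: indicator name -> (limit, kind)
    """
    hits = [breached(results[name], *thresholds[name])
            for name in results if name in thresholds]
    if not hits:
        return False  # nothing measurable, so no anomaly is declared
    return all(hits) if require_all else any(hits)
```

With the "any" policy a single degraded indicator is enough to report an anomaly, which favors speed; the "all" policy trades speed for fewer false alarms.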
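Before sending the network anomaly message to the first device 204a, the network edge device 202a can write its own first device identifier into the message so that the first device 204a knows the source of the network anomaly message.
According to the embodiments of the present application, the hardware-level ability of the network edge device to sense network status can be used to improve the accuracy and speed of network status detection.
Real-time detection of the network status of the first network by the network edge device 202a still takes some time. To further shorten the impact of network anomalies on service functions, historical network anomaly messages can be analyzed statistically, that is, the pattern with which the first network becomes abnormal can be analyzed, so that a future time point at which a network anomaly may occur can be predicted from the analysis and a network anomaly message can be sent to the first device 204a in advance based on the predicted time point. In one possible implementation, the method further includes:
The network edge device 202a predicts, based on historical network anomaly messages of the first network, a future time point at which a network anomaly will occur in the first network, the historical network anomaly messages including the network anomaly messages collected when network anomalies occurred in the first network during a historical period;
The network edge device 202a sends a network anomaly message to the first device 204a based on the time point.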
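For example, a deep learning model or a machine learning model can be used to predict, from the historical network anomaly messages of the first network, the future time point at which a network anomaly will occur; alternatively, a statistical model can be built from the historical network anomaly messages of the first network and used to predict that time point. The embodiments of the present application impose no limitation on this.
The network edge device 202a sending a network anomaly message to the first device 204a based on the time point can include sending the message when the current time reaches some moment before the predicted time point; for example, if the predicted time point is 7:00 on September 30, the network anomaly message can be sent to the first device 204a at 6:59 on September 30. Alternatively, the message can be sent when the current time reaches the predicted time point itself; for example, if the predicted time point is 7:00 on September 30, the network anomaly message can be sent to the first device 204a at 7:00 on September 30.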
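The timing rule above can be sketched with a deliberately naive predictor. The text leaves the prediction model open (deep learning, machine learning, or a statistical model); the mean-interval estimate below is only a stand-in chosen for brevity, and the function names and the one-minute lead time are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def predict_next(history: list) -> Optional[datetime]:
    """Naive stand-in predictor: last anomaly time plus the mean gap between past anomalies."""
    if len(history) < 2:
        return None  # not enough history to estimate an interval
    ordered = sorted(history)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    mean_gap = sum(gaps, timedelta()) / len(gaps)
    return ordered[-1] + mean_gap


def should_send_warning(now: datetime, predicted: datetime,
                        lead: timedelta = timedelta(minutes=1)) -> bool:
    """Send once the current time reaches `lead` before the predicted time point."""
    return now >= predicted - lead
```

Setting `lead` to zero reproduces the second option in the text, where the message is sent exactly when the predicted time point is reached.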
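According to the embodiments of the present application, using a predicted time point at which a network anomaly may occur shortens the time the network edge device spends detecting the network status of the first network and allows the network anomaly message to be sent to the first device in advance, significantly reducing the impact of network anomalies on service functions.
As described above, the service messages sent by the first device 204a to the second device 204b carry unique sequence numbers, that is, a unique sequence number is added to each message, and the second device 204b can verify from the sequence number whether a message is valid and store valid messages. In one possible implementation, in step S402, switching to using the second network to transmit backup messages to the second device in response to receiving the network anomaly message includes:
Step S4021: in response to receiving the network anomaly message, the first device 204a suspends use of the first network for transmitting service messages and notifies the second device 204b, over the second network, that the sequence numbers of the service messages transmitted on the first network are invalid;
Step S4022: the first device 204a adds a valid sequence number to the backup message and uses the second network to send the backup message with the valid sequence number to the second device.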
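Optionally, in step S4021, suspending use of the first network for transmitting service messages can include, for example, deleting the routing information corresponding to the first network from the routing table of the first device. The routing information can include, for example, the IP addresses, subnet masks, and gateways of the network devices on the first network. The first device 204a can then no longer use the first network for message transmission, which isolates (or downgrades) the first network.
Optionally, in step S4021, notifying the second device 204b over the second network that the sequence numbers of the service messages transmitted on the first network are invalid can consist, for example, of sending the second device 204b the sequence numbers of those service messages, to inform it that service messages carrying those sequence numbers are invalid. Because message sequence numbers usually increase, the first device can instead send the second device 204b a sequence number threshold greater than the sequence numbers of the service messages transmitted on the first network, informing it that sequence numbers below the threshold are invalid, so that the second device 204b can check validity against the threshold.
It should be understood that, in step S4022, the sequence number that the first device 204a adds to a backup message should differ from the sequence numbers of the service messages transmitted on the first network; for example, it can be a sequence number greater than the above threshold, so that the second device 204b stores the backup message based on its valid sequence number and ignores service messages with invalid sequence numbers. If the second device 204b has already stored a backup message and the corresponding service message on the first network reaches the second device 204b after the network recovers, this prevents the second device 204b from storing a service message whose content duplicates the backup message, which helps ensure data consistency between the two storage devices.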
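The threshold-based validity check on the receiving side can be sketched as follows. The `Receiver` class and its method names are hypothetical, and rejecting sequence numbers that are already stored is an extra safeguard assumed here rather than something the text specifies.

```python
class Receiver:
    """Hypothetical model of the second device's sequence-number check."""

    def __init__(self) -> None:
        self.threshold = 0   # sequence numbers below this are treated as invalid
        self.stored = {}     # sequence number -> payload

    def invalidate_below(self, threshold: int) -> None:
        """Handle the notification sent over the second network."""
        self.threshold = max(self.threshold, threshold)

    def receive(self, seq: int, payload: bytes) -> bool:
        """Store the message if its sequence number is valid; otherwise ignore it."""
        if seq < self.threshold or seq in self.stored:
            return False
        self.stored[seq] = payload
        return True
```

For example, if messages 2 and 3 were in transit when the anomaly occurred, the sender could announce a threshold of 4 and resend their backups as 5 and 6; a late original with sequence number 2 arriving after the first network recovers is then ignored.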
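Figure 7 shows a schematic diagram of a network switching process according to an embodiment of the present application. As shown in Figure 7, in response to detecting that the network status of the first network is abnormal, the network edge device 202a sends a network anomaly message to the first device 204a; the first device 204a deletes from the routing table, based on the first device identifier in the network anomaly message and the network information, the routing information corresponding to the first network passing through the network edge device 202a; the first device 204a notifies the second device 204b, over the second network, that the sequence numbers of the service messages transmitted on the first network are invalid; the first device 204a uses the second network to send the backup messages with valid sequence numbers to the second device; if the second device 204b receives a service message transmitted on the anomalous first network, it verifies the sequence number of that service message and, if verification fails, returns a verification failure message to the first device 204a, to ensure data consistency.
According to the embodiments of the present application, suspending use of the first network for transmitting service messages isolates and downgrades the first network, ensuring that subsequently sent backup messages or service messages no longer use the anomalous first network. Once the second device has been notified of the invalid sequence numbers, the first device can switch to the second network to send the backup messages, so messages need not wait for a network timeout before being resent. Based on the notification of invalid sequence numbers, the second device can directly ignore invalid service messages transmitted over the first network, which avoids storing the same message twice, preserves data consistency, and shortens the time during which a network anomaly affects service functions.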
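In one possible implementation, before service messages are transmitted over the first network, the method further includes:
The first device 204a sends a link establishment request to the second device 204b, the link establishment request being used to establish, based on the first network, a transmission link between the first device 204a and the second device 204b, the transmission link being used for message transmission between the first device 204a and the second device 204b; the first device 204a determines network information corresponding to the first network, the network information including the device identifiers of the devices contained in the first network.
The first device 204a can send the link establishment request to the second device 204b over the first network. As the request passes through each network device in the first network (including the network edge devices), each network device automatically allocates input/output (IO) port numbers based on the request, thereby establishing the transmission link between the first device 204a and the second device 204b that carries the messages. It should be understood that the embodiments of the present application impose no limitation on how the link between the two devices is established.
Determining the network information corresponding to the first network can include, for example: during establishment of the above transmission link, recording the first device identifier of the network edge device 202a on the first network that is connected to the first device 204a, and/or recording the second device identifier of the network device 203a on the first network between the first device 204a and the network edge device 202a.
According to the embodiments of the present application, recording the network information corresponding to the first network makes it easier to later use the network information to quickly determine which network is anomalous and to isolate and downgrade it.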
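The storage device may not be directly connected to the network edge device; for example, in the disaster recovery system shown in Figure 2, the network device 203a is connected between the first device 204a and the network edge device 202a. In one possible implementation, in the case where the same network device is connected between the first device 204a and the network edge device 202a, determining the network information corresponding to the first network includes:
The first device 204a obtains the second device identifier of the network device and records the second device identifier in the network information. In this way, the second device identifier of the network device on the path from the first device to the network edge device is recorded, so that the anomalous network can later be determined from the second device identifier.
As described above, the first device 204a establishes the transmission link by sending a link establishment request onto the first network. After the request passes through the network device (for example, a switch) between the first device 204a and the network edge device 202a, the first device 204a can send a device identifier query request to that network device to obtain its second device identifier, and then record the second device identifier obtained from the query in the above network information. Alternatively, the user can look up the second device identifier of the network device manually and import it into the first device 204a so that it is recorded in the network information; the embodiments of the present application impose no limitation on this.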
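In one possible implementation, in the case where multiple network devices lie between the first device 204a and the network edge device 202a, the second device 204b is further used to send a probe message to the network edge device over the first network; optionally, the second device 204b can send the probe message to the network edge device over the first network after the above transmission link has been established. The method further includes:
In response to receiving the probe message, the network edge device 202a fills its own first device identifier into the probe message and sends the probe message carrying the first device identifier to the first device 204a;
Determining the network information corresponding to the first network includes: in response to receiving the probe message carrying the first device identifier, the first device 204a extracts the first device identifier from the probe message and records the extracted first device identifier in the network information.
It should be understood that, where multiple network devices lie between the first device and the network edge device, querying and recording the second device identifiers of the network devices may be cumbersome, so only the first device identifier of the network edge device connected to the first device may be recorded. The first device identifier of the network edge device can be obtained as follows: after the transmission link has been established, the two storage devices send probe messages to each other; on receiving a probe message, the network edge device at the receiving end fills its own device identifier into the probe message and sends it to the storage device it is connected to, so that the storage device can extract the device identifier of the network edge device and store it in the network information.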
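Figure 8 shows a schematic diagram of a network information determination process according to an embodiment of the present application. As shown in Figure 8, the first device 204a of the first data center can send a link establishment request over the first network to the second device 204b of the second data center to establish a transmission link between the first device 204a and the second device 204b; the transmission link passes through the network edge device 202a connected to the first device 204a and the network edge device 202b connected to the second device 204b. After the transmission link has been established, the first device 204a can send a probe message to the second device 204b; the probe message is used to probe the device identifier of the peer network edge device 202b. On receiving the probe message, the network edge device 202b can fill its own WWN into the probe message and send the probe message carrying the WWN to the second device 204b, and the second device 204b can extract the WWN of the network edge device 202b from the probe message and record it in its local network information. Correspondingly, the second device 204b can send a probe message to the first device 204a; this probe message is used to probe the device identifier of the peer network edge device 202a. On receiving the probe message, the network edge device 202a can fill its own WWN into the probe message and send the probe message carrying the WWN to the first device 204a, and the first device 204a can extract the WWN of the network edge device 202a from the probe message and record it in its local network information.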
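The probe exchange described above can be sketched in two small functions, one for each role. The dictionary-based probe format, the `edge_wwn` field, and the function names are hypothetical; the text does not specify the message layout.

```python
def stamp_probe(probe: dict, edge_wwn: str) -> dict:
    """Edge device side: return a copy of the probe carrying the device's own identifier."""
    return {**probe, "edge_wwn": edge_wwn}


def record_edge(network_info: dict, network: str, probe: dict) -> None:
    """Storage device side: extract the identifier and record it for the given network."""
    network_info.setdefault(network, set()).add(probe["edge_wwn"])
```

After the exchange, `network_info` maps each network to the identifiers of the devices it passes through, which is the lookup structure needed later when an anomaly message names a device.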
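According to the embodiments of the present application, recording the first device identifier of the network edge device makes known which network edge device the first network currently in use passes through, so that the anomalous network can later be determined quickly from the first device identifier and isolated quickly.
As described above, the network anomaly message sent by the network edge device to the first device can include the first device identifier of the network edge device itself. Based on the above network information, in step S4021, suspending use of the first network for transmitting service messages in response to receiving the network anomaly message includes:
The first device 204a determines, from the first device identifier in the network anomaly message and the network information, the first network that contains the network edge device 202a, and deletes from the routing table of the first device the routing information corresponding to the first network that passes through the network edge device 202a.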
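As described above, the network information includes the device identifiers of the devices contained in the first network, that is, it includes the first device identifier of the network edge device that the first network passes through. The network information can therefore be traversed using the first device identifier in the network anomaly message to determine the first network that contains the network edge device 202a.
It should be understood that, because the routing information can include, for example, the IP addresses, subnet masks, and gateways of the network devices on the first network, once the routing information corresponding to the first network has been deleted from the routing table of the first device, the first device 204a can no longer use the first network for message transmission, which isolates (or downgrades) the first network.
According to the embodiments of the present application, the first network containing the network edge device can be determined quickly from the first device identifier in the network anomaly message and the network information, and deleting the routing information corresponding to the first network from the routing table achieves fast network isolation and reduces the impact of network anomalies on services.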
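As described above, the network information may also record the second device identifier of the network device connected between the first device 204a and the network edge device 202a. In a possible implementation, the network abnormality message sent by the network edge device 202a to the first device 204a includes the first device identifier of the network edge device 202a itself and the second device identifier of the network device. In step S4021, suspending use of the first network for service message transmission in response to receiving the network abnormality message includes:
The first device 204a determines, from the first device identifier and the second device identifier in the network abnormality message and the network information, the first network that passes through the network edge device 202a and the network device, and deletes from the routing table of the first device 204a the routing information corresponding to that first network.
As described above, the network information includes the device identifiers of the devices included in the first network, that is, it may include the first device identifier of the network edge device 202a and the second device identifier of the network device that the first network passes through. The network information can therefore be traversed using the first device identifier and the second device identifier in the network abnormality message to determine the first network that includes the network edge device 202a and the network device.
According to the embodiments of the present application, the first network that includes the network edge device and the network device can be determined more quickly and accurately from the first device identifier and the second device identifier in the network abnormality message and the network information, and deleting the routing information corresponding to the first network from the routing table achieves fast network isolation and reduces the impact of the network abnormality on services.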
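A minimal sketch of the lookup-and-delete step is given below. It assumes, purely for illustration, that the network information is a mapping from a network name to the set of recorded device identifiers and that the routing table is a mapping from a network name to its route entries; the function name `isolate_networks`, the identifier strings and the addresses are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def isolate_networks(
    network_info: Dict[str, Set[str]],
    routing_table: Dict[str, List[dict]],
    abnormal_ids: Iterable[str],
) -> List[str]:
    """Delete the routes of every network whose recorded identifiers
    contain all identifiers carried in the abnormality message."""
    wanted = set(abnormal_ids)
    # Traverse the network information to find the matching network(s).
    matched = [
        name for name, ids in network_info.items() if wanted and wanted <= ids
    ]
    for name in matched:
        # Without its routing information the network can no longer be used.
        routing_table.pop(name, None)
    return matched


network_info = {
    "first_network": {"wwn-202a", "switch-203a"},
    "second_network": {"wwn-202c"},
}
routing_table = {
    "first_network": [
        {"destination": "192.0.2.0", "netmask": "255.255.255.0", "gateway": "192.0.2.1"}
    ],
    "second_network": [
        {"destination": "198.51.100.0", "netmask": "255.255.255.0", "gateway": "198.51.100.1"}
    ],
}
isolated = isolate_networks(network_info, routing_table, ["wwn-202a", "switch-203a"])
```

The same function covers the case in which the message carries only the first device identifier, because a single identifier is also a subset of the recorded set.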
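As described above, the first device 204a may connect to the network edge device 202a over an internal network via a network device (such as a switch); that is, messages between the first device 204a and the network edge device 202a may be transmitted over the internal network of the first data center. In a possible implementation, a network state detection module may be provided in the internal network to detect the network state of the internal network. The method further includes:
In response to detecting that the network state of the internal network is abnormal, the network state detection module sends an internal network abnormality message to the first device 204a;
In response to the internal network abnormality message, the first device 204a switches to a backup internal network for message transmission with the network edge device.
The network state detection module may detect the state of the internal network by periodically checking at least one preset indicator among port power, signal strength, bit error rate, delay and packet loss rate. When at least one preset indicator is detected to reach its preset threshold, the module determines that the network state of the internal network is abnormal and sends an internal network abnormality message to the first device 204a.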
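The threshold check on the preset indicators might look like the sketch below. The threshold values, the indicator names and the notion that some indicators are abnormal when they rise while others are abnormal when they fall are assumptions made for the example; the text only states that an indicator "reaches" its preset threshold.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Threshold:
    limit: float
    higher_is_worse: bool  # assumed direction in which the indicator degrades

    def reached(self, value: float) -> bool:
        return value >= self.limit if self.higher_is_worse else value <= self.limit


# Illustrative values only; real thresholds would be preset per deployment.
THRESHOLDS: Dict[str, Threshold] = {
    "port_power_dbm": Threshold(-20.0, higher_is_worse=False),
    "bit_error_rate": Threshold(1e-6, higher_is_worse=True),
    "delay_ms": Threshold(50.0, higher_is_worse=True),
    "packet_loss_rate": Threshold(0.01, higher_is_worse=True),
}


def abnormal_indicators(sample: Dict[str, float]) -> List[str]:
    """Return the names of indicators in one periodic sample that reached
    their threshold; a non-empty result means the network state is abnormal."""
    return sorted(
        name
        for name, value in sample.items()
        if name in THRESHOLDS and THRESHOLDS[name].reached(value)
    )


healthy = abnormal_indicators({"port_power_dbm": -5.0, "delay_ms": 3.0})
degraded = abnormal_indicators({"port_power_dbm": -25.0, "delay_ms": 3.0})
```

A detection module would call `abnormal_indicators` on each timed sample and send the abnormality message when the result is non-empty.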
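It should be understood that multiple internal networks may exist between the first device and the network edge device to improve the disaster tolerance of the first data center itself. The first device 204a switching to a backup internal network for message transmission with the network edge device in response to the internal network abnormality message may include, for example, the following. The first device 204a suspends use of the internal network for service message transmission and, through the backup internal network, sends the network edge device 202a a notification that the sequence numbers of the service messages transmitted on the internal network are invalid; on receiving the notification, the network edge device 202a forwards it to the second device 204b. In addition, the first device 204a sends backup messages with valid sequence numbers to the network edge device 202a through the backup internal network, where a backup message may be the backup corresponding to a service message transmitted on the abnormal internal network; on receiving a backup message, the network edge device 202a may forward it to the second device 204b. The second device 204b can then check from its sequence number whether each message is valid, store the valid backup messages and ignore the invalid service messages.
According to the embodiments of the present application, when the network state detection module detects that the internal network of the first data center is abnormal, switching to the backup internal network for message transmission with the network edge device improves the disaster recovery capability within the first data center, shortens the time spent waiting for network timeouts, preserves data consistency and reduces the impact of internal network abnormalities on services.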
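The receiving side of this scheme, in which the second device checks sequence numbers, can be sketched as follows. The class and method names are hypothetical, and rejecting a sequence number that has already been stored is an added assumption that follows from the sequence numbers being unique.

```python
from typing import Dict, Iterable, Set


class SecondDevice:
    """Stores only messages whose sequence number is still valid."""

    def __init__(self) -> None:
        self.invalid_seqs: Set[int] = set()
        self.stored: Dict[int, bytes] = {}

    def on_invalidation(self, seqs: Iterable[int]) -> None:
        # Notification that these sequence numbers are no longer valid.
        self.invalid_seqs.update(seqs)

    def on_message(self, seq: int, payload: bytes) -> bool:
        # Ignore messages with an invalidated or already-used sequence number.
        if seq in self.invalid_seqs or seq in self.stored:
            return False
        self.stored[seq] = payload
        return True


receiver = SecondDevice()
receiver.on_invalidation([1, 2])               # sent over the healthy network
late = receiver.on_message(2, b"stale copy")   # delayed message from the abnormal network
fresh = receiver.on_message(3, b"backup copy") # backup resent with a new sequence number
```

Because the stale message is dropped and the resent backup is stored, the receiver ends up with one copy of the data, which is how the scheme keeps both sides consistent.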
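It should be noted that the above embodiments are described from the perspective of the first data center. Those skilled in the art will understand that the device management method of the embodiments of the present application also applies to the second data center; that is, the second data center may refer to the specific implementations described above to reduce the impact of network abnormalities on its services and improve its disaster recovery capability. Furthermore, the disaster recovery systems shown in FIG. 2 and FIG. 6 are some possible implementations provided by the embodiments of the present application and do not represent all of them. It should be understood that the disaster recovery system includes at least the storage device and the network edge device in the data center on one side.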
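An embodiment of the present application further provides a device that can serve as the first device. Service messages are transmitted between the first device and a second device through a first network, and the first network includes a network edge device connected to the first device. As shown in FIG. 9, the first device includes a receiving unit 901 and a switching unit 902. The receiving unit 901 is configured to receive a network abnormality message from the network edge device, the network abnormality message indicating that the network state of the first network is abnormal. The switching unit 902 is configured to switch, in response to receiving the network abnormality message, to using a second network to transmit backup messages to the second device, where the second network is a transmission network between the first device and the second device whose network state is normal, and the backup messages include the backup messages corresponding to the service messages being transmitted on the first network when the first network became abnormal.
Optionally, the first device and the second device are storage devices, and the first network is an optical transmission network.
In a possible implementation, the network edge device is configured to send a network abnormality message to the first device in response to detecting that the network state of the first network is abnormal.
Optionally, sending a network abnormality message to the first device in response to detecting that the network state of the first network is abnormal includes: periodically checking a preset indicator of the first network to obtain a detection result for the preset indicator, the preset indicator including at least one of port power, signal strength, bit error rate, delay and packet loss rate; and, in response to detecting that the detection result for the preset indicator reaches a preset threshold, determining that the network state of the first network is abnormal and sending a network abnormality message to the first device.
In a possible implementation, a unique sequence number is added to each message, and the second device is configured to check from the sequence number whether a message is valid and to store valid messages. The switching unit 902 includes: an abnormality response module configured to, in response to receiving the network abnormality message, suspend use of the first network for service message transmission and notify the second device through the second network that the sequence numbers of the service messages transmitted on the first network are invalid; and a message sending module configured to add valid sequence numbers to the backup messages and send the backup messages carrying the valid sequence numbers to the second device over the second network.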
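In a possible implementation, the first device further includes an information determination unit configured to determine, before service messages are transmitted through the first network, the network information corresponding to the first network, the network information including the device identifiers of the devices included in the first network. The network abnormality message sent by the network edge device to the first device includes the first device identifier of the network edge device itself, and suspending use of the first network for service message transmission in response to receiving the network abnormality message includes: determining, from the first device identifier in the network abnormality message and the network information, the first network that includes the network edge device, and deleting from the routing table of the first device the routing information corresponding to the first network passing through the network edge device.
In a possible implementation, the second device is further configured to send a probe message to the network edge device through the first network, and the network edge device is further configured to, in response to receiving the probe message, fill its own first device identifier into the probe message and send the probe message carrying the first device identifier to the first device. The information determination unit includes a first recording module configured to, in response to receiving the probe message carrying the first device identifier, extract the first device identifier from the probe message and record the extracted first device identifier in the network information.
In a possible implementation, where the same network device is connected between the first device and the network edge device, the information determination unit includes a second recording module configured to obtain the second device identifier of the network device and record it in the network information. The network abnormality message sent by the network edge device to the first device includes the first device identifier of the network edge device itself and the second device identifier of the network device, and suspending use of the first network for service message transmission in response to receiving the network abnormality message includes: determining, from the first device identifier and the second device identifier in the network abnormality message and the network information, the first network that passes through the network edge device and the network device, and deleting from the routing table of the first device the routing information corresponding to that first network.
In a possible implementation, the first device further includes a message transmission unit configured to, while the network state of the first network is normal, obtain a service message to be transmitted, back it up to obtain and store a backup message, add a sequence number to the service message to be transmitted, and send the service message carrying the sequence number to the second device over the first network.
In a possible implementation, the network edge device is further configured to: predict, based on historical network abnormality messages of the first network, a future point in time at which a network abnormality will occur on the first network, the historical network abnormality messages including the network abnormality messages collected when network abnormalities occurred on the first network in the past; and send a network abnormality message to the first device based on that point in time.
In a possible implementation, messages between the first device and the network edge device are transmitted over an internal network in which a network state detection module is provided. The network state detection module is configured to detect the network state of the internal network and, in response to detecting that the network state of the internal network is abnormal, to send an internal network abnormality message to the first device. The first device further includes an internal network switching unit configured to switch, in response to the internal network abnormality message, to a backup internal network for message transmission with the network edge device.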
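The sending side (the message transmission unit, abnormality response module and message sending module) can be approximated by the sketch below. It is a simplified model under stated assumptions: the `send` callback stands in for the real transport, the message dictionaries are an invented format, and every stored backup is treated as still in flight because the text does not describe when backups are released.

```python
import itertools
from typing import Callable, Dict


class FirstDevice:
    def __init__(self, send: Callable[[str, dict], None]) -> None:
        self._send = send                    # hypothetical transport hook
        self._seq = itertools.count(1)       # source of unique sequence numbers
        self._backups: Dict[int, bytes] = {} # sequence number -> backup payload
        self.active_network = "first"

    def send_service_message(self, payload: bytes) -> int:
        # Back up the payload, add a sequence number, send on the active network.
        seq = next(self._seq)
        self._backups[seq] = payload
        self._send(self.active_network, {"type": "data", "seq": seq, "payload": payload})
        return seq

    def on_network_abnormality(self) -> None:
        if self.active_network != "first":
            return
        # Stop using the first network and announce the stale sequence numbers.
        self.active_network = "second"
        stale = sorted(self._backups)
        self._send("second", {"type": "invalidate", "seqs": stale})
        # Resend each backup with a fresh, valid sequence number.
        for payload in [self._backups.pop(seq) for seq in stale]:
            self.send_service_message(payload)


log = []
device = FirstDevice(lambda network, message: log.append((network, message)))
device.send_service_message(b"block-a")
device.send_service_message(b"block-b")
device.on_network_abnormality()
```

After the switch, `log` holds two messages on the first network, one invalidation on the second network, and the two backups resent on the second network with new sequence numbers.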
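An embodiment of the present application further provides a device that can serve as the first device. Service messages are transmitted between the first device and a second device through a first network, and the first network includes a network edge device connected to the first device. The first device includes an interface and a processor, and the interface communicates with the processor. The processor is configured to: receive a network abnormality message from the network edge device, the network abnormality message indicating that the network state of the first network is abnormal; and, in response to receiving the network abnormality message, switch to using a second network to transmit backup messages to the second device, where the second network is a transmission network between the first device and the second device whose network state is normal, and the backup messages include the backup messages corresponding to the service messages being transmitted on the first network when the first network became abnormal.
Optionally, the first device and the second device are storage devices, and the first network is an optical transmission network.
In a possible implementation, the network edge device is configured to send a network abnormality message to the first device in response to detecting that the network state of the first network is abnormal.
Optionally, sending a network abnormality message to the first device in response to detecting that the network state of the first network is abnormal includes: periodically checking a preset indicator of the first network to obtain a detection result for the preset indicator, the preset indicator including at least one of port power, signal strength, bit error rate, delay and packet loss rate; and, in response to detecting that the detection result for the preset indicator reaches a preset threshold, determining that the network state of the first network is abnormal and sending a network abnormality message to the first device.
In a possible implementation, a unique sequence number is added to each message, and the second device is configured to check from the sequence number whether a message is valid and to store valid messages. Switching to using the second network to transmit backup messages to the second device in response to receiving the network abnormality message includes: in response to receiving the network abnormality message, suspending use of the first network for service message transmission and notifying the second device through the second network that the sequence numbers of the service messages transmitted on the first network are invalid; and adding valid sequence numbers to the backup messages and sending the backup messages carrying the valid sequence numbers to the second device over the second network.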
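In a possible implementation, the processor is further configured to determine, before service messages are transmitted through the first network, the network information corresponding to the first network, the network information including the device identifiers of the devices included in the first network. The network abnormality message sent by the network edge device to the first device includes the first device identifier of the network edge device itself, and suspending use of the first network for service message transmission in response to receiving the network abnormality message includes: determining, from the first device identifier in the network abnormality message and the network information, the first network that includes the network edge device, and deleting from the routing table of the first device the routing information corresponding to the first network passing through the network edge device.
In a possible implementation, the second device is further configured to send a probe message to the network edge device through the first network, and the network edge device is further configured to, in response to receiving the probe message, fill its own first device identifier into the probe message and send the probe message carrying the first device identifier to the first device. Determining the network information corresponding to the first network includes: in response to receiving the probe message carrying the first device identifier, extracting the first device identifier from the probe message and recording the extracted first device identifier in the network information.
In a possible implementation, where the same network device is connected between the first device and the network edge device, determining the network information corresponding to the first network includes: obtaining the second device identifier of the network device and recording it in the network information. The network abnormality message sent by the network edge device to the first device includes the first device identifier of the network edge device itself and the second device identifier of the network device, and suspending use of the first network for service message transmission in response to receiving the network abnormality message includes: determining, from the first device identifier and the second device identifier in the network abnormality message and the network information, the first network that passes through the network edge device and the network device, and deleting from the routing table of the first device the routing information corresponding to that first network.
In a possible implementation, the processor is further configured to, while the network state of the first network is normal, obtain a service message to be transmitted, back it up to obtain and store a backup message, add a sequence number to the service message to be transmitted, and send the service message carrying the sequence number to the second device over the first network.
In a possible implementation, the network edge device is further configured to: predict, based on historical network abnormality messages of the first network, a future point in time at which a network abnormality will occur on the first network, the historical network abnormality messages including the network abnormality messages collected when network abnormalities occurred on the first network in the past; and send a network abnormality message to the first device based on that point in time.
In a possible implementation, messages between the first device and the network edge device are transmitted over an internal network in which a network state detection module is provided. The network state detection module is configured to detect the network state of the internal network and, in response to detecting that the network state of the internal network is abnormal, to send an internal network abnormality message to the first device. The processor is further configured to switch, in response to the internal network abnormality message, to a backup internal network for message transmission with the network edge device.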
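The text does not say how the network edge device predicts the next abnormality from historical network abnormality messages. As one possible illustration only, the sketch below assumes that each historical message carries a timestamp and extrapolates the mean interval between past abnormalities; the function names and the `lead_time` parameter are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence


def predict_next_abnormality(timestamps: Sequence[float]) -> Optional[float]:
    """Extrapolate the mean gap between past abnormalities (assumed method)."""
    if len(timestamps) < 2:
        return None  # not enough history to estimate an interval
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return ordered[-1] + mean(gaps)


def should_send_warning(now: float, predicted: Optional[float], lead_time: float) -> bool:
    """Send the abnormality message once the predicted time is within lead_time."""
    return predicted is not None and now >= predicted - lead_time


predicted = predict_next_abnormality([100.0, 200.0, 300.0])
```

Sending the message shortly before the predicted time would let the first device switch networks before the abnormality actually affects service messages; any real deployment would likely need a more robust predictor than a mean interval.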
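An embodiment of the present application provides a device management system including a first device and a network edge device. The first device is configured to perform the operations performed by the first device in the device management method above, and the network edge device is configured to perform the operations performed by the network edge device in that method. It should be understood that the device management system can be applied to any data center (such as the first data center in FIG. 2), and that the disaster recovery system containing two data centers described above may include the device management system.
An embodiment of the present application provides a non-volatile computer-readable storage medium storing computer program instructions that, when executed by a processor, implement the method above.
An embodiment of the present application provides a computer program product including computer-readable code, or a non-volatile computer-readable storage medium carrying computer-readable code; when the computer-readable code runs in a processor of an electronic device, the processor of the electronic device performs the method above.
FIG. 10 is a structural diagram of an electronic device 1300 according to an embodiment of the present application. As shown in FIG. 10, the electronic device 1300 may be the first device or the network edge device and performs the functions that each is required to perform in the device management method above. The electronic device 1300 includes at least one processor 1801, at least one memory 1802 and at least one communication interface 1803. The electronic device may also include general-purpose components such as an antenna, which are not detailed here.
The components of the electronic device 1300 are described in detail below with reference to FIG. 10.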
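The processor 1801 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling execution of the programs of the above solutions. The processor 1801 may include one or more processing units; for example, the processor 1801 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor and/or a neural-network processing unit (NPU). The different processing units may be independent components or may be integrated in one or more processors.
The communication interface 1803 is used to communicate with other electronic devices or communication networks, such as Ethernet, a radio access network (RAN), a core network or a wireless local area network (WLAN).
The memory 1802 may be a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions, a random access memory (RAM) or another type of dynamic storage device capable of storing information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to these. The memory may exist independently and be connected to the processor through a bus, or it may be integrated with the processor.
The memory 1802 is used to store the application program code for executing the above solutions, and execution is controlled by the processor 1801. The processor 1801 is used to execute the application program code stored in the memory 1802.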
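In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
A computer-readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example but without limitation, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of these. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or a raised structure in a groove with instructions stored on it, and any suitable combination of these.
The computer-readable program instructions or code described here can be downloaded from a computer-readable storage medium to the respective computing/processing devices, or downloaded to an external computer or external storage device over a network such as the Internet, a local area network, a wide area network and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium in that computing/processing device.
The computer program instructions for performing the operations of the present application may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++ and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, it may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit such as a programmable logic circuit, a field-programmable gate array (FPGA) or a programmable logic array (PLA) is customized using the state information of the computer-readable program instructions; the electronic circuit can execute the computer-readable program instructions and thereby implement the various aspects of the present application.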
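Aspects of the present application are described here with reference to flowcharts and/or block diagrams of methods, apparatuses (systems) and computer program products according to the embodiments of the present application. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer or another programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams. The computer-readable program instructions may also be stored in a computer-readable storage medium; these instructions cause a computer, a programmable data processing apparatus and/or other devices to operate in a particular manner, so that the computer-readable medium storing the instructions comprises an article of manufacture that includes instructions implementing aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded onto a computer, another programmable data processing apparatus or another device, so that a series of operational steps is performed on the computer, other programmable data processing apparatus or other device to produce a computer-implemented process, such that the instructions executed on the computer, other programmable data processing apparatus or other device implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the accompanying drawings show the architecture, functions and operations of possible implementations of apparatuses, systems, methods and computer program products according to multiple embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment or a portion of instructions that contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved.
It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by hardware that performs the corresponding functions or acts (for example, a circuit or an application-specific integrated circuit (ASIC)), or by a combination of hardware and software, such as firmware.
Although the present invention has been described here in connection with various embodiments, those skilled in the art can, in practicing the claimed invention, understand and effect other variations of the disclosed embodiments by studying the drawings, the disclosure and the appended claims. In the claims, the word "comprising" does not exclude other components or steps, and "a" or "an" does not exclude a plurality. A single processor or other unit may fulfil the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that these measures cannot be combined to good effect.
The embodiments of the present application have been described above. The foregoing description is illustrative, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope of the described embodiments. The terms used here were chosen to best explain the principles of the embodiments, their practical application or their improvement over technology in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed here.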

Claims (39)

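  1. A device management method, characterized in that service messages are transmitted between a first device and a second device through a first network, the first network includes a network edge device connected to the first device, and the method comprises:
    the first device receiving a network abnormality message from the network edge device, the network abnormality message indicating that the network state of the first network is abnormal;
    the first device, in response to receiving the network abnormality message, switching to using a second network to transmit backup messages to the second device; wherein the second network is a transmission network between the first device and the second device whose network state is normal, and the backup messages include the backup messages corresponding to the service messages being transmitted on the first network when the first network became abnormal.
  2. The method according to claim 1, characterized in that the method further comprises:
    the network edge device, in response to detecting that the network state of the first network is abnormal, sending a network abnormality message to the first device.
  3. The method according to claim 2, characterized in that sending a network abnormality message to the first device in response to detecting that the network state of the first network is abnormal comprises:
    the network edge device periodically checking a preset indicator of the first network to obtain a detection result for the preset indicator, the preset indicator including at least one of: port power, signal strength, bit error rate, delay, packet loss rate;
    the network edge device, in response to detecting that the detection result for the preset indicator reaches a preset threshold, determining that the network state of the first network is abnormal and sending a network abnormality message to the first device.
  4. The method according to claim 1, characterized in that a unique sequence number is added to each message, the second device is configured to check from the sequence number whether a message is valid and to store valid messages, and switching to using the second network to transmit backup messages to the second device in response to receiving the network abnormality message comprises:
    the first device, in response to receiving the network abnormality message, suspending use of the first network for service message transmission and notifying the second device through the second network that the sequence numbers of the service messages transmitted on the first network are invalid;
    the first device adding valid sequence numbers to the backup messages and sending the backup messages carrying the valid sequence numbers to the second device over the second network.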
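  5. The method according to claim 4, characterized in that, before service messages are transmitted through the first network, the method further comprises:
    the first device determining the network information corresponding to the first network, the network information including the device identifiers of the devices included in the first network;
    wherein the network abnormality message sent by the network edge device to the first device includes the first device identifier of the network edge device itself, and suspending use of the first network for service message transmission in response to receiving the network abnormality message comprises:
    the first device determining, from the first device identifier in the network abnormality message and the network information, the first network that includes the network edge device, and deleting from the routing table of the first device the routing information corresponding to the first network passing through the network edge device.
  6. The method according to claim 5, characterized in that the second device is further configured to send a probe message to the network edge device through the first network, and the method further comprises:
    the network edge device, in response to receiving the probe message, filling its own first device identifier into the probe message and sending the probe message carrying the first device identifier to the first device;
    wherein determining the network information corresponding to the first network comprises:
    the first device, in response to receiving the probe message carrying the first device identifier, extracting the first device identifier from the probe message and recording the extracted first device identifier in the network information.
  7. The method according to claim 5 or 6, characterized in that, where the same network device is connected between the first device and the network edge device, determining the network information corresponding to the first network comprises:
    the first device obtaining the second device identifier of the network device and recording the second device identifier in the network information;
    wherein the network abnormality message sent by the network edge device to the first device includes the first device identifier of the network edge device itself and the second device identifier of the network device, and suspending use of the first network for service message transmission in response to receiving the network abnormality message comprises:
    the first device determining, from the first device identifier and the second device identifier in the network abnormality message and the network information, the first network that passes through the network edge device and the network device, and deleting from the routing table of the first device the routing information corresponding to the first network passing through the network edge device and the network device.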
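  8. The method according to any one of claims 1 to 7, characterized in that, while the network state of the first network is normal, the transmission of service messages between the first device and the second device comprises:
    the first device obtaining a service message to be transmitted, backing it up to obtain and store a backup message, adding a sequence number to the service message to be transmitted, and sending the service message carrying the sequence number to the second device over the first network.
  9. The method according to any one of claims 1 to 7, characterized in that the method further comprises:
    the network edge device predicting, based on historical network abnormality messages of the first network, a future point in time at which a network abnormality will occur on the first network, the historical network abnormality messages including the network abnormality messages collected when network abnormalities occurred on the first network in the past;
    the network edge device sending a network abnormality message to the first device based on the point in time.
  10. The method according to any one of claims 1 to 7, characterized in that messages between the first device and the network edge device are transmitted over an internal network, a network state detection module is provided in the internal network, the network state detection module is configured to detect the network state of the internal network, and the method further comprises:
    the network state detection module, in response to detecting that the network state of the internal network is abnormal, sending an internal network abnormality message to the first device;
    the first device, in response to the internal network abnormality message, switching to a backup internal network for message transmission with the network edge device.
  11. The method according to claim 1, characterized in that the first device and the second device are storage devices.
  12. The method according to claim 1, characterized in that the first network is an optical transmission network.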
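  13. A device, characterized in that the device serves as a first device, service messages are transmitted between the first device and a second device through a first network, and the first network includes a network edge device connected to the first device; the first device comprises a receiving unit and a switching unit; wherein
    the receiving unit is configured to receive a network abnormality message from the network edge device, the network abnormality message indicating that the network state of the first network is abnormal;
    the switching unit is configured to switch, in response to receiving the network abnormality message, to using a second network to transmit backup messages to the second device; wherein the second network is a transmission network between the first device and the second device whose network state is normal, and the backup messages include the backup messages corresponding to the service messages being transmitted on the first network when the first network became abnormal.
  14. The device according to claim 13, characterized in that the network edge device is configured to send a network abnormality message to the first device in response to detecting that the network state of the first network is abnormal.
  15. The device according to claim 14, characterized in that sending a network abnormality message to the first device in response to detecting that the network state of the first network is abnormal comprises:
    periodically checking a preset indicator of the first network to obtain a detection result for the preset indicator, the preset indicator including at least one of: port power, signal strength, bit error rate, delay, packet loss rate;
    in response to detecting that the detection result for the preset indicator reaches a preset threshold, determining that the network state of the first network is abnormal and sending a network abnormality message to the first device.
  16. The device according to claim 13, characterized in that a unique sequence number is added to each message, the second device is configured to check from the sequence number whether a message is valid and to store valid messages, and the switching unit comprises:
    an abnormality response module configured to, in response to receiving the network abnormality message, suspend use of the first network for service message transmission and notify the second device through the second network that the sequence numbers of the service messages transmitted on the first network are invalid;
    a message sending module configured to add valid sequence numbers to the backup messages and send the backup messages carrying the valid sequence numbers to the second device over the second network.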
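  17. The device according to claim 16, characterized in that the first device further comprises:
    an information determination unit configured to determine, before service messages are transmitted through the first network, the network information corresponding to the first network, the network information including the device identifiers of the devices included in the first network;
    wherein the network abnormality message sent by the network edge device to the first device includes the first device identifier of the network edge device itself, and suspending use of the first network for service message transmission in response to receiving the network abnormality message comprises:
    the first device determining, from the first device identifier in the network abnormality message and the network information, the first network that includes the network edge device, and deleting from the routing table of the first device the routing information corresponding to the first network passing through the network edge device.
  18. The device according to claim 17, characterized in that the second device is further configured to send a probe message to the network edge device through the first network, and the network edge device is further configured to, in response to receiving the probe message, fill its own first device identifier into the probe message and send the probe message carrying the first device identifier to the first device;
    wherein the information determination unit comprises: a first recording module configured to, in response to receiving the probe message carrying the first device identifier, extract the first device identifier from the probe message and record the extracted first device identifier in the network information.
  19. The device according to claim 17 or 18, characterized in that, where the same network device is connected between the first device and the network edge device, the information determination unit comprises: a second recording module configured to obtain the second device identifier of the network device and record the second device identifier in the network information;
    wherein the network abnormality message sent by the network edge device to the first device includes the first device identifier of the network edge device itself and the second device identifier of the network device, and suspending use of the first network for service message transmission in response to receiving the network abnormality message comprises:
    the first device determining, from the first device identifier and the second device identifier in the network abnormality message and the network information, the first network that passes through the network edge device and the network device, and deleting from the routing table of the first device the routing information corresponding to the first network passing through the network edge device and the network device.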
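  20. The device according to any one of claims 13 to 19, characterized in that the first device further comprises:
    a message transmission unit configured to, while the network state of the first network is normal, obtain a service message to be transmitted, back it up to obtain and store a backup message, add a sequence number to the service message to be transmitted, and send the service message carrying the sequence number to the second device over the first network.
  21. The device according to any one of claims 13 to 19, characterized in that the network edge device is further configured to:
    predict, based on historical network abnormality messages of the first network, a future point in time at which a network abnormality will occur on the first network, the historical network abnormality messages including the network abnormality messages collected when network abnormalities occurred on the first network in the past; and send a network abnormality message to the first device based on the point in time.
  22. The device according to any one of claims 13 to 19, characterized in that messages between the first device and the network edge device are transmitted over an internal network, a network state detection module is provided in the internal network, and the network state detection module is configured to detect the network state of the internal network and, in response to detecting that the network state of the internal network is abnormal, to send an internal network abnormality message to the first device;
    the first device further comprises: an internal network switching unit configured to switch, in response to the internal network abnormality message, to a backup internal network for message transmission with the network edge device.
  23. The device according to claim 13, characterized in that the first device and the second device are storage devices.
  24. The device according to claim 13, characterized in that the first network is an optical transmission network.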
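  25. A device, characterized in that the device serves as a first device, service messages are transmitted between the first device and a second device through a first network, and the first network includes a network edge device connected to the first device; the first device comprises an interface and a processor; the interface communicates with the processor; wherein
    the processor is configured to:
    receive a network abnormality message from the network edge device, the network abnormality message indicating that the network state of the first network is abnormal;
    in response to receiving the network abnormality message, switch to using a second network to transmit backup messages to the second device; wherein the second network is a transmission network between the first device and the second device whose network state is normal, and the backup messages include the backup messages corresponding to the service messages being transmitted on the first network when the first network became abnormal.
  26. The device according to claim 25, characterized in that the network edge device is configured to send a network abnormality message to the first device in response to detecting that the network state of the first network is abnormal.
  27. The device according to claim 26, characterized in that sending a network abnormality message to the first device in response to detecting that the network state of the first network is abnormal comprises:
    periodically checking a preset indicator of the first network to obtain a detection result for the preset indicator, the preset indicator including at least one of: port power, signal strength, bit error rate, delay, packet loss rate;
    in response to detecting that the detection result for the preset indicator reaches a preset threshold, determining that the network state of the first network is abnormal and sending a network abnormality message to the first device.
  28. The device according to claim 25, characterized in that a unique sequence number is added to each message, the second device is configured to check from the sequence number whether a message is valid and to store valid messages, and switching to using the second network to transmit backup messages to the second device in response to receiving the network abnormality message comprises:
    in response to receiving the network abnormality message, suspending use of the first network for service message transmission and notifying the second device through the second network that the sequence numbers of the service messages transmitted on the first network are invalid;
    adding valid sequence numbers to the backup messages and sending the backup messages carrying the valid sequence numbers to the second device over the second network.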
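  29. The device according to claim 28, characterized in that the processor is further configured to:
    determine, before service messages are transmitted through the first network, the network information corresponding to the first network, the network information including the device identifiers of the devices included in the first network;
    wherein the network abnormality message sent by the network edge device to the first device includes the first device identifier of the network edge device itself, and suspending use of the first network for service message transmission in response to receiving the network abnormality message comprises:
    determining, from the first device identifier in the network abnormality message and the network information, the first network that includes the network edge device, and deleting from the routing table of the first device the routing information corresponding to the first network passing through the network edge device.
  30. The device according to claim 29, characterized in that the second device is further configured to send a probe message to the network edge device through the first network, and the network edge device is further configured to, in response to receiving the probe message, fill its own first device identifier into the probe message and send the probe message carrying the first device identifier to the first device;
    wherein determining the network information corresponding to the first network comprises:
    in response to receiving the probe message carrying the first device identifier, extracting the first device identifier from the probe message and recording the extracted first device identifier in the network information.
  31. The device according to claim 29 or 30, characterized in that, where the same network device is connected between the first device and the network edge device, determining the network information corresponding to the first network comprises:
    obtaining the second device identifier of the network device and recording the second device identifier in the network information;
    wherein the network abnormality message sent by the network edge device to the first device includes the first device identifier of the network edge device itself and the second device identifier of the network device, and suspending use of the first network for service message transmission in response to receiving the network abnormality message comprises:
    determining, from the first device identifier and the second device identifier in the network abnormality message and the network information, the first network that passes through the network edge device and the network device, and deleting from the routing table of the first device the routing information corresponding to the first network passing through the network edge device and the network device.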
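  32. The device according to any one of claims 25 to 31, characterized in that the processor is further configured to:
    while the network state of the first network is normal, obtain a service message to be transmitted, back it up to obtain and store a backup message, add a sequence number to the service message to be transmitted, and send the service message carrying the sequence number to the second device over the first network.
  33. The device according to any one of claims 25 to 31, characterized in that the network edge device is further configured to:
    predict, based on historical network abnormality messages of the first network, a future point in time at which a network abnormality will occur on the first network, the historical network abnormality messages including the network abnormality messages collected when network abnormalities occurred on the first network in the past; and send a network abnormality message to the first device based on the point in time.
  34. The device according to any one of claims 25 to 31, characterized in that messages between the first device and the network edge device are transmitted over an internal network, a network state detection module is provided in the internal network, and the network state detection module is configured to detect the network state of the internal network and, in response to detecting that the network state of the internal network is abnormal, to send an internal network abnormality message to the first device;
    the processor is further configured to: in response to the internal network abnormality message, switch to a backup internal network for message transmission with the network edge device.
  35. The device according to claim 25, characterized in that the first device and the second device are storage devices.
  36. The device according to claim 25, characterized in that the first network is an optical transmission network.
  37. A device management system, characterized in that it comprises a first device and a network edge device, the first device being configured to perform the operations performed by the first device in the method according to any one of claims 1-10, and the network edge device being configured to perform the operations performed by the network edge device in the method according to any one of claims 1-10.
  38. A non-volatile computer-readable storage medium storing computer program instructions, characterized in that the computer program instructions, when executed by a processor, implement the method according to any one of claims 1-10.
  39. A computer program product comprising computer-readable code, or a non-volatile computer-readable storage medium carrying computer-readable code, wherein, when the computer-readable code runs in an electronic device, a processor in the electronic device performs the method according to any one of claims 1-10.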
PCT/CN2023/103133 2022-10-27 2023-06-28 Device management method, device, system and storage medium WO2024087692A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211328073.4A CN117955903A (zh) 2022-10-27 2022-10-27 Device management method, device, system and storage medium
CN202211328073.4 2022-10-27

Publications (1)

Publication Number Publication Date
WO2024087692A1 true WO2024087692A1 (zh) 2024-05-02

Family

ID=90800559

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/103133 WO2024087692A1 (zh) 2022-10-27 2023-06-28 Device management method, device, system and storage medium

Country Status (2)

Country Link
CN (1) CN117955903A (zh)
WO (1) WO2024087692A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
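CN105791072A (zh) * 2014-12-22 2016-07-20 华为数字技术(苏州)有限公司 Access method and apparatus for an Ethernet virtual network
CN108600165A (zh) * 2018-03-15 2018-09-28 北京大米科技有限公司 TCP-based communication method, client, central node and communication system
CN112205036A (zh) * 2018-05-31 2021-01-08 摩博菲乐有限公司Dba摩博莱 System and method for dynamic channel bonding
CN114339869A (zh) * 2022-02-25 2022-04-12 京东科技信息技术有限公司 Network management method and apparatus, electronic device and storage medium
CN114389726A (zh) * 2022-01-18 2022-04-22 即刻雾联科技(北京)有限公司 Intelligent networking method and system based on edge devices, and storage medium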

Also Published As

Publication number Publication date
CN117955903A (zh) 2024-04-30

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23881281

Country of ref document: EP

Kind code of ref document: A1