CN115396308A - System, method and device for maintaining network stability of data center

Publication number
CN115396308A
CN115396308A CN202210892208.3A CN202210892208A CN115396308A CN 115396308 A CN115396308 A CN 115396308A CN 202210892208 A CN202210892208 A CN 202210892208A CN 115396308 A CN115396308 A CN 115396308A
Authority
CN
China
Prior art keywords
fault
transmission line
delay
data packet
recovery
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210892208.3A
Other languages
Chinese (zh)
Inventor
张欢
谢崇进
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba China Co Ltd filed Critical Alibaba China Co Ltd
Priority to CN202210892208.3A priority Critical patent/CN115396308A/en
Publication of CN115396308A publication Critical patent/CN115396308A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/08 Configuration management of networks or network elements
    • H04L41/0803 Configuration setting
    • H04L41/0813 Configuration setting characterised by the conditions triggering a change of settings
    • H04L41/0816 Configuration setting characterised by the conditions triggering a change of settings the condition being an adaptation, e.g. in response to network events
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0852 Delays

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The embodiments of this specification provide a system, a method and a device for maintaining network stability of a data center. The method comprises the following steps: inserting a fault signal into the data packets transmitted to the data communication device when a fault occurs in the transmission line; inserting a fault signal into the data packets transmitted to the data communication device when the transmission line is restored from the fault to normal; timing how long the transmission line has remained normal to obtain a recovery delay duration; and resuming transparent transmission of the data packets of the transmission line when the recovery delay duration reaches a preset recovery threshold range, where the preset recovery threshold range is determined according to the estimated time the transmission line needs to complete fault recovery.

Description

System, method and device for maintaining network stability of data center
Technical Field
The embodiment of the specification relates to the technical field of communication, in particular to a method for maintaining network stability of a data center.
Background
With the development of cloud computing, a large number of data centers have been built, and the data communication devices of different data centers are interconnected through optical transmission devices to form a data center network. The data communication devices interconnected between data centers typically include routers and switches. The connections between the ports of these devices carry a large amount of data. When a fault such as an interruption or degradation occurs on a transmission line, connectivity between data communication devices is affected, and in severe cases a traffic black hole forms and data is lost. The data communication device therefore has to poll its ports continuously to sense a fault or a fault recovery: when a fault is sensed, the upper-layer routing protocol closes the port, and when a fault recovery is sensed, the upper-layer routing protocol opens the port, so as to avoid data loss.
However, continuous port polling wastes CPU (processor) resources of the data communication device, and timed polling cannot sense fault recovery accurately, which causes link packet loss. Moreover, if the transmission line goes up and down several times in a row during maintenance, there is short-lived jitter while the transmission line recovers from the fault. This jitter makes the upper-layer routing protocol of the port flap and impairs its convergence.
Disclosure of Invention
In view of this, the embodiments of the present specification provide a method for a data center to maintain network stability. One or more embodiments of the present specification also relate to a system, an apparatus, a computing device, a computer-readable storage medium, and a computer program for maintaining network stability in a data center, so as to solve the technical problems in the prior art.
According to a first aspect of embodiments herein, there is provided a system for maintaining network stability of a data center, including a data communication device and an optical transmission device, the data communication device being connected with the optical transmission device so as to access a network through the optical transmission device. The optical transmission device is configured to insert a fault signal into the data packets transmitted to the data communication device when a transmission line fails, insert a fault signal into the data packets transmitted to the data communication device when the transmission line is restored from the fault to normal, time how long the transmission line has remained normal to obtain a recovery delay duration, and resume transparent transmission of the data packets of the transmission line when the recovery delay duration reaches a preset recovery threshold range, where the preset recovery threshold range is determined according to the estimated time the transmission line needs to complete fault recovery. The data communication device is configured to receive data packets from the optical transmission device, trigger a routing protocol to close the port corresponding to the transmission line if a data packet contains a fault signal, and trigger the routing protocol to open the port corresponding to the transmission line if, after the port is closed, a data packet not containing the fault signal is received from the optical transmission device.
Optionally, the delay duration parameters of the interrupt delay function and the start delay function of the data communication device port are set to zero, or the interrupt delay function and the start delay function are cancelled.
According to a second aspect of the embodiments of the present specification, there is provided a method for maintaining network stability of a data center, which is applied to an optical transmission device and includes: inserting a fault signal into the data packets transmitted to the data communication device when a fault occurs in the transmission line; inserting a fault signal into the data packets transmitted to the data communication device when the transmission line is restored from the fault to normal; timing how long the transmission line has remained normal to obtain a recovery delay duration; and resuming transparent transmission of the data packets of the transmission line when the recovery delay duration reaches a preset recovery threshold range, where the preset recovery threshold range is determined according to the estimated time the transmission line needs to complete fault recovery.
Optionally, the method further comprises: in response to the transmission line fault, timing the fault to obtain a fault duration; when the transmission line is restored from the fault to normal, judging whether the fault duration reaches a preset delay effective threshold range, where the preset delay effective threshold range is determined according to the estimated time required for the optical packet switching to complete protection switching; and if so, proceeding to the step of inserting a fault signal into the data packets transmitted to the data communication device when the transmission line is restored from the fault to normal.
Optionally, the method further comprises: if the fault duration does not reach the preset delay effective threshold range, judging whether a fault exists at the far end that sends the data packet; if a fault exists at the far end, inserting a fault signal into the data packet; and if no fault exists at the far end, resuming transparent transmission of the data packets of the transmission line.
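The decision taken at the moment the line recovers can be sketched as a small Python function. This is only an illustration under stated assumptions: the names `Action` and `on_line_recovered` are hypothetical, "reaching the threshold range" is modelled as a simple greater-than-or-equal comparison, and the millisecond values in the usage note are invented for the example.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical labels for the three outcomes described in the text."""
    START_RECOVERY_DELAY = "insert fault signal and time the recovery delay"
    INSERT_FAULT = "insert fault signal because the far end is faulty"
    PASS_THROUGH = "resume transparent transmission"


def on_line_recovered(fault_duration_ms: float,
                      delay_effective_threshold_ms: float,
                      far_end_fault: bool) -> Action:
    """Choose the action when the transmission line returns to normal."""
    # Assumption: "reaches the threshold range" means duration >= threshold.
    if fault_duration_ms >= delay_effective_threshold_ms:
        # Long fault: go to the recovery-delay step.
        return Action.START_RECOVERY_DELAY
    # Short fault: the outcome depends only on the far end.
    if far_end_fault:
        return Action.INSERT_FAULT
    return Action.PASS_THROUGH
```

With an illustrative threshold of 60 ms, `on_line_recovered(80, 60, False)` selects the recovery delay, while `on_line_recovered(30, 60, False)` resumes transparent transmission immediately.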
Optionally, the method further comprises: if the transmission line fails again before the recovery delay time reaches the preset recovery threshold range, stopping timing the duration time of the transmission line recovering to normal; and when the transmission line returns to be normal again after the failure again, timing the duration time of the transmission line returning to be normal again.
Optionally, the inserting a fault signal into the data packet transmitted to the data communication device in case of a fault in the transmission line includes: judging whether a far end for sending a data packet has a fault or not under the condition that a transmission line has a fault; if yes, inserting a fault signal into the data packet transmitted to the data communication equipment; if not, inserting a network idle signal into a data packet transmitted to the data communication equipment, and timing the duration of the transmission line fault to obtain delay interruption duration; and inserting a fault signal into the data packet transmitted to the data communication equipment when the delay interruption time reaches a preset delay interruption threshold range, wherein the preset delay interruption threshold range is determined according to the estimated time required by the completion of protection switching of the optical packet switching.
Optionally, the method further comprises: adding a preset time increment to the preset delay interruption threshold range to obtain the preset delay effective threshold range.
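The delayed-interruption behaviour and the derivation of the delay effective threshold can be illustrated with the following Python sketch. It is a sketch under assumptions, not the patented implementation: the function names and the string labels `"FAULT"` and `"IDLE"` are hypothetical, the threshold range is modelled as a single value compared with greater-than-or-equal, and the 10 ms increment used in the usage note is invented.

```python
def signal_while_faulty(elapsed_ms: float,
                        far_end_fault: bool,
                        delay_interrupt_threshold_ms: float) -> str:
    """Pick the signal to insert while the transmission line is faulty."""
    if far_end_fault:
        # A far-end fault is reported to the data communication device at once.
        return "FAULT"
    if elapsed_ms >= delay_interrupt_threshold_ms:
        # The fault outlasted the expected protection switching time.
        return "FAULT"
    # Still inside the protection switching window: mask the fault with idle.
    return "IDLE"


def delay_effective_threshold(delay_interrupt_threshold_ms: float,
                              increment_ms: float) -> float:
    """Delay effective threshold = delay interruption threshold + increment."""
    return delay_interrupt_threshold_ms + increment_ms
```

For example, with a 50 ms delay interruption threshold, a local fault that has lasted 10 ms still yields `"IDLE"`, and adding a hypothetical 10 ms increment gives a delay effective threshold of 60 ms.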
Optionally, the step of timing the duration of the transmission line returning to normal and the step of timing the duration of the transmission line fault are performed by using the same timer.
Optionally, the method further comprises: judging whether the transmission line has a fault according to a fault alarm signal received from the transmission line; wherein the fault alarm signal comprises any one or more of an optical loss-of-signal alarm signal, an optical loss-of-frame alarm signal, an error code alarm signal and a loss-of-lock alarm signal.
Optionally, the determining whether the transmission line has a fault according to the fault warning signal received from the transmission line includes: acquiring a forward error correction code value according to an error code warning signal received from the transmission line; and determining that the transmission line has a fault under the condition that the forward error correction code value reaches a preset error code threshold range, wherein the preset error code threshold range is determined according to the error correction capability limit of the forward error correction code.
According to a third aspect of the embodiments of the present specification, there is provided an apparatus for maintaining network stability of a data center, configured in an optical transmission device and including: a fault response module configured to insert a fault signal into the data packets transmitted to the data communication device when a fault occurs in the transmission line; a recovery response module configured to insert a fault signal into the data packets transmitted to the data communication device when the transmission line is restored from the fault to normal; a recovery delay timing module configured to time how long the transmission line has remained normal to obtain the recovery delay duration; and a recovery transparent transmission module configured to resume transparent transmission of the data packets of the transmission line when the recovery delay duration reaches a preset recovery threshold range, where the preset recovery threshold range is determined according to the estimated time the transmission line needs to complete fault recovery.
According to a fourth aspect of embodiments herein, there is provided a computing device comprising: a memory and a processor; the memory is used for storing computer-executable instructions, and the processor is used for executing the computer-executable instructions, and the computer-executable instructions when executed by the processor realize the steps of the method for maintaining the network stability of the data center according to any embodiment of the specification.
According to a fifth aspect of embodiments herein, there is provided a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, perform the steps of a method for a data center to maintain network stability as described in any of the embodiments herein.
An embodiment of this specification provides a method for maintaining network stability of a data center. When a transmission line fails, the optical transmission device inserts a fault signal into the data packets transmitted to the data communication device, so that the data communication device, on receiving a data packet containing the fault signal, triggers the routing protocol to close the port corresponding to the transmission line. When the transmission line is restored from the fault to normal, the optical transmission device inserts the fault signal into the data packets transmitted to the data communication device and times how long the transmission line has remained normal to obtain the recovery delay duration. When the recovery delay duration reaches a preset recovery threshold range, which is determined according to the estimated time the transmission line needs to complete fault recovery, the optical transmission device resumes transparent transmission of the data packets of the transmission line. After the port has been closed, the data communication device therefore triggers the routing protocol to open the port corresponding to the transmission line once it receives from the optical transmission device a data packet that does not contain the fault signal.
The method thus spares the data communication device from spending CPU resources on polling the port state. When the transmission line goes down and jitters several times in a row during recovery, the optical transmission device forcibly inserts fault signals based on the timing of the recovery delay duration, which debounces the fault recovery by delaying it. For repeated interruptions and jitter during recovery, the data communication device senses the fault from the optical transmission device only once, and senses the normal state through transparently transmitted data packets only after the link has been repaired and has stayed normal for a certain time. The data communication device therefore restores the port state only after the transmission line is fully repaired, which avoids flapping of its routing protocol and lets the routing protocol converge efficiently. In addition, the optical transmission device can sense fault recovery of the transmission line more accurately than the data communication device can, which prevents packet loss.
Drawings
Fig. 1 is a schematic diagram of an application scenario of a system for maintaining network stability of a data center according to an embodiment of the present disclosure;
Fig. 2 is a schematic diagram of an application scenario of a system for maintaining network stability of a data center according to another embodiment of the present disclosure;
Fig. 3 is a flowchart of a method for maintaining network stability of a data center according to an embodiment of the present disclosure;
Fig. 4 is a flowchart of a processing procedure of a method for maintaining network stability of a data center according to an embodiment of the present disclosure;
Fig. 5 is a flowchart of a processing procedure of a method for maintaining network stability of a data center according to another embodiment of the present disclosure;
Fig. 6 is a flowchart of a processing procedure of a method for maintaining network stability of a data center according to another embodiment of the present disclosure;
Fig. 7 is a schematic diagram of a signal insertion timing sequence provided by an embodiment of the present disclosure;
Fig. 8 is a schematic structural diagram of an apparatus for maintaining network stability of a data center according to an embodiment of the present disclosure;
Fig. 9 is a schematic structural diagram of a system for maintaining network stability of a data center according to an embodiment of the present disclosure;
Fig. 10 is a block diagram of a computing device according to an embodiment of the present disclosure.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present description. This description may be implemented in many ways other than those specifically set forth herein, and those skilled in the art will appreciate that the present description is susceptible to similar generalizations without departing from the scope of the description, and thus is not limited to the specific implementations disclosed below.
The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It will be understood that, although the terms first, second, etc. may be used herein in one or more embodiments to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first can be termed a second and, similarly, a second can be termed a first without departing from the scope of one or more embodiments of the present description. The word "if," as used herein, may be interpreted as "when …" or "upon …" or "in response to a determination," depending on the context.
First, the noun terms to which one or more embodiments of the present specification relate are explained.
A data center is a network of devices used to transmit, accelerate, display, compute and store data information over an Internet infrastructure.
An optical transmission device is a device that converts various signals into optical signals and transmits the optical signals on an optical fiber.
Data communication equipment is equipment used for data communication in a data center, such as a router, a switch, and the like.
The transmission line is a line for linking each node in the network, and serves as a bridge for data transmission between the nodes.
Transparent transmission, i.e. pass-through, refers to delivering the transmitted content from a source to a destination without any change to the data content, regardless of what content is being transmitted in the communication.
With the development of cloud computing, a large number of data centers have emerged. The data communication devices of different data centers are interconnected through optical transmission devices to form a data center network. The connections between the ports of the data communication devices carry a large amount of data. When a fault such as an interruption or degradation occurs in the physical network, connectivity between data communication devices is affected, and in severe cases a traffic black hole forms and data is lost.
To improve network stability, the data communication device may poll its ports periodically, so that when a fault occurs it is sensed at the physical layer and the upper-layer routing protocol is triggered to close the port, and when recovery of the physical-layer signal is sensed the upper-layer routing protocol is triggered to open the port. In addition, when one transmission line is interrupted, the optical transmission network can switch to another path through an optical switch, so as to keep the physical-layer link interruption as short as possible, for example within 50 ms. The data communication device then does not need to trigger rerouting by the routing protocol during protection switching of the link; it can close the port only when the link is completely interrupted and route messages to other ports through the routing protocol. However, the network model is divided into layer 0 (the optical layer), layer 1 (the OTN layer), layer 2 (the Ethernet layer), and layer 3 and above (the IP layer). When an interruption occurs, the fault signal propagates from the bottom layer upward, and the processing logic becomes more complex the higher the layer. As a result, a data communication device that senses interruption and recovery by polling its ports takes longer to do so, and frequent polling consumes its CPU resources and reduces its communication efficiency. Because the link state cannot be sensed accurately and in time, some packet loss occurs between the interruption of the link and the completion of route switching.
Moreover, when the link goes up and down several times in a row during maintenance, the upper-layer routing protocol of the port flaps, the routing protocol cannot converge on the link, and from the upper layer's point of view packets are lost for a long time.
In view of the above, the present specification provides a system, a method and an apparatus for maintaining network stability of a data center. The present specification also relates to a computing device and a computer-readable storage medium. These are described in detail one by one in the following embodiments.
Referring to fig. 1, fig. 1 is a schematic view of an application scenario of a system for maintaining network stability of a data center according to an embodiment of the present disclosure. A data center network may include multiple data centers, for example data center 2, data center 3 and so on, as shown in fig. 1. Each data center may include data communication devices and optical transmission devices. The data communication device may be, for example, a switch or a router. The optical transmission device may be, for example, a DCI (Data Center Interconnect) optical transmission device as shown in fig. 1. In the system provided by the embodiments of this specification, taking the optical transmission device and the data communication device of any one data center as an example: when the transmission line fails, the optical transmission device inserts a fault signal into the data packets transmitted to the local data communication device; when the transmission line is restored from the fault to normal, it inserts the fault signal into the data packets transmitted to the data communication device and times how long the transmission line has remained normal to obtain the recovery delay duration; and when the recovery delay duration reaches the preset recovery threshold range, it resumes transparent transmission of the data packets of the transmission line, where the preset recovery threshold range is determined according to the estimated time the transmission line needs to complete fault recovery.
The data communication device receives data packets from the optical transmission device. If a data packet contains a fault signal, the data communication device triggers the routing protocol to close the port corresponding to the transmission line; if, after the port is closed, it receives from the optical transmission device a data packet that does not contain the fault signal, it triggers the routing protocol to open the port corresponding to the transmission line. The system thus spares the data communication device from spending CPU resources on polling the port state. When the transmission line goes down and jitters several times in a row during recovery, the optical transmission device forcibly inserts fault signals based on the timing of the recovery delay duration, which debounces the fault recovery by delaying it, avoids flapping of the routing protocol of the data communication device, and leaves the convergence of that routing protocol unaffected. In addition, the optical transmission device can sense fault recovery of the transmission line more accurately than the data communication device can, which prevents packet loss.
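The behaviour of the data communication device described above can be sketched in Python as a port that reacts only to the presence or absence of the fault signal in received packets, without any polling. The class `Port`, its method `on_packet` and the `transitions` log are hypothetical names used for illustration; a real device would trigger its routing protocol rather than append to a list.

```python
class Port:
    """Hypothetical model of one port on the data communication device."""

    def __init__(self) -> None:
        self.is_open = True
        self.transitions: list[str] = []  # stands in for routing-protocol triggers

    def on_packet(self, has_fault_signal: bool) -> None:
        """React to one packet received from the optical transmission device."""
        if has_fault_signal and self.is_open:
            # A packet carrying the fault signal closes the port.
            self.is_open = False
            self.transitions.append("close")
        elif not has_fault_signal and not self.is_open:
            # The first clean packet after closing reopens the port.
            self.is_open = True
            self.transitions.append("open")
```

Because the optical transmission device keeps inserting the fault signal throughout the recovery delay, a sequence such as clean, fault, fault, fault, clean, clean produces exactly one `close` and one `open`, however often the underlying line flapped in between.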
Referring to fig. 2, fig. 2 is a schematic view of an application scenario of a system for maintaining network stability of a data center according to another embodiment of the present disclosure. In the application scenario of fig. 2, the optical transmission device provides a client-side module facing the data communication device as the interface for exchanging data packets with the data communication device, and a line-side module facing the transmission line as the interface for exchanging data packets with the transmission line. Taking the line side and the client side in data center 1 as an example: when the line side receives a data packet from the transmission line, it hands the data packet to the client side. The client side obtains the data packet from the line side. When the transmission line fails, the client side inserts a fault signal into the data packets received under the fault condition; when the transmission line is restored from the fault to normal, it inserts the fault signal into the data packets received under the fault recovery condition and times how long the transmission line has remained normal to obtain the recovery delay duration; and when the recovery delay duration reaches the preset recovery threshold range, it resumes transparent transmission of the data packets of the transmission line.
It should be noted that the client-side module and the line-side module shown in fig. 2 are divided according to different interface functions, and in practical applications, the client-side module and the line-side module may also be combined into one functional module to implement, which is not limited in this description embodiment.
Taking the above application scenario based on ethernet implementation as an example, according to the system provided in this specification, the influence of transmission line jitter (for example, jitter that may be caused by transmission link switching or multiple continuous interrupts) on the routing protocol of the data communication device can be reduced by improving the ethernet maintenance signal design of the interface between the optical transmission device and the data communication device.
Referring to fig. 3, fig. 3 is a flowchart illustrating a method for maintaining network stability of a data center according to an embodiment of the present disclosure, where the method is applied to an optical transmission device. For example, in connection with the application scenario shown in fig. 2, the method may be applied to a client side of an optical transmission apparatus. The method specifically comprises the following steps.
Step 302: in the event of a failure of the transmission line, a failure signal is inserted into the data packet transmitted to the data communication device.
Wherein the failure may be an interruption of the transmission line for any reason. The manner in which the optical transmission device senses the failure is not limited. For example, the optical transmission device may determine whether the transmission line has a fault based on a fault alarm signal received from the transmission line. Wherein the fault warning signal comprises any one or more of the following fault warning signals:
optical loss-of-signal alarm signal, LOS (Loss of Signal);
optical loss-of-frame alarm signal, LOF (Loss of Frame);
error code alarm signals, SD/SF (Signal Degrade, Signal Failure, etc.) and other error-code-related alarms;
loss-of-lock alarm signal, LOL (Loss of Lock).
The basis for the optical transmission device to determine the line interruption includes, but is not limited to, the above-mentioned fault alarm signal.
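As a minimal sketch of the alarm-based judgment, the alarms listed above can be modelled in Python as a set of flags, with the line treated as faulty when any one of them is active. The names `Alarm` and `line_has_fault` are hypothetical, and how the alarms are actually read from the line-side hardware is outside this sketch.

```python
from enum import Flag, auto


class Alarm(Flag):
    """Fault alarm signals named in the text (flag names are illustrative)."""
    NONE = 0
    LOS = auto()    # loss of signal
    LOF = auto()    # loss of frame
    SD_SF = auto()  # signal degrade / signal failure (error code alarms)
    LOL = auto()    # loss of lock


def line_has_fault(active: Alarm) -> bool:
    """The line is judged faulty if any one or more alarms are active."""
    return bool(active)
```

For instance, `line_has_fault(Alarm.SD_SF | Alarm.LOL)` is true, while `line_has_fault(Alarm.NONE)` is false, which matches the recovery judgment of receiving no fault alarm signal.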
Taking the error code alarm signal as an example, determining whether the transmission line has a fault according to the fault alarm signal received from the transmission line includes: acquiring a forward error correction code value according to the error code alarm signal received from the transmission line; and determining that the transmission line has a fault when the forward error correction code value reaches a preset error code threshold range. The preset error code threshold range is determined according to the error correction capability limit of the forward error correction code. For example, the preset error code threshold range may be the range above a preset error code threshold, so that the transmission line is determined to have a fault when the forward error correction code value is greater than the preset error code threshold. When a line degrades to near the error correction limit of the forward error correction code, post-correction errors may occur intermittently. To keep such degradation from causing link jitter, this embodiment sets the preset error code threshold according to the limit of the forward error correction code value; for example, the threshold may be set exactly at the limit, or closer to the normal value than the limit. The line is then considered faulty whenever the forward error correction code value is greater than the preset error code threshold, and is determined to have recovered from the fault only once the value is no longer greater than the threshold.
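The threshold choice can be illustrated with a short Python sketch. It rests on assumptions that are not in the source: the forward error correction code value is treated as an error rate for which lower means healthier, the `margin` parameter is a hypothetical way of placing the threshold closer to normal than the limit, and the numeric values in the usage note are invented.

```python
def preset_error_threshold(fec_limit: float, margin: float = 0.0) -> float:
    """Return a threshold at the FEC limit (margin 0) or below it.

    Assumption: lower values are closer to normal, so a positive margin
    moves the threshold toward the normal side of the limit.
    """
    if not 0.0 <= margin < 1.0:
        raise ValueError("margin must be in [0, 1)")
    return fec_limit * (1.0 - margin)


def line_faulty(fec_value: float, threshold: float) -> bool:
    """Faulty while the value exceeds the threshold, recovered otherwise."""
    return fec_value > threshold
```

With a hypothetical limit of `2e-2` and a margin of `0.5`, the threshold is half the limit, so a measured value of `1.5e-2` is already treated as a fault even though it is still below the correction limit.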
Step 304: inserting a fault signal into the data packet transmitted to the data communication device when the transmission line recovers from the fault to normal.
The manner in which the optical transmission device senses that the transmission line has recovered from the fault to normal is not limited. For example, the optical transmission device may check whether the optical signal received from the transmission line contains a fault alarm signal; if it does not, the transmission line may be determined to have recovered from the fault to normal.
Here, the data packet transmitted to the data communication device may be understood as a data packet received from the transmission line by the optical transmission device and addressed to the data communication device. For example, the data packet may be a data packet sent by other data communication devices of other data centers to the data communication device of the data center.
The fault signal is used to indicate any type of fault, such as a transmission line interruption. The form of expression of the Fault signal is not limited, and for example, an LF (Local Fault) signal may be used to represent the Fault signal in ethernet.
Step 306: timing the duration for which the transmission line has been restored to normal, to obtain the recovery delay time.
The specific manner of timing the duration for which the transmission line has been restored to normal is not limited. For example, when the initial value of the timer is zero, timing may be performed by counting up. As another example, when the initial value of the timer is an upper threshold set according to the actual application scenario, timing may be performed by counting down. Either way, the recovery delay time can be obtained.
Step 308: when the recovery delay time reaches a preset recovery threshold range, resuming transparent transmission of the data packets of the transmission line, wherein the preset recovery threshold range is determined according to the estimated time required for the transmission line to complete fault repair.
The manner of setting the preset recovery threshold range is not limited. Specifically, it can be set according to the estimated time needed to complete fault repair of the line in the actual application scenario, which can also be understood as the estimated line interruption time that needs to be shielded.
For example, assume that repairing an optical fiber takes 10 minutes after a transmission line is interrupted, and that the line may go on and off repeatedly within those 10 minutes; the estimated time required to repair the line interruption is then 10 minutes. If the duration for which the transmission line has been restored to normal is timed by counting up from zero, the initial value of the timer may be set to zero and the preset recovery threshold range to 10 minutes; when the timer equals 10 minutes, the recovery delay time reaches the preset recovery threshold range, and the optical transmission device resumes transparent transmission of the data packets of the transmission line. If the timing instead counts down from 10 minutes, the initial value of the timer may be set to 10 minutes and the preset recovery threshold range to zero; when the timer equals zero, the recovery delay time reaches the preset recovery threshold range, and the optical transmission device resumes transparent transmission of the data packets of the transmission line.
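The two timing manners in the example above reach the threshold at the same moment, which the following sketch makes explicit. The function names and the use of seconds as the unit are assumptions made for illustration; 600 seconds corresponds to the 10 minutes of the example.

```python
def expired_count_up(elapsed_s: int, threshold_s: int = 600) -> bool:
    """Count-up timing: start at zero, threshold is 10 minutes (600 s)."""
    timer = 0 + elapsed_s
    return timer >= threshold_s


def expired_count_down(elapsed_s: int, initial_s: int = 600) -> bool:
    """Count-down timing: start at 10 minutes (600 s), threshold is zero."""
    timer = initial_s - elapsed_s
    return timer <= 0


# Both manners agree for every elapsed time in a 20-minute window.
agree = all(expired_count_up(t) == expired_count_down(t) for t in range(1200))
print(agree)
```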
In addition, in order to avoid, to the greatest extent, jitter of the upper-layer routing protocol of the data communication device caused by repeated consecutive interruptions, in one or more other embodiments of the present specification the method further comprises: if the transmission line fails again before the recovery delay time reaches the preset recovery threshold range, stopping the timing of the duration for which the transmission line has been restored to normal; and, when the transmission line returns to normal after failing again, restarting that timing. For example, assume that the preset recovery threshold range is 10 minutes and that the transmission line fails again when the timer has reached 9 minutes. The timing stops at that moment; when the transmission line returns to normal again, the timing restarts from zero. Transparent transmission of the data packets of the transmission line is resumed only when the recovery delay time reaches 10 minutes, that is, when the transmission line has actually returned to normal, so that jitter of the upper-layer routing protocol of the data communication device caused by repeated consecutive interruptions is avoided to the greatest extent.
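A minimal sketch of this restart-on-failure timing is given below. The class `RecoveryDelayTimer` and its method names are hypothetical, introduced only for illustration; the 600-second threshold mirrors the 10-minute example above.

```python
class RecoveryDelayTimer:
    """Hypothetical count-up timer for the recovery delay time."""

    def __init__(self, threshold_s: float):
        self.threshold_s = threshold_s
        self.elapsed_s = None  # None means the timer is stopped

    def on_line_recovered(self) -> None:
        # The line returned to normal: (re)start timing from zero.
        self.elapsed_s = 0.0

    def on_line_fault(self) -> None:
        # The line failed again before the threshold was reached: stop timing.
        self.elapsed_s = None

    def tick(self, dt_s: float) -> bool:
        """Advance the timer; return True once transparent transmission may resume."""
        if self.elapsed_s is None:
            return False
        self.elapsed_s += dt_s
        return self.elapsed_s >= self.threshold_s


timer = RecoveryDelayTimer(threshold_s=600)
timer.on_line_recovered()
print(timer.tick(540))   # 9 minutes of normal operation: still holding
timer.on_line_fault()    # the line fails again, timing stops
timer.on_line_recovered()
print(timer.tick(540))   # counting restarted from zero: still holding
print(timer.tick(60))    # 10 uninterrupted minutes: resume transparent transmission
```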
It can be seen that, according to this method, when the transmission line is faulty, the optical transmission device inserts a fault signal into the data packet transmitted to the data communication device, so that the data communication device, finding the fault signal in the data packet, triggers the routing protocol to close the port corresponding to the transmission line. When the transmission line recovers from the fault to normal, the optical transmission device continues to insert the fault signal into the data packet transmitted to the data communication device and times the duration for which the transmission line has been restored to normal, to obtain the recovery delay time. When the recovery delay time reaches the preset recovery threshold range, which is determined according to the estimated time required for the transmission line to complete fault repair, transparent transmission of the data packets of the transmission line is resumed. If, after the port has been closed, the data communication device receives from the optical transmission device a data packet that does not contain the fault signal, it triggers the routing protocol to open the port corresponding to the transmission line. In this way the data communication device does not need to consume CPU resources polling the port state. Moreover, when the line goes on and off repeatedly during repair, the fault signal that the optical transmission device inserts on the basis of the recovery delay timing shields the data communication device from this jitter, so that the data communication device perceives the line state more accurately and jitter of its routing protocol is avoided.
In one or more other embodiments of the present specification, link availability is improved by determining, when the transmission line recovers to normal, whether the delayed recovery should take effect; this embodiment is described in detail below with reference to fig. 4. In this embodiment, the method further comprises: in response to the transmission line fault, timing the fault to obtain the fault duration; when the transmission line recovers from the fault to normal, judging whether the fault duration reaches a preset delay effective threshold range, wherein the preset delay effective threshold range is determined according to the estimated time required for optical packet switching to complete protection switching; and if so, proceeding to the step of inserting a fault signal into the data packet transmitted to the data communication device when the transmission line recovers from the fault to normal.
Specifically, fig. 4 shows a processing flow chart of a method for maintaining network stability in a data center according to an embodiment of the present specification, which specifically includes the following steps.
Step 402: in response to the transmission line fault, timing the fault to obtain the fault duration.
Step 404: when the transmission line recovers from the fault to normal, judging whether the fault duration reaches the preset delay effective threshold range.
The preset delay effective threshold range is determined according to the estimated time required for optical packet switching to complete protection switching. Specifically, the lower limit of the preset delay effective threshold range is greater than the time required for optical packet switching to complete protection switching; for example, a small margin may be added to that time to obtain the lower limit. The fault duration reaches the preset delay effective threshold range when it is greater than the lower limit. For example, assuming that optical packet switching needs 50 ms to complete protection switching and a 10 ms margin is added, the lower limit of the preset delay effective threshold range is 60 ms.
If the fault duration does not reach the preset delay effective threshold range, the flow proceeds to step 406. If the fault duration reaches the preset delay effective threshold range, the flow proceeds to step 412.
Step 406: judging whether the far end that sends the data packet has a fault.
Step 408: if the far end has a fault, inserting a fault signal into the data packet transmitted to the data communication device.
Step 410: if the far end has no fault, resuming transparent transmission of the data packets of the transmission line.
Step 412: inserting a fault signal into the data packet transmitted to the data communication device.
Step 414: timing the duration for which the transmission line has been restored to normal, to obtain the recovery delay time.
Step 416: when the recovery delay time reaches the preset recovery threshold range, proceeding to step 406, so that transparent transmission of the data packets of the transmission line is resumed unless the far end has a fault.
In the above embodiment, when the transmission line recovers, it is first determined whether the fault duration (also called the interruption time) reaches the preset delay effective threshold range. If not, the default transparent transmission state is restored directly: the Ethernet data packet is transparently transmitted, or a fault signal is inserted into the data packet if a fault exists at the far end (CSF alarm). If the fault duration reaches the preset delay effective threshold range, the fault signal is forcibly inserted into the data packet and the duration of the transmission line recovery is timed, so that recovery is delayed for a period of time. If the line fails again during this period, the timing of the recovery delay time stops and restarts when the line recovers. When the recovery delay time reaches the preset recovery threshold range, the timing ends and the default transparent transmission state is restored: the Ethernet data packet is transparently transmitted, or a fault signal is inserted if a fault exists at the far end.
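The decision taken at the moment of recovery in the flow of fig. 4 can be sketched as a small function. This is an illustrative sketch, not the implementation of the embodiment: the names `Action`, `default_state` and `on_line_recovered` are hypothetical, and the 50 ms and 10 ms values are the example figures given for step 404.

```python
from enum import Enum


class Action(Enum):
    PASS_THROUGH = "resume transparent transmission"             # step 410
    INSERT_FAULT = "insert fault signal (far-end fault)"         # step 408
    HOLD_AND_TIME = "insert fault signal, start recovery delay"  # steps 412-414


def default_state(remote_fault: bool) -> Action:
    """Steps 406-410: the default transparent transmission state."""
    return Action.INSERT_FAULT if remote_fault else Action.PASS_THROUGH


def on_line_recovered(fault_duration_ms: float, remote_fault: bool,
                      protection_switch_ms: float = 50.0,
                      margin_ms: float = 10.0) -> Action:
    """Step 404: decide whether the recovery delay takes effect."""
    lower_limit_ms = protection_switch_ms + margin_ms  # 60 ms in the example
    if fault_duration_ms > lower_limit_ms:
        return Action.HOLD_AND_TIME
    return default_state(remote_fault)


print(on_line_recovered(30, remote_fault=False))   # short fault: pass through
print(on_line_recovered(500, remote_fault=False))  # long fault: hold and time
```

When the recovery delay later expires (step 416), the same `default_state` helper would be applied again, which matches the return to step 406 in the flow.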
Using the preset delay effective threshold range to decide whether to enter the recovery delay improves link availability for the following reasons. When a transmission line fault occurs within a transmission protection segment (the position marked "2" in fig. 1), the fault is generally recovered within the time required for optical packet switching to complete protection switching, so recovery is fast. While protection switching completes, the data communication device does not perceive the fault on the transmission link, the routing protocol of the data communication device does not jitter, and the unavailable time of the link equals the recovery time of the transmission line, usually within 50 ms. The fault recovery therefore does not need to be delayed: transparent transmission can be resumed directly, which effectively improves link availability. When a line fault occurs where no transmission protection is in effect (for example, the position marked "2" in fig. 1 with no transmission protection effective at that position), the interruption lasts longer than 50 ms, and the line may go on and off several times during the subsequent repair work. If the data communication device sensed faults only by polling the port, it would perceive multiple interruptions and recoveries, and the routing protocol on the port would be unable to converge for a long time. In this embodiment, because the fault duration reaches the preset delay effective threshold range, the delay takes effect: the data communication device perceives only one interruption from the transmission device, and perceives the port as normal only after the link has been completely repaired and the line has stayed normal for a certain time. The routing protocol can thus complete convergence efficiently, and the port state is restored only after the transmission line is completely repaired.
In one or more other embodiments of the present specification, link availability is improved by delaying the interruption when the transmission line fails; this embodiment is described in detail below with reference to fig. 5. In this embodiment, inserting a fault signal into a data packet transmitted to the data communication device when the transmission line is faulty includes: when the transmission line is faulty, judging whether the far end that sends the data packet has a fault; if so, inserting a fault signal into the data packet transmitted to the data communication device; if not, inserting a network idle signal into the data packet transmitted to the data communication device and timing the transmission line fault to obtain the delay interruption duration; and, when the delay interruption duration reaches a preset delay interruption threshold range, inserting a fault signal into the data packet transmitted to the data communication device, wherein the preset delay interruption threshold range is determined according to the estimated time required for optical packet switching to complete protection switching.
Specifically, fig. 5 shows a processing flow chart of a method for maintaining network stability in a data center according to another embodiment of the present specification, which specifically includes the following steps.
Step 502: when the transmission line is faulty, judging whether the far end that sends the data packet has a fault.
Step 504: if so, inserting a fault signal into the data packet transmitted to the data communication device.
Step 506: if not, inserting a network idle signal into the data packet transmitted to the data communication device, and timing the transmission line fault to obtain the delay interruption duration.
Step 508: when the delay interruption duration reaches a preset delay interruption threshold range, inserting a fault signal into the data packet transmitted to the data communication device.
The preset delay interruption threshold range is determined according to the estimated time required for optical packet switching to complete protection switching.
In addition, the preset delay effective threshold range can be obtained by adding a preset time increment to the preset delay interruption threshold range. For example, if the lower limit of the preset delay interruption threshold range is 50 ms, adding 10 ms to it gives a lower limit of 60 ms for the preset delay effective threshold range.
Step 510: inserting a fault signal into the data packet transmitted to the data communication device when the transmission line recovers from the fault to normal.
Step 512: timing the duration for which the transmission line has been restored to normal, to obtain the recovery delay time.
Step 514: when the recovery delay time reaches the preset recovery threshold range, resuming transparent transmission of the data packets of the transmission line.
In one or more embodiments of the present specification, the step of timing the duration of the transmission line returning to normal and the step of timing the duration of the transmission line failure are performed by using the same timer. By adopting the same timer for timing, the control of the fault delay or the recovery delay is more accurate in time, and the availability of the link is improved.
In the above embodiment, when the transmission line fails, the timer is started. If the far end already had a fault before the transmission line failed, the fault signal previously inserted because of the far-end fault is maintained; otherwise, the network idle signal is inserted. The form of the network idle signal is not limited; for example, in Ethernet an IDLE signal may be used to indicate that the network is idle. When the delay interruption duration reaches the preset delay interruption threshold range, insertion of the fault signal begins. It can be seen that, according to this embodiment, when a transmission line fault occurs within the transmission protection segment (the position marked "2" in fig. 1), the transmission device sends a network idle signal for the delay interruption duration of, for example, 50 ms. During this time the data communication device is unaware of the fault on the transmission line, the unavailable time of the line equals the recovery time of the transmission link, which is usually within 50 ms, and the data communication device does not need to occupy CPU resources polling the port state. This avoids the long interruption time that results when the data communication device itself configures a fault delay on the port, and improves the availability of the link.
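The choice of signal while the line is faulty (steps 502 to 508) can be sketched as follows. The function name `signal_during_fault` and the string labels are assumptions for illustration; "IDLE" and "LF" stand for the network idle signal and the fault signal named in this specification, and the 50 ms threshold is the example value.

```python
def signal_during_fault(fault_elapsed_ms: float,
                        remote_fault_before: bool,
                        delay_interrupt_ms: float = 50.0) -> str:
    """Return the signal inserted into packets while the line is faulty."""
    if remote_fault_before:
        # Step 504: a far-end fault already existed, keep the fault signal.
        return "LF"
    if fault_elapsed_ms < delay_interrupt_ms:
        # Step 506: mask the fault with a network idle signal for now.
        return "IDLE"
    # Step 508: the delay interruption duration reached the threshold.
    return "LF"


print(signal_during_fault(10, remote_fault_before=False))
print(signal_during_fault(50, remote_fault_before=False))
print(signal_during_fault(10, remote_fault_before=True))
```

A fault cleared by protection switching within the threshold is therefore seen by the data communication device only as idle time, never as a fault.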
In one or more embodiments of the present disclosure, an embodiment for increasing the availability of a link by adopting an interruption delay when a failure occurs and determining whether a delay recovery is effective when the failure is recovered to be normal is described in detail below with reference to fig. 6.
Specifically, fig. 6 shows a processing flow chart of a method for maintaining network stability in a data center according to another embodiment of the present specification, which specifically includes the following steps.
Step 602: in response to the transmission line fault, timing the fault to obtain the fault duration.
Step 604: judging whether the far end that sends the data packet has a fault.
Step 606: if so, inserting a fault signal into the data packet transmitted to the data communication device, and proceeding to step 612 when the transmission line recovers from the fault to normal.
Step 608: if not, inserting a network idle signal into the data packet transmitted to the data communication device, and timing the transmission line fault to obtain the delay interruption duration.
Step 610: when the delay interruption duration reaches the preset delay interruption threshold range, inserting a fault signal into the data packet transmitted to the data communication device.
Step 612: when the transmission line recovers from the fault to normal, stopping the timing of the fault duration, and judging whether the fault duration reaches the preset delay effective threshold range.
Step 614: if the fault duration reaches the preset delay effective threshold range, inserting a fault signal into the data packet transmitted to the data communication device.
If the fault duration does not reach the preset delay effective threshold range, the flow proceeds to step 624.
Step 616: timing the duration for which the transmission line has been restored to normal, to obtain the recovery delay time.
Step 618: judging whether the transmission line fails again before the recovery delay time reaches the preset recovery threshold range.
Step 620: if so, stopping the timing of the duration for which the transmission line has been restored to normal, and, when the transmission line returns to normal after failing again, restarting that timing to obtain the recovery delay time.
It is understood that, as long as the recovery delay time has not reached the preset recovery threshold range, the flow returns to step 618 to judge whether the fault occurs again; if no fault recurs before the preset recovery threshold range is reached, the flow proceeds to step 622.
Step 622: if not, ending the timing when the recovery delay time reaches the preset recovery threshold range, and proceeding to step 624.
Step 624: judging whether the far end that sends the data packet has a fault.
Step 626: if the far end has a fault, inserting a fault signal into the data packet.
Step 628: if the far end has no fault, resuming transparent transmission of the data packets of the transmission line.
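The flow of steps 602 to 628 can be sketched as a discrete-time simulation with one tick per millisecond. This is a simplified sketch under several assumptions that are not stated in the embodiment: the function names `simulate` and `runs` are hypothetical, the far-end fault state is held constant for the whole run, the fault signal is kept during a repeated fault that occurs within the recovery delay, and the 200 ms recovery threshold is chosen only to keep the example short.

```python
from itertools import groupby


def simulate(line_up, remote_fault=False,
             interrupt_ms=50, effective_ms=60, recovery_ms=200):
    """Return the signal sent to the data communication device for each tick."""
    signals = []
    holding = False    # True while the recovery delay phase is active
    fault_ms = 0       # fault duration (steps 602 and 608)
    recovered_ms = 0   # recovery delay time (step 616)
    default = "LF" if remote_fault else "DATA"  # steps 624-628
    for up in line_up:
        if not up:
            fault_ms += 1
            recovered_ms = 0  # steps 618-620: stop the recovery timing
            if holding or remote_fault or fault_ms > interrupt_ms:
                signals.append("LF")    # steps 606 and 610
            else:
                signals.append("IDLE")  # step 608
            continue
        if fault_ms > effective_ms:     # steps 612-614: the delay takes effect
            holding = True
        fault_ms = 0
        if holding and recovered_ms < recovery_ms:
            recovered_ms += 1           # step 616
            signals.append("LF")
            continue
        holding = False                 # step 622: timing ends
        signals.append(default)
    return signals


def runs(signals):
    """Compress a per-tick signal list into (signal, tick count) runs."""
    return [(key, len(list(group))) for key, group in groupby(signals)]


# 100 ms fault, 50 ms of recovery, a 5 ms repeated fault, then a stable line.
timeline = [True] * 10 + [False] * 100 + [True] * 50 + [False] * 5 + [True] * 250
print(runs(simulate(timeline)))
# [('DATA', 10), ('IDLE', 50), ('LF', 305), ('DATA', 50)]
```

In this run the data communication device sees a single continuous fault period, even though the line failed twice; a 40 ms fault alone would produce only idle signals.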
The following describes the processing procedures shown in fig. 5-6 in conjunction with an embodiment in which the port of the data communication device supports a layer 2 interrupt delay function/turn-on delay function. Specifically, for example, in some embodiments, a layer 2 interrupt delay function/turn-on delay function is implemented on the switch.
The layer 2 interrupt delay function on the data communication device means that after a port of the switch detects a physical layer fault, the layer 3 protocol is not shut down immediately; the upper-layer software routing protocol (such as BGP) is shut down only if the physical layer fault still exists after a period of time. Correspondingly, the layer 2 turn-on delay function on the data communication device means that after the port detects that the physical layer fault has recovered, the layer 3 protocol is not started immediately; the upper-layer software routing protocol (such as BGP) is started only if the physical layer is still normal after a period of time. This effectively filters out a certain amount of physical layer link jitter, and the layer 2 link is restored once the link is stable. Thus, the layer 3 routing protocol is not shut down when the transmission line is briefly interrupted and then recovers. Because the layer 2 link recovers faster than the layer 3 routing protocol, the data communication device can use the interrupt delay function/turn-on delay function to shorten the recovery time of the whole link, usually from the minute level to the second level.
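The effect of the two port-side delays can be illustrated with a simple filter over the physical layer state. The sketch below is an assumption-laden illustration rather than any switch vendor's behavior: the function name `filter_port_state`, the tick-based time model and the delay values are all hypothetical.

```python
def filter_port_state(phys_up, down_delay, up_delay):
    """Return the state seen by the routing protocol for each tick.

    phys_up:    physical layer state per tick (True means normal).
    down_delay: ticks a fault must persist before the protocol is shut down.
    up_delay:   ticks the link must stay normal before the protocol is started.
    """
    state = True   # state currently reported to the routing protocol
    pending = 0    # consecutive ticks the physical state has differed from it
    reported = []
    for up in phys_up:
        if up == state:
            pending = 0
        else:
            pending += 1
            if pending >= (up_delay if up else down_delay):
                state = up
                pending = 0
        reported.append(state)
    return reported


# A 2-tick interruption is filtered out when the interrupt delay is 3 ticks.
brief = [True] * 3 + [False] * 2 + [True] * 3
print(filter_port_state(brief, down_delay=3, up_delay=2))
```

With both delays set to zero the filter reports the physical state unchanged, so every interruption and recovery reaches the routing protocol immediately.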
Although implementing the layer 2 interrupt delay function/turn-on delay function on the data communication device can, to some extent, suppress the influence of short layer 1 interruptions and of repeated consecutive interruptions on the upper-layer routing protocol, some problems remain. Because the data communication device cannot know where the fault is located (for example, it cannot tell whether the fiber break is inside the machine room or on the external fiber connection between machine rooms), the interrupt delay function/turn-on delay function applies the interrupt delay/turn-on delay to every link interruption and recovery, which reduces the availability of the ports of the data communication device. In contrast, when transmission line faults and recoveries are handled by the processing procedures shown in figs. 5 and 6, the client side of the transmission device implements both the layer 1 interruption delay and the layer 1 recovery delay, so this problem can be effectively solved.
Specifically, for example, when a line interruption occurs, the client side starts a timer. If a far-end client-side signal fault had been received before the interruption, the previously inserted fault signal is maintained; otherwise, a network idle signal is inserted and maintained for a certain time, and the fault signal starts to be inserted once the preset delay interruption threshold range is reached. If the transmission line recovers before the preset delay interruption threshold range is reached, the default transparent transmission state is restored, because the interruption is too short to reach the preset delay effective threshold range. When the line recovers, the client side first judges whether the fault duration reaches the preset delay effective threshold range. If it does not, the default transparent transmission state is restored directly. If it does, the client side is forced to insert a fault signal into the data packet and keeps doing so for a certain time, until the preset recovery threshold range is reached; if the line fails again during this period, the timing stops and restarts when the line recovers. When the preset recovery threshold range is reached and the timing ends, the default transparent transmission state of the client side is restored. The default transparent transmission state consists of either transparently transmitting the data packet directly or inserting a fault signal according to a far-end fault.
Because the transmission device inserts a network idle signal into the data packet when it senses a transmission line fault, the data communication device, on receiving the network idle signal, treats the received data packet as a normal Ethernet frame and neither shuts down the layer 2 port nor triggers convergence of the upper-layer routing protocol. The method provided by the embodiments of the present specification can therefore avoid triggering jitter of the routing protocol of the data communication device when the optical transmission line performs protection switching, and to a certain extent keeps transmission layer faults from being amplified on the data communication link. For interruption scenarios not covered by protection switching of the transmission line, for example when the transmission line is interrupted several times in succession, the data communication device, with the method provided by the embodiments of the present specification, perceives only one interruption from the data packets it obtains from the transmission device, and perceives the port as normal only after the link has been completely repaired and the line has stayed normal for a certain time. This ensures that the routing protocol completes convergence efficiently and that the port state is restored only after the transmission line is completely repaired.
In addition, in the embodiment in which the data communication device supports the layer 2 interrupt delay function/turn-on delay function, the delay duration parameters of the interrupt delay function and the turn-on delay function of the data communication device port may be set to zero, or the two functions may be disabled. In this embodiment, a fault on the client side of the transmission device (the positions marked "1" and "3" in fig. 2) is repaired relatively quickly, because replacing a fiber jumper or an optical module inside the data center is fast. Since the delay duration of the layer 2 interrupt delay function of the data communication port is set to zero or the function is disabled, the data communication device quickly senses both the fault and the recovery of the port: it triggers convergence of the upper-layer routing protocol accurately and promptly when the fault occurs, and the link returns to service without delay when the fault recovers, which improves the availability of the link.
In order to make the above processing procedure easier to understand, the processing procedure shown in fig. 6 will be described in detail below with reference to the schematic diagram of the signal interpolation time point sequence shown in fig. 7. In fig. 7, the data communication apparatus and the transmission apparatus correspond to two time point sequences in the same time range, respectively. As shown in fig. 7, the key points in time include:
a time point "a" indicating a time point at which the transmission line is failed;
a time point "b" indicating a time point at which the fault duration reaches the preset delay effective threshold range;
a time point "c" indicating a time point at which the transmission line is restored to normal from the fault;
a time point "d" indicating a time point at which the transmission line fails again while the recovery delay time is being timed and has not yet reached the preset recovery threshold range;
a time point "e" representing a time point at which the transmission line is restored to normal again from the failure;
a time point "f" indicating a time point at which the transmission line fails again;
a time point "g" representing a time point at which the transmission line is restored to normal again from the failure;
the time point "h" represents a time point at which the recovery delay time period reaches the preset recovery threshold range.
According to the above time points, the transmission device senses the transmission line fault at time point "a". Because of the interruption, the transmission device, according to steps 602-610, inserts a maintenance signal, either a network idle signal (e.g. IDLE) or a fault signal (e.g. LF), into the data packets received from time point "a" onward, and times the fault duration. After a certain time, at time point "c", the transmission device senses that the transmission line has recovered from the fault to normal, that is, the physical layer interruption has recovered, and the fault timing stops. Because the fault duration had already reached the preset delay effective threshold range at time point "b", the transmission device determines according to step 612 that the fault duration reaches the preset delay effective threshold range, continues according to step 614 to insert the fault signal into the data packets received from time point "c" onward, and times the duration of the transmission line recovery from time point "c" to obtain the recovery delay time. The recovery delay starting at time point "c" corresponds to the "10s L1 UP delay" shown in fig. 7, which means that the physical layer performs a recovery delay of 10 seconds and keeps inserting a fault signal such as "LF" into the received data packets during this time. Because the transmission line fails again at time point "d", the transmission device stops timing the duration for which the transmission line has been restored to normal according to steps 618-620 (corresponding to the L1 UP delay reset at time point "d" shown in fig. 7).
When the transmission line returns to normal again at time point "e", the transmission device restarts timing the duration for which the transmission line has been restored to normal (corresponding to the L1 UP delay reset at time point "e" shown in fig. 7). At time point "f" the transmission line fails again; the transmission device stops the timing according to steps 618-620 and restarts it when the transmission line returns to normal again at time point "g". After a period of time, at time point "h", the recovery delay time reaches the preset recovery threshold range, so the transmission device ends the timing according to step 622 and resumes transparent transmission of the data packets according to steps 624-628. It can be seen that, from time point "a" to time point "h", the transmission device inserts the fault signal into the data packets transmitted to the data communication device.
As can be seen from the above processing procedure and the sequence of time points shown in fig. 7, the data communication device, judging by the data packets it obtains from the transmission device, senses only one interruption, starting at time point "a". It senses that the port is normal only at time point "h", after the link has been completely repaired and the line has stayed normal for a certain time. This ensures that the routing protocol converges efficiently and that the port state is restored only after the transmission line is completely repaired. Of course, where the data communication device has a layer-2 opening delay function configured, as shown in fig. 7, the data communication device may additionally delay reopening the port after time point "h" and restore the link according to the delay duration set by that function (e.g., the 2S UP delay function shown in fig. 7). Where the opening delay function is cancelled or its delay duration is set to zero, the data communication device restores the link immediately at time point "h".
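The timeline of fig. 7 can be replayed with a small model. The following is only a sketch under stated assumptions: the class `RecoveryHoldOff`, the helper `replay`, and the concrete times (a=0 s, c=3 s, d=5 s, e=6 s, f=9 s, g=10 s, h=20 s) are hypothetical and do not come from the specification. It also assumes that the fault duration has already reached the delay effective threshold (time point "b"), and it ignores the idle-signal phase.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

LF = "LF"      # fault signal forced toward the data communication device
DATA = "DATA"  # transparent transmission of the original data packets


@dataclass
class RecoveryHoldOff:
    """Hypothetical model of the L1 UP delay kept by the transmission device."""

    recovery_threshold_s: float        # assumed value; fig. 7 uses 10 s
    line_up: bool = True
    holding: bool = False              # True while the fault signal is forced
    up_since: Optional[float] = None   # start of the current fault-free interval

    def on_line_event(self, t: float, line_up: bool) -> None:
        self.line_up = line_up
        if line_up:
            self.up_since = t          # (re)start timing the recovery delay
        else:
            self.holding = True        # fault or repeated fault: keep forcing LF
            self.up_since = None       # stop and discard the recovery timing

    def output(self, t: float) -> str:
        if self.holding and self.line_up and self.up_since is not None:
            if t - self.up_since >= self.recovery_threshold_s:
                self.holding = False   # recovery delay reached: pass through again
        return LF if self.holding else DATA


def replay(events: List[Tuple[float, bool]], probes: List[float],
           recovery_threshold_s: float = 10.0) -> List[str]:
    """Apply line events in time order and sample the output at each probe time."""
    model = RecoveryHoldOff(recovery_threshold_s)
    pending = sorted(events)
    seen = []
    for t in sorted(probes):
        while pending and pending[0][0] <= t:
            model.on_line_event(*pending.pop(0))
        seen.append(model.output(t))
    return seen


# Hypothetical times for a, c, d, e, f, g (True = line normal, False = line faulty).
FIG7_EVENTS = [(0.0, False), (3.0, True), (5.0, False),
               (6.0, True), (9.0, False), (10.0, True)]
```

Sampling the output at each event time and shortly before and after the assumed time point "h" gives a single uninterrupted run of fault signals followed by data, which is the "one interruption" seen by the data communication device.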
Corresponding to the above method embodiment, the present specification further provides an embodiment of an apparatus for maintaining network stability in a data center, and fig. 8 illustrates a schematic structural diagram of an apparatus for maintaining network stability in a data center, provided in an embodiment of the present specification. The device is configured on the optical transmission equipment. As shown in fig. 8, the apparatus includes:
a fault response module 802, which may be configured to insert a fault signal into a data packet transmitted to the data communication device in the event of a fault in the transmission line;
a recovery response module 804, which may be configured to insert a fault signal into a data packet transmitted to the data communication device in the case where the transmission line recovers from the fault to normal;
a recovery delay timing module 806, which may be configured to time the duration for which the transmission line has been restored to normal, to obtain a recovery delay duration; and
a recovery transparent transmission module 808, which may be configured to restore transparent transmission of the data packets of the transmission line when the recovery delay duration reaches a preset recovery threshold range, where the preset recovery threshold range is determined according to the estimated time required for the transmission line to complete fault recovery.
With this apparatus, when the transmission line fails, the optical transmission device inserts a fault signal into the data packets transmitted to the data communication device, so that the data communication device, on finding that a data packet contains the fault signal, triggers the routing protocol to close the port corresponding to the transmission line. When the transmission line recovers from the fault to normal, the optical transmission device still inserts the fault signal into the data packets transmitted to the data communication device and times the duration for which the transmission line has been restored to normal, obtaining the recovery delay duration. When the recovery delay duration reaches the preset recovery threshold range, which is determined according to the estimated time required for the transmission line to complete fault recovery, the optical transmission device restores transparent transmission of the data packets of the transmission line. After the port has been closed, the data communication device triggers the routing protocol to open the port corresponding to the transmission line once it receives from the optical transmission device a data packet that does not contain the fault signal.
The apparatus therefore spares the data communication device from spending CPU resources on polling the port state. When the transmission line goes through several on-off flaps in a row during recovery, the optical transmission device keeps forcibly inserting the fault signal based on the timing of the recovery delay duration, which realizes delay-based anti-flapping for fault recovery. For such repeated flaps, the data communication device senses only one fault from the optical transmission device, and it senses the recovery, through transparently transmitted data packets, only after the link has been repaired and has stayed normal for a certain time. The data communication device can thus restore the port state after the transmission line is completely repaired, flapping of its routing protocol is avoided, and efficient convergence of the routing protocol is ensured. In addition, the optical transmission device can sense fault recovery of the transmission line more accurately than the data communication device can, which prevents packet loss.
In one or more embodiments of the present description, the apparatus further comprises:
the fault timing module is configured to respond to the transmission line fault and time the fault duration to obtain fault duration;
the delay validation judging module is configured to judge whether the fault duration reaches a preset delay validation threshold range or not under the condition that the transmission line is recovered to be normal from the fault, wherein the preset delay validation threshold range is determined according to the duration required by the estimated optical packet switching to complete the protection switching;
a recovery triggering module configured to, if the determination of the delay validation judging module is yes, trigger the recovery response module 804 to perform the step of inserting a fault signal into the data packet transmitted to the data communication device in the case where the transmission line recovers from the fault to normal.
In one or more embodiments of the present description, the apparatus further comprises:
a remote judging module configured to judge, if the determination of the delay validation judging module is no, whether a fault exists at the remote end that sends the data packet;
a remote fault handling module configured to insert a fault signal to the data packet if there is a fault;
the recovery transparent transmission module is further configured to recover transparent transmission of the data packet of the transmission line if the remote judgment module determines that no fault exists.
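The decision taken when the transmission line recovers, as described by the delay validation judging module, the remote judging module and the remote fault handling module, can be sketched as a single function. This is a minimal illustration, not the patented implementation: the names `RestoreAction` and `on_line_restored` are hypothetical, and "reaches the preset delay effective threshold range" is assumed to mean a simple greater-than-or-equal comparison.

```python
from enum import Enum


class RestoreAction(Enum):
    HOLD_OFF = "keep inserting the fault signal and start the recovery delay"
    INSERT_FAULT = "insert a fault signal because the remote end is faulty"
    PASS_THROUGH = "restore transparent transmission"


def on_line_restored(fault_duration_s: float,
                     delay_effective_threshold_s: float,
                     remote_end_faulty: bool) -> RestoreAction:
    """Choose what to do at the moment the line recovers from a fault."""
    # Delay validation: a long enough fault makes the recovery delay take effect.
    if fault_duration_s >= delay_effective_threshold_s:
        return RestoreAction.HOLD_OFF
    # Short fault: only the state of the remote end decides the outcome.
    if remote_end_faulty:
        return RestoreAction.INSERT_FAULT
    return RestoreAction.PASS_THROUGH
```

Under these assumptions, a short flap that ends before the delay effective threshold never triggers the recovery delay, so it is not turned into a long forced outage for the data communication device.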
In one or more embodiments of the present specification, the recovery delay timing module is further configured to stop timing the duration of the transmission line returning to normal if the transmission line fails again before the recovery delay time reaches a preset recovery threshold range, and to restart timing the duration of the transmission line returning to normal when the transmission line returns to normal again after the failure again.
In one or more embodiments of the present description, the fault response module includes:
the far-end fault judgment submodule is configured to judge whether a far end for sending the data packet has a fault or not under the condition that the transmission line has the fault;
a far-end fault signal insertion sub-module configured to insert a fault signal into a data packet transmitted to the data communication device if the far-end fault judgment sub-module determines yes;
an idle signal insertion sub-module configured to, if the far-end fault judgment sub-module determines no, insert a network idle signal into a data packet transmitted to the data communication device and time the duration of the transmission line fault to obtain a delay interruption duration;
and the interruption execution submodule is configured to insert a fault signal into the data packet transmitted to the data communication equipment when the delay interruption time reaches a preset delay interruption threshold range, wherein the preset delay interruption threshold range is determined according to the time required by the estimated protection switching completion of the optical packet switching.
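The sub-modules of the fault response module amount to choosing which maintenance signal to insert while the line is faulty. The sketch below is illustrative only: the function name `maintenance_signal` and the string labels are hypothetical, and reaching the "preset delay interruption threshold range" is assumed to be a greater-than-or-equal comparison against a single value.

```python
def maintenance_signal(fault_elapsed_s: float,
                       far_end_faulty: bool,
                       delay_interrupt_threshold_s: float) -> str:
    """Return the signal inserted into packets while the transmission line is faulty."""
    if far_end_faulty:
        # Far-end fault: report the fault to the data communication device at once.
        return "LF"
    if fault_elapsed_s >= delay_interrupt_threshold_s:
        # The fault outlasted the time allowed for protection switching.
        return "LF"
    # Still within the delay interruption window: mask the fault with idle signals.
    return "IDLE"
```

The idle phase gives protection switching a chance to complete before the data communication device is told to close its port.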
In one or more embodiments of the present description, the apparatus further comprises:
and the delay interruption threshold value calculation module is configured to add a preset time increment to the preset delay interruption threshold value range to obtain a preset delay effective threshold value range.
In one or more embodiments of the present disclosure, the apparatus further includes a timer, and the step of timing the duration of the transmission line returning to normal and the step of timing the duration of the transmission line failure are performed by using the same timer.
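The relationship between the two thresholds and the reuse of one timer can be illustrated as follows. All numbers are hypothetical placeholders (the specification gives no values here), and `SharedTimer` is an assumed helper, not a component named in the source.

```python
from typing import Optional

# Hypothetical values in milliseconds; the specification does not fix them.
DELAY_INTERRUPT_THRESHOLD_MS = 50
TIME_INCREMENT_MS = 10
# The delay effective threshold is the delay interruption threshold plus an increment.
DELAY_EFFECTIVE_THRESHOLD_MS = DELAY_INTERRUPT_THRESHOLD_MS + TIME_INCREMENT_MS


class SharedTimer:
    """One timer reused for both fault timing and recovery-delay timing."""

    def __init__(self) -> None:
        self._started_at_ms: Optional[int] = None
        self.purpose: Optional[str] = None

    def start(self, now_ms: int, purpose: str) -> None:
        # Starting the timer for a new purpose discards the previous measurement.
        self._started_at_ms = now_ms
        self.purpose = purpose

    def stop(self) -> None:
        self._started_at_ms = None
        self.purpose = None

    def elapsed_ms(self, now_ms: int) -> int:
        if self._started_at_ms is None:
            return 0
        return now_ms - self._started_at_ms
```

A single timer suffices in this sketch because the line is either faulty or recovering at any moment, so the two measurements never run at the same time.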
In one or more embodiments of the present description, the apparatus further comprises:
a fault judging module configured to judge whether the transmission line is faulty according to a fault warning signal received from the transmission line; wherein the fault alert signal comprises: any one or more of an optical signal loss alarm signal, an optical signal frame loss signal, an error code alarm signal and a signal loss lock alarm signal.
In one or more embodiments of the present disclosure, the fault determining module is configured to obtain a forward error correction code value according to an error code warning signal received from the transmission line, and determine that the transmission line has a fault when the forward error correction code value reaches a preset error code threshold range, where the preset error code threshold range is determined according to an error correction capability limit of the forward error correction code.
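One possible reading of the fault judging module is sketched below. It is an interpretation rather than the specified behaviour: it assumes that loss-of-signal, loss-of-frame and loss-of-lock alarms indicate a fault directly, that an error code alarm counts as a fault only once the forward error correction value reaches the threshold, and that this value behaves like an error rate compared against a hypothetical limit of `1e-3`. The names `Alarm`, `HARD_ALARMS` and `line_is_faulty` are illustrative.

```python
from enum import Flag
from typing import Optional


class Alarm(Flag):
    NONE = 0
    LOSS_OF_SIGNAL = 1   # optical signal loss alarm
    LOSS_OF_FRAME = 2    # optical signal frame loss
    BIT_ERROR = 4        # error code alarm
    LOSS_OF_LOCK = 8     # signal loss-of-lock alarm


HARD_ALARMS = Alarm.LOSS_OF_SIGNAL | Alarm.LOSS_OF_FRAME | Alarm.LOSS_OF_LOCK


def line_is_faulty(alarms: Alarm,
                   fec_value: Optional[float] = None,
                   fec_limit: float = 1e-3) -> bool:
    """Judge a line fault from alarms; fec_limit is a hypothetical threshold."""
    if alarms & HARD_ALARMS:
        return True
    if (alarms & Alarm.BIT_ERROR) and fec_value is not None:
        # Errors matter only when they approach the FEC correction limit.
        return fec_value >= fec_limit
    return False
```

Tying the threshold to the correction limit means that errors the forward error correction can still repair are not reported as a line fault.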
The foregoing is an exemplary scheme of an apparatus for maintaining network stability in a data center according to this embodiment. It should be noted that the technical solution of the apparatus belongs to the same concept as the technical solution of the method for maintaining network stability in the data center; for details of the apparatus that are not described here, refer to the description of the technical solution of the method.
Corresponding to the above method embodiment, the present specification further provides an embodiment of a system for maintaining network stability in a data center, and fig. 9 shows a schematic structural diagram of the system for maintaining network stability in a data center, provided by an embodiment of the present specification. As shown in fig. 9, the system includes: a data communication device 902 and an optical transmission device 904, wherein the data communication device 902 is connected to the optical transmission device 904 to access a network through the optical transmission device 904;
the optical transmission device 904 is configured to insert a fault signal into a data packet transmitted to the data communication device 902 when a transmission line has a fault, insert a fault signal into the data packet transmitted to the data communication device 902 when the transmission line recovers from the fault to be normal, time a duration of the transmission line recovering to be normal, obtain a recovery delay duration, and recover transparent transmission of the data packet of the transmission line when the recovery delay duration reaches a preset recovery threshold range, where the preset recovery threshold range is determined according to a length of time required to estimate the transmission line to complete fault recovery;
the data communication device 902 is configured to receive a data packet from the optical transmission device 904, trigger a routing protocol to close a port corresponding to the transmission line if the data packet contains a fault signal, and trigger the routing protocol to open the port corresponding to the transmission line if a data packet not containing a fault signal is received from the optical transmission device 904 after the port is closed.
In one or more embodiments, the delay duration parameters of the interrupt delay function and the start delay function of the port of the data communication device 902 are set to zero, or the interrupt delay function and the start delay function are cancelled.
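The behaviour of the data communication device 902 with its own interrupt and start delays set to zero can be sketched as a port that follows the received packets directly. The class `Port` and the helper `feed` are hypothetical names, and each packet is reduced to one boolean flag stating whether it contains a fault signal.

```python
from typing import Iterable, List


class Port:
    """Port of the data communication device with zero interrupt/start delay."""

    def __init__(self) -> None:
        self.is_open = True
        self.state_changes: List[str] = []

    def receive(self, contains_fault_signal: bool) -> None:
        should_open = not contains_fault_signal
        if should_open != self.is_open:
            # In the described system the routing protocol would be triggered here.
            self.is_open = should_open
            self.state_changes.append("open" if should_open else "close")


def feed(port: Port, packets: Iterable[bool]) -> List[str]:
    """Deliver packets (True = contains a fault signal) and return the state changes."""
    for contains_fault_signal in packets:
        port.receive(contains_fault_signal)
    return port.state_changes
```

Because the optical transmission device keeps the fault signal present throughout the flapping, the port in this sketch closes once and opens once, however many times the line itself toggled.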
The above is an exemplary scheme of the system for maintaining network stability in the data center according to this embodiment. It should be noted that the technical solution of the system belongs to the same concept as the technical solution of the method for maintaining network stability in a data center; for details of the system that are not described here, refer to the description of the technical solution of the method.
FIG. 10 illustrates a block diagram of a computing device 1000 provided in accordance with one embodiment of the present description. The components of the computing device 1000 include, but are not limited to, a memory 1010 and a processor 1020. The processor 1020 is coupled to the memory 1010 via a bus 1030 and the database 1050 is used to store data.
Computing device 1000 also includes access device 1040, which enables computing device 1000 to communicate via one or more networks 1060. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. Access device 1040 may include one or more of any type of network interface, wired or wireless (e.g., a Network Interface Card (NIC)), such as an IEEE 802.11 Wireless Local Area Network (WLAN) wireless interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In one embodiment of the present description, the above-described components of computing device 1000 and other components not shown in FIG. 10 may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 10 is for purposes of example only and is not limiting as to the scope of the present description. Those skilled in the art may add or replace other components as desired.
Computing device 1000 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), a mobile phone (e.g., smartphone), a wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 1000 may also be a mobile or stationary server.
The processor 1020 is configured to execute computer-executable instructions that, when executed by the processor, implement the steps of the above-described method for maintaining network stability of a data center.
The foregoing is an exemplary scheme of a computing device of the present embodiment. It should be noted that the technical solution of the computing device and the technical solution of the method for maintaining network stability of the data center belong to the same concept; for details of the computing device that are not described here, refer to the description of the technical solution of the method.
An embodiment of the present specification further provides a computer-readable storage medium storing computer-executable instructions, which when executed by a processor, implement the steps of the method for maintaining network stability of a data center.
The above is an exemplary scheme of a computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium and the technical solution of the method for maintaining network stability in the data center belong to the same concept; for details of the storage medium that are not described here, refer to the description of the technical solution of the method.
An embodiment of the present specification further provides a computer program which, when executed on a computer, causes the computer to perform the steps of the method for maintaining network stability of the data center.
The above is an exemplary scheme of a computer program of the present embodiment. It should be noted that the technical solution of the computer program and the technical solution of the above method for maintaining network stability of the data center belong to the same concept; for details of the computer program that are not described here, refer to the description of the technical solution of the method.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The computer instructions comprise computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.
It should be noted that, for the sake of simplicity, the foregoing method embodiments are described as a series of acts, but those skilled in the art should understand that the present embodiment is not limited by the described acts, because some steps may be performed in other sequences or simultaneously according to the present embodiment. Further, those skilled in the art should also appreciate that the embodiments described in this specification are preferred embodiments and that acts and modules referred to are not necessarily required for an embodiment of the specification.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The preferred embodiments of the present specification disclosed above are intended only to aid in the description of the specification. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the embodiments and the practical application, to thereby enable others skilled in the art to best understand and utilize the embodiments. The specification is limited only by the claims and their full scope and equivalents.

Claims (14)

1. A system for a data center to maintain network stability, comprising: a data communication device and an optical transmission device, wherein the data communication device is connected with the optical transmission device so as to access a network through the optical transmission device;
the optical transmission device is configured to insert a fault signal into a data packet transmitted to the data communication device when a transmission line fails, insert a fault signal into the data packet transmitted to the data communication device when the transmission line is restored from a fault to a normal state, time the duration of the transmission line restoration to the normal state to obtain a restoration delay time, and restore transparent transmission of the data packet to the transmission line when the restoration delay time reaches a preset restoration threshold range, wherein the preset restoration threshold range is determined according to a time required for predicting the transmission line to complete the fault restoration;
the data communication device is configured to receive a data packet from the optical transmission device, trigger a routing protocol to close a port corresponding to the transmission line if the data packet contains a fault signal, and trigger the routing protocol to open the port corresponding to the transmission line if a data packet not containing the fault signal is received from the optical transmission device after the port is closed.
2. The system of claim 1, wherein the delay duration parameters of the interrupt delay function and the on delay function of the data communication device port are set to zero or the interrupt delay function and the on delay function are cancelled.
3. A method for maintaining network stability of a data center is applied to optical transmission equipment and comprises the following steps:
inserting a fault signal into a data packet transmitted to the data communication device in case of a fault occurring in the transmission line;
inserting a failure signal into a data packet transmitted to the data communication device in a case where the transmission line is restored from a failure to normal;
timing the duration time of the transmission line for recovering to normal to obtain recovery delay time;
and when the recovery delay time reaches a preset recovery threshold range, recovering transparent transmission of the data packet of the transmission line, wherein the preset recovery threshold range is determined according to the estimated time required by the transmission line to complete fault recovery.
4. The method of claim 3, further comprising:
responding to the transmission line fault, timing fault duration to obtain fault duration;
under the condition that the transmission line is recovered to be normal from the fault, judging whether the fault duration reaches a preset delay effective threshold range, wherein the preset delay effective threshold range is determined according to the duration required by the estimated optical packet switching to finish protection switching;
if so, the method proceeds to the step of inserting a fault signal into the data packet transmitted to the data communication device when the transmission line is recovered from the fault to normal.
5. The method of claim 4, further comprising:
if the fault duration does not reach the preset delay effective threshold range, judging whether a fault exists at the far end for sending the data packet;
if the fault exists, inserting a fault signal into the data packet;
and if the fault does not exist, restoring the transparent transmission of the data packet of the transmission line.
6. The method of claim 3, further comprising:
if the transmission line fails again before the recovery delay time reaches the preset recovery threshold range, stopping timing the duration time of the transmission line recovering to normal;
and when the transmission line returns to be normal again after the secondary fault, timing the duration of the transmission line returning to be normal again.
7. The method according to claim 3 or 4, wherein the inserting of the fault signal into the data packet transmitted to the data communication device in case of a fault of the transmission line comprises:
judging whether a far end for sending a data packet has a fault or not under the condition that a transmission line has a fault;
if yes, inserting a fault signal into the data packet transmitted to the data communication equipment;
if not, inserting a network idle signal into a data packet transmitted to the data communication equipment, and timing the duration time of the transmission line fault to obtain delay interruption duration;
and inserting a fault signal into the data packet transmitted to the data communication equipment when the delay interruption time reaches a preset delay interruption threshold range, wherein the preset delay interruption threshold range is determined according to the estimated time required by the completion of protection switching of the optical packet switching.
8. The method of claim 7, further comprising:
and on the basis of the preset delay interruption threshold range, adding a preset time increment to obtain a preset delay effective threshold range.
9. The method of claim 7, wherein the step of timing the duration of the transmission line restoration to normal is performed using the same timer as the step of timing the duration of the transmission line failure.
10. The method of claim 3, further comprising:
judging whether the transmission line has a fault according to a fault alarm signal received from the transmission line;
wherein the fault alert signal comprises: any one or more of an optical signal loss alarm signal, an optical signal frame loss signal, an error code alarm signal and a signal loss lock alarm signal.
11. The method of claim 10, wherein said determining whether the transmission line is faulty based on a fault alert signal received from the transmission line comprises:
acquiring a forward error correction code value according to an error code warning signal received from the transmission line;
and determining that the transmission line has a fault under the condition that the forward error correction code value reaches a preset error code threshold range, wherein the preset error code threshold range is determined according to the error correction capability limit of the forward error correction code.
12. An apparatus for maintaining network stability in a data center, configured in an optical transmission device, includes:
a fault response module configured to insert a fault signal into a data packet transmitted to the data communication device in the case where the transmission line has a fault;
a recovery response module configured to insert a failure signal into a data packet transmitted to the data communication device in a case where the transmission line is recovered from a failure to be normal;
a recovery delay timing module configured to time a duration of time for which the transmission line recovers to a normal state, to obtain a recovery delay duration;
and the recovery transparent transmission module is configured to recover transparent transmission of the data packet of the transmission line under the condition that the recovery delay time reaches a preset recovery threshold range, wherein the preset recovery threshold range is determined according to the estimated time required by the transmission line to complete fault recovery.
13. A computing device, comprising:
a memory and a processor;
the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions, which when executed by the processor, implement the steps of the method for data center maintenance network stabilization according to any one of claims 3 to 11.
14. A computer readable storage medium storing computer executable instructions which, when executed by a processor, perform the steps of the method of a data center maintaining network stability of any one of claims 3 to 11.
CN202210892208.3A 2022-07-27 2022-07-27 System, method and device for maintaining network stability of data center Pending CN115396308A (en)

Publications (1)

Publication Number Publication Date
CN115396308A true CN115396308A (en) 2022-11-25

Family

ID=84117436

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090222687A1 (en) * 2008-03-03 2009-09-03 Nortel Networks Limited Method and system for telecommunication apparatus fast fault notification
CN102291245A (en) * 2010-06-21 2011-12-21 中兴通讯股份有限公司 Method for detecting mismatch fault and maintenance terminal point
US20110320857A1 (en) * 2010-06-23 2011-12-29 Electronics And Telecommunications Research Institute Bottom-up multilayer network recovery method based on root-cause analysis
US20120099689A1 (en) * 2009-06-26 2012-04-26 Telefonaktiebolaget L M Ericsson (Publ) Detection of Jitter in a Communication Network
US20130343179A1 (en) * 2011-03-07 2013-12-26 Tejas Networks Ltd Ethernet chain protection switching
KR101642440B1 (en) * 2016-03-15 2016-07-25 라이트웍스 주식회사 Network recovering method for ring network
CN106209456A (en) * 2016-07-13 2016-12-07 浪潮(北京)电子信息产业有限公司 A kind of kernel state lower network fault detection method and device
CN107294767A (en) * 2017-05-05 2017-10-24 中广热点云科技有限公司 A kind of Living Network transmission fault monitoring method and system
CN110224916A (en) * 2018-03-01 2019-09-10 中兴通讯股份有限公司 The processing method of message, the packaging method of device and message, device and system
WO2020134928A1 (en) * 2018-12-28 2020-07-02 中兴通讯股份有限公司 Method and device for processing reverse service path failure communication
CN113660033A (en) * 2021-08-31 2021-11-16 烽火通信科技股份有限公司 Method and system for rapidly recovering instantaneous interruption of high-speed interface

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090222687A1 (en) * 2008-03-03 2009-09-03 Nortel Networks Limited Method and system for telecommunication apparatus fast fault notification
US20120099689A1 (en) * 2009-06-26 2012-04-26 Telefonaktiebolaget L M Ericsson (Publ) Detection of Jitter in a Communication Network
CN102291245A (en) * 2010-06-21 2011-12-21 中兴通讯股份有限公司 Method for detecting mismatch fault and maintenance terminal point
US20130088976A1 (en) * 2010-06-21 2013-04-11 Zte Corporation Method for Detecting Mismatch Fault and Maintenance Endpoint
US20110320857A1 (en) * 2010-06-23 2011-12-29 Electronics And Telecommunications Research Institute Bottom-up multilayer network recovery method based on root-cause analysis
US20130343179A1 (en) * 2011-03-07 2013-12-26 Tejas Networks Ltd Ethernet chain protection switching
KR101642440B1 (en) * 2016-03-15 2016-07-25 라이트웍스 주식회사 Network recovering method for ring network
CN106209456A (en) * 2016-07-13 2016-12-07 浪潮(北京)电子信息产业有限公司 Method and device for detecting network faults in kernel mode
CN107294767A (en) * 2017-05-05 2017-10-24 中广热点云科技有限公司 A kind of Living Network transmission fault monitoring method and system
CN110224916A (en) * 2018-03-01 2019-09-10 中兴通讯股份有限公司 Message processing method and device, and message encapsulation method, device and system
WO2020134928A1 (en) * 2018-12-28 2020-07-02 中兴通讯股份有限公司 Method and device for processing reverse service path failure communication
CN113660033A (en) * 2021-08-31 2021-11-16 烽火通信科技股份有限公司 Method and system for rapidly recovering instantaneous interruption of high-speed interface

Similar Documents

Publication Publication Date Title
CN110149220B (en) Method and device for managing data transmission channel
RU2466505C2 (en) Method, device and system of communication for protection of alarm transfer
EP2510651B1 (en) Connectivity fault management timeout period control
CN107612754B (en) Bidirectional forwarding link fault detection method and device and network node equipment
US20140341042A1 (en) Conditional Routing Technique
WO2011100882A1 (en) Link detecting method, apparatus and system
WO2009082923A1 (en) Link fault processing method and data forwarding device
US11245615B2 (en) Method for determining link state, and device
CN106878072B (en) Message transmission method and device
JP6308534B2 (en) Network protection method, network protection device, off-ring node, and system
CN107547368B (en) BFD session switching method, device and storage medium
US8775869B2 (en) Device and method for coordinating automatic protection switching operation and recovery operation
CN111585797B (en) Ethernet link switching method, device, equipment and computer readable storage medium
CN112383414B (en) Dual-machine hot backup quick switching method and device
CN104994173A (en) Message processing method and system
CN113114563B (en) Traffic back-off delay method, device and storage medium based on segmented routing strategy
US7860090B2 (en) Method for processing LMP packets, LMP packet processing unit and LMP packet processing node
CN107872822B (en) Service bearing method and device
CN115396308A (en) System, method and device for maintaining network stability of data center
CN103840965A (en) Method for enhancing quick fault convergence in RSTP
EP3355530A1 (en) Method, apparatus and device for processing service failure
CN116133004A (en) Link detection method, device, network equipment and network element node
CN111224803A (en) Multi-master detection method in stacking system and stacking system
EP4354826A1 (en) Link processing method and apparatus, network device, and storage medium
CN115801555B (en) Main-standby switching method and device based on preemption delay and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination