CN108270669B - Service recovery device, main controller, system and method of SDN network - Google Patents

Info

Publication number
CN108270669B
Authority
CN
China
Prior art keywords
controller
link
switch
main controller
fault
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611252266.0A
Other languages
Chinese (zh)
Other versions
CN108270669A (en
Inventor
柯志勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp
Priority to CN201611252266.0A
Publication of CN108270669A
Application granted
Publication of CN108270669B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00 Packet switching elements
    • H04L49/15 Interconnection of switching modules
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/28 Routing or path finding of packets in data switching networks using route fault recovery
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00 Packet switching elements
    • H04L49/55 Prevention, detection or correction of errors
    • H04L49/557 Error correction, e.g. fault recovery or fault tolerance

Abstract

The invention discloses a service recovery device, a main controller, a system and a method for an SDN network. The device comprises a state detection module, a controller management module and a link management module. The state detection module is used for monitoring the running states of all controllers of the SDN network and the connection states of all switch links, where the controllers comprise a main controller and a plurality of backup controllers. The controller management module is used for maintaining data synchronization between the main controller and each backup controller and, if the main controller fails, for selecting a backup controller as a new main controller and configuring the new main controller according to the configuration information of the original main controller. The link management module is used for recovering from a switch link failure according to the failure attribute if a switch link fails. The invention effectively ensures the reliability of the SDN network and realizes automatic recovery of the SDN system.

Description

Service recovery device, main controller, system and method of SDN network
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a service recovery device, a main controller, a system, and a method for an SDN network.
Background
In a conventional distributed IP network, control logic and data forwarding functions are implemented on network devices, and the network devices need to implement intellectualization of the entire network under the control of a large number of distributed protocols. This results in a network control plane that is too complex, and flexibility and scalability are difficult to adapt to the rapid development of the network.
An SDN (Software Defined Network) is an open network architecture whose main characteristics are centralized control and network programmability, which allow a network administrator to manage and operate the entire network by means of software programming. The SDN separates the logic control function from the data forwarding function: the logic control function of the network is realized in software by a network controller, while the underlying network devices are only responsible for simple data forwarding and interact with the network controller through the OpenFlow protocol. The OpenFlow forum, which originated from the "Clean Slate" project at Stanford University, is an emerging organization that was established in 2008.
An SDN network architecture is shown in fig. 1 and mainly includes, from top to bottom, an application layer, a control layer, and a data forwarding layer. The core of the architecture is concentrated on a control layer based on a network operating system, and the main device is an SDN controller which has a global view of the whole network. The data forwarding layer is mainly a data forwarding device of the bottom layer, such as an OpenFlow switch. The data forwarding device is stripped of the control function, and only the data flow needs to be matched and forwarded according to the flow table. The data forwarding device performs information interaction with the control layer through a southbound interface (currently, OpenFlow protocol is mainly used), and completes issuing of a controller data flow table and feedback of data information of bottom-layer devices. The control layer provides a northbound interface upwards to perform information transfer with the application layer, and the SDN application and service perform corresponding operation on the network by calling the provided northbound interface to realize corresponding functions.
As a new technology, the SDN network also faces unstable service conditions, such as disconnection caused by sudden failures, network congestion, network attacks, and the like. The whole SDN network may also be paralyzed when a controller hangs for any of various reasons. Because an interruption of the network may bring great loss to users, ensuring the reliability and availability of the SDN network is an urgent need for SDN network applications.
Disclosure of Invention
In order to overcome the defects of the prior art, the technical problem to be solved by the present invention is to provide an apparatus, a main controller, a system and a method for ensuring the reliability of an SDN network.
In order to solve the above technical problem, a service recovery apparatus for an SDN network in the present invention includes:
the state detection module is used for monitoring the running states of all controllers of the SDN network and the connection states of all switch links; the all controllers comprise a main controller and a plurality of backup controllers;
the controller management module is used for maintaining data synchronization between the main controller and each backup controller; if the main controller fails, selecting a backup controller as a new main controller, and configuring the new main controller according to various configuration information of the main controller;
and the link management module is used for recovering the link failure of the switch according to the failure attribute if the link failure of the switch occurs.
In order to solve the above technical problem, a main controller in an SDN network in the present invention is configured for:
constructing the topological structure of all switches in the SDN network;
and, according to the topological structure, if two switches are found not to be in an intercommunication state, detecting the switch link fault in a preset detection mode.
In order to solve the above technical problem, a service recovery system for an SDN network in the present invention includes any one of the above devices and any one of the above main controllers.
In order to solve the technical problem, a service recovery method for an SDN network in the present invention includes:
monitoring the running states of all controllers of the SDN network and the connection states of all switch links; the all controllers comprise a main controller and a plurality of backup controllers;
maintaining data synchronization between the primary controller and each backup controller; if the main controller fails, selecting a backup controller as a new main controller, and configuring the new main controller according to various configuration information of the original main controller;
and if one switch link fails, performing switch link failure recovery according to the failure attribute.
In order to solve the technical problem, a method for detecting a switch link fault in an SDN network in the present invention includes:
the main controller constructs the topological structures of all the switches;
and, according to the topological structure, if the main controller finds that any two switches are not in an intercommunication state, the main controller detects the switch link fault in a preset detection mode.
The invention has the following beneficial effects:
the system and the method effectively ensure the reliability of the SDN and realize the automatic recovery of the SDN.
Drawings
Fig. 1 is a schematic diagram of an SDN network architecture in an embodiment of the present invention;
Fig. 2 is a schematic diagram of a service recovery system architecture of an SDN network in an embodiment of the present invention;
Fig. 3 is a general flow diagram of SDN network fault discovery in an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a fault recovery system of an SDN network controller according to an embodiment of the present invention;
Fig. 5 is a flowchart illustrating a fault recovery process of an SDN network controller according to an embodiment of the present invention;
Fig. 6 is a flow chart of SDN link failure recovery in an embodiment of the present invention;
Fig. 7 is a schematic diagram of an SDN network link discovery process in an embodiment of the present invention.
Detailed Description
In order to ensure the reliability of the SDN network, the present invention provides an apparatus, a main controller, a system, and a method, and the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
A first embodiment of the present invention provides a service restoration device for an SDN network, where the device includes:
the state detection module is used for monitoring the running states of all controllers of the SDN network and the connection states of all switch links; the all controllers comprise a main controller and a plurality of backup controllers;
the controller management module is used for maintaining data synchronization between the main controller and each backup controller; if the main controller fails, selecting a backup controller as a new main controller, and configuring the new main controller according to various configuration information of the main controller;
and the link management module is used for recovering the switch link failure according to the failure attribute if one switch link fails.
The system in the embodiment of the invention effectively ensures the reliability of the SDN and realizes the automatic recovery of the SDN.
Meanwhile, the embodiment of the invention keeps the data of the backup controllers synchronized with the data of the main controller at all times, so that if a switchover between the main controller and a backup controller occurs, the service can be recovered in the shortest possible time.
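The synchronization and switchover behaviour described above can be illustrated with a minimal sketch. It is not the patented implementation: the class names `Controller` and `ControllerManager`, the dictionary used as the configuration store, and the rule of promoting the first live backup are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Controller:
    name: str
    alive: bool = True
    # Hypothetical configuration store (topology, switch info, logs, ...).
    config: Dict[str, object] = field(default_factory=dict)


class ControllerManager:
    """Keeps backups synchronized and promotes one when the main controller fails."""

    def __init__(self, main: Controller, backups: List[Controller]) -> None:
        self.main = main
        self.backups = list(backups)
        # Last configuration successfully read from the main controller.
        self.snapshot: Dict[str, object] = dict(main.config)

    def synchronize(self) -> None:
        # A failed main controller cannot be read, so keep the old snapshot.
        if not self.main.alive:
            return
        self.snapshot = dict(self.main.config)
        for backup in self.backups:
            if backup.alive:
                backup.config = dict(self.snapshot)

    def fail_over(self) -> Optional[Controller]:
        # Nothing to do while the main controller is healthy.
        if self.main.alive:
            return None
        candidates = [b for b in self.backups if b.alive]
        if not candidates:
            return None
        new_main = candidates[0]  # assumption: the first live backup is promoted
        # Configure the new main controller from the original one's configuration.
        new_main.config = dict(self.snapshot)
        self.backups.remove(new_main)
        self.main = new_main
        return new_main
```

Because the backups already hold a synchronized copy, the switchover in this sketch only has to pick a live backup and mark it as the main controller, which is why recovery can be fast.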
Each module of the service recovery device in the embodiment of the present invention has the same function as the corresponding module of the service recovery system of the SDN network described below, and the two descriptions may be referred to each other in specific implementation.
On the basis of the above-described embodiment, a modified embodiment of the above-described embodiment is further proposed, and it is to be noted herein that, in order to make the description brief, only the differences from the above-described embodiment are described in each modified embodiment.
In one embodiment of the invention, the apparatus further comprises:
the system deployment module is used for deploying a plurality of controllers in the SDN network in advance and connecting the controllers with all switches in the same area respectively;
one of the controllers is selected as the primary controller and the remaining controllers are selected as the plurality of backup controllers.
In another embodiment of the present invention, the controller management module is further configured to determine that the master controller fails through polling detection, active reporting by the controller, or socket communication.
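As a rough illustration of the polling or active-reporting detection mentioned above, a timeout-based monitor could look like the following sketch. The class name `HeartbeatMonitor`, the three-second timeout, and the explicit `now` parameter (used only to make the example deterministic) are assumptions and do not come from the patent.

```python
import time
from typing import Dict, List, Optional


class HeartbeatMonitor:
    """Marks a controller as failed when no report arrives within a timeout."""

    def __init__(self, timeout_s: float = 3.0) -> None:
        self.timeout_s = timeout_s  # hypothetical threshold
        self.last_seen: Dict[str, float] = {}

    def report(self, controller: str, now: Optional[float] = None) -> None:
        # Called when a controller reports actively or answers a poll.
        self.last_seen[controller] = time.monotonic() if now is None else now

    def failed(self, now: Optional[float] = None) -> List[str]:
        # Controllers whose last report is older than the timeout.
        current = time.monotonic() if now is None else now
        return sorted(
            name
            for name, seen in self.last_seen.items()
            if current - seen > self.timeout_s
        )
```

A supervising process would call `report` on every poll reply or active report and call `failed` periodically to decide whether a switchover is needed.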
In another embodiment of the present invention, the controller management module is further configured to upgrade one or more backup controllers;
and selecting one upgraded backup controller as a new main controller, and configuring the new main controller according to various configuration information of the original main controller.
In yet another embodiment of the present invention, the link management module includes:
the fault detection module is used for judging the fault attribute if a switch link fails;
and the link recovery module is used for recovering the switch link fault according to the preset priority recovery mode corresponding to the fault attribute.
Further, the fault detection module is specifically configured to determine a first setting standard corresponding to a requirement of a service flow affected by the fault on quality of service guarantee, and determine a second setting standard corresponding to a requirement of the service flow on fault recovery response time;
the link recovery module is specifically configured to switch the failed switch link to the backup switch link when the first setting criterion belongs to a preset low quality threshold range and the second setting criterion belongs to a preset high response threshold range;
when the first set standard belongs to a preset high-quality threshold range and the second set standard belongs to a preset low-response threshold range, calculating an optimal switch link and switching the failed switch link to the optimal switch link;
and when the first setting standard belongs to a preset high-quality threshold range and the second setting standard belongs to a preset high-response threshold range, firstly switching the failed switch link to the backup switch link, calculating an optimal switch link, and then switching the switched backup switch link to the optimal switch link.
Wherein the link management module further comprises:
a link discovery module, configured to send a topology structure instruction to the master controller, so that the master controller constructs topology structures of all switches;
the fault detection module is specifically configured to obtain the topology structure from the master controller, and perform switch link fault detection according to the topology structure.
The invention further provides a main controller in the SDN network, which is configured for:
constructing the topological structure of all switches in the SDN network;
and, according to the topological structure, if two switches are found not to be in an intercommunication state, detecting the switch link fault in a preset detection mode.
In an embodiment of the present invention, the constructing the topology of all switches includes:
the main controller sends first messages carrying a link discovery protocol to all the switches, so that all the switches respond with second messages;
and the main controller constructs the topological structure of all the switches according to each second message received in response.
Wherein, the detection of the switch link fault through a preset detection mode comprises:
the main controller acquires port state information of each switch and judges the port fault of each switch according to the port state information; and/or,
the main controller acquires the forwarding flow table of each switch and judges the blockage fault of each link according to the forwarding flow table information; and/or,
the main controller sends a packet message instruction to all the switches so that each switch sends a broadcast packet message; when a packet message sent in response by two switches which are not in an intercommunication state is received, it is judged that the two switches are non-OpenFlow switches and that the fault is not a link open-circuit fault; when no packet message sent in response by the two switches is received, it is determined that the link between the two switches has a link open-circuit fault.
The present invention further provides a service recovery system of an SDN network, where the system includes any one of the service recovery devices in the foregoing embodiments and any one of the main controllers in the foregoing embodiments.
Specifically, as shown in Fig. 2, a first embodiment of the present invention provides a service recovery system for an SDN network, where the system includes a plurality of controllers, a plurality of switches, a controller supervision layer (corresponding to the service recovery device of the SDN network in the foregoing embodiment) and a control detection layer (corresponding to the main controller in the foregoing embodiment); the controller supervision layer includes:
the state detection module is used for monitoring the running states of all controllers of the SDN network; the all controllers comprise a main controller and a plurality of backup controllers;
the controller management module is used for maintaining data synchronization between the main controller and each backup controller;
if the main controller fails, selecting a backup controller as a new main controller, and configuring the new main controller according to various configuration information of the original main controller;
and the control detection layer is used for carrying out switch link fault recovery according to the fault attribute if a switch link fails.
That is to say, in the embodiment of the present invention, a controller supervision layer and multiple controllers are deployed in the SDN network, and the controller supervision layer selects one of the controllers as the main controller. The controller supervision layer discovers controller faults by polling detection, active reporting by the controller, or socket communication, and reports the faults to the application layer. It issues flow rules to each controller through various API interfaces (e.g., a REST API (Representational State Transfer Application Programming Interface)) and at the same time uploads network events, such as controller failure events and controller status queries, to the application layer. When the main controller encounters a fault, one backup controller is reselected to operate as the main controller. The controller and the switch may communicate through, but are not limited to, the OpenFlow protocol, and various faults in a link, including optical fiber faults, switch faults and the like, can be discovered in various ways, such as real-time fault reporting by the switch and polling detection by the controller. When a controller fault is found, the controller supervision layer on the one hand reports the fault to the application layer, and on the other hand reselects a backup controller as the main controller and copies the configuration information of the original main controller, such as topology information, switch information, log information, database connection information and the like, to the new main controller, so as to recover from the controller fault.
The system in the embodiment of the invention effectively ensures the reliability of the SDN and realizes the automatic recovery of the SDN.
Meanwhile, the embodiment of the invention keeps the data of the backup controllers synchronized with the data of the main controller at all times, so that if a switchover between the main controller and a backup controller occurs, the service can be recovered in the shortest possible time.
In one embodiment of the present invention, the controller supervision layer further comprises:
the system deployment module is used for deploying a plurality of controllers in the SDN network in advance and connecting the controllers with all switches in the same area respectively;
one of the controllers is selected as the primary controller and the remaining controllers are selected as the plurality of backup controllers.
The controller management module is further used for judging that the main controller fails in a polling detection mode, a controller active reporting mode or a socket communication mode.
In another embodiment of the present invention, the controller management module is further configured to upgrade one or more backup controllers;
and selecting an upgraded backup controller as a new main controller, and configuring the new main controller according to various configuration information of the original main controller, namely copying various configuration information of the original main controller to the new main controller.
The embodiment of the invention can upgrade a backup controller independently; if the user actively requests a switchover to the backup controller, upgrading the main controller does not affect the switch services.
In yet another embodiment of the present invention, the control detection layer includes:
the fault detection module is used for judging the fault attribute if a switch link fails;
and the link recovery module is used for recovering the switch link failure according to the preset priority recovery mode corresponding to the failure attribute.
According to the embodiment of the invention, different service flows are transmitted in the SDN network, and the service attribute needs to be considered when fault recovery is executed, so that when a fault occurs, the priority level can be set according to the fault attribute, and the quality of service is ensured.
Furthermore, the fault detection module is specifically configured to determine a first setting standard corresponding to a requirement of a service flow affected by the fault on quality of service assurance, and determine a second setting standard corresponding to a requirement of the service flow on fault recovery response time;
the link recovery module is specifically configured to switch the failed switch link to the backup switch link when the first setting criterion belongs to a preset low quality threshold range and the second setting criterion belongs to a preset high response threshold range; namely a first priority recovery mode;
when the first set standard belongs to a preset high-quality threshold range and the second set standard belongs to a preset low-response threshold range, calculating an optimal switch link and switching the failed switch link to the optimal switch link; namely the second priority recovery mode;
when the first setting standard belongs to a preset high-quality threshold range and the second setting standard belongs to a preset high-response threshold range, firstly switching the failed switch link to a backup switch link, calculating an optimal switch link, and switching the switched backup switch link to the optimal switch link; i.e. the third priority recovery mode.
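The mapping from the two setting standards to the three priority recovery modes can be summarized in a short sketch. The function name `select_recovery_mode` and the boolean inputs (which stand for "falls in the preset high threshold range") are hypothetical. The text does not say what happens when both requirements are low, so falling back to the first mode in that case is an assumption.

```python
from enum import Enum


class RecoveryMode(Enum):
    FIRST = "switch to the backup link"
    SECOND = "compute the optimal link, then switch to it"
    THIRD = "switch to the backup link, then move to the optimal link"


def select_recovery_mode(high_quality: bool, high_response: bool) -> RecoveryMode:
    """Pick a recovery mode from the two requirements of the affected service flow.

    high_quality:  quality-of-service requirement is in the preset high range.
    high_response: recovery response time requirement is in the preset high range.
    """
    if high_quality and high_response:
        return RecoveryMode.THIRD
    if high_quality:
        return RecoveryMode.SECOND
    if high_response:
        return RecoveryMode.FIRST
    # Assumption: low quality and low response requirements are not covered
    # by the text; the cheapest action (backup link) is used here.
    return RecoveryMode.FIRST
```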
Further, the control detection layer further includes:
a link discovery module, configured to send a topology structure instruction to the master controller, so that the master controller constructs topology structures of all switches;
the fault detection module is specifically configured to obtain the topology from the main controller, and perform switch link fault detection according to the topology.
The construction of the topology of all switches by the main controller or the new main controller includes:
the main controller or the new main controller sends first messages carrying a link discovery protocol to all the switches, so that all the switches respond with second messages;
and the main controller or the new main controller constructs the topology of all the switches according to each second message received in response.
Wherein the switch link failure comprises a switch port failure, a link blocking failure and a link open-circuit failure; the detecting the switch link fault according to the topology structure includes:
according to the topological structure, if the two switches are not in the intercommunication state, the switch link fault is detected in a preset detection mode.
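One way to find switches that are "not in the intercommunication state" from a constructed topology is a reachability check over the link graph. The sketch below uses a breadth-first search; the function names and the representation of links as pairs of switch names are assumptions, and the patent does not prescribe this algorithm.

```python
from collections import deque
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def build_adjacency(
    switches: Iterable[str], links: Iterable[Tuple[str, str]]
) -> Dict[str, Set[str]]:
    # Undirected adjacency map; every known switch gets an entry.
    adjacency: Dict[str, Set[str]] = {s: set() for s in switches}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def reachable_from(adjacency: Dict[str, Set[str]], start: str) -> Set[str]:
    # Breadth-first search over the discovered links.
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def disconnected_pairs(
    switches: Iterable[str], links: Iterable[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    # Pairs of switches with no path between them in the current topology.
    names = sorted(set(switches))
    adjacency = build_adjacency(names, links)
    reach = {s: reachable_from(adjacency, s) for s in names}
    return [(a, b) for a, b in combinations(names, 2) if b not in reach[a]]
```

Each pair returned by `disconnected_pairs` would then be examined with the preset detection modes.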
Detecting the switch link fault in the preset detection mode includes:
a first detection mode: judging the port fault of each switch according to the port state information of each switch;
a second detection mode: judging the blockage fault of each link according to the forwarding flow table of each switch;
a third detection mode: sending an instruction to all switches so that each switch sends a broadcast packet message;
when the main controller or the new main controller receives a packet message sent in response by two switches which are not in an intercommunication state, judging that the two switches are non-OpenFlow switches and that the fault is not a link open-circuit fault;
when no packet message sent in response by the two switches is received, determining that the link between the two switches has a link open-circuit fault.
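The first and third detection modes lend themselves to a small sketch; the second mode is left out because the text does not state the criterion used to judge blockage from a flow table. The function names, the `"up"` state string, and the way probe results are passed in as a set of switch pairs are all assumptions for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple

Port = Tuple[str, int]   # (switch name, port number)
Pair = Tuple[str, str]   # two switch names


def faulty_ports(port_status: Dict[Port, str]) -> List[Port]:
    # First detection mode: any port whose reported state is not "up".
    return sorted(port for port, state in port_status.items() if state != "up")


def classify_pairs(pairs: Iterable[Pair], probe_seen: Set[Pair]) -> Dict[Pair, str]:
    """Third detection mode for switch pairs that are not intercommunicating.

    probe_seen holds the pairs for which the controller received the
    broadcast packet message sent in response.
    """
    seen = {tuple(sorted(p)) for p in probe_seen}
    result: Dict[Pair, str] = {}
    for pair in pairs:
        key = tuple(sorted(pair))
        if key in seen:
            result[key] = "non-OpenFlow, not a link open-circuit fault"
        else:
            result[key] = "link open-circuit fault"
    return result
```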
Regarding link failure discovery, in the prior art the Link Management Protocol (LMP) is used on a link port to discover the device connected at the opposite end and to periodically send hello packets to it; if the opposite end fails, no hello packet is returned, from which it is determined whether the link to the opposite end is disconnected. However, this approach cannot tell whether the opposite end lies in a non-OpenFlow domain.
In the embodiment of the present invention, the SDN controller (i.e., the main controller or the new main controller) mainly uses the Link Layer Discovery Protocol (LLDP) as the link discovery protocol. With LLDP, a device organizes information such as its main capabilities, management address, device identifier and interface identifier into different TLVs (Type/Length/Value), encapsulates the TLVs in an LLDPDU (Link Layer Discovery Protocol Data Unit), and sends it to its directly connected neighbors. After receiving the information, a neighbor stores it in the form of a standard MIB (Management Information Base), so that the network management system can query it and determine the communication status of the link.
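To make the TLV encapsulation concrete, the sketch below packs the mandatory LLDP TLVs (Chassis ID, Port ID, Time To Live, End) into an LLDPDU body. It relies on the standard LLDP layout of a 7-bit type and a 9-bit length in a two-byte header; the function names and the choice of the "locally assigned" subtype (7) for both identifiers are assumptions, and the Ethernet framing is omitted.

```python
import struct
from typing import List, Tuple


def encode_tlv(tlv_type: int, value: bytes) -> bytes:
    # LLDP TLV header: 7 bits of type followed by 9 bits of length.
    if not 0 <= tlv_type <= 127:
        raise ValueError("TLV type must fit in 7 bits")
    if len(value) > 511:
        raise ValueError("TLV value must fit in 9 bits of length")
    header = (tlv_type << 9) | len(value)
    return struct.pack("!H", header) + value


def build_lldpdu(chassis_id: bytes, port_id: bytes, ttl_s: int) -> bytes:
    # Subtype 7 means "locally assigned" for both Chassis ID and Port ID.
    chassis = encode_tlv(1, bytes([7]) + chassis_id)
    port = encode_tlv(2, bytes([7]) + port_id)
    ttl = encode_tlv(3, struct.pack("!H", ttl_s))
    end = encode_tlv(0, b"")  # End of LLDPDU
    return chassis + port + ttl + end


def decode_tlvs(data: bytes) -> List[Tuple[int, bytes]]:
    # Walk the LLDPDU and return (type, value) pairs up to the End TLV.
    tlvs: List[Tuple[int, bytes]] = []
    offset = 0
    while offset + 2 <= len(data):
        (header,) = struct.unpack_from("!H", data, offset)
        tlv_type, length = header >> 9, header & 0x1FF
        value = data[offset + 2 : offset + 2 + length]
        tlvs.append((tlv_type, value))
        offset += 2 + length
        if tlv_type == 0:
            break
    return tlvs
```

For example, `decode_tlvs(build_lldpdu(b"s1", b"3", 120))` yields the four TLVs in order, with the switch and port identifiers recoverable from the first two values.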
As shown in Fig. 7, in the embodiment of the present invention, when the SDN controller executes the link discovery process, it sends a Packet-out message carrying LLDP information to each OpenFlow switch connected to it, instructing the switch to send the LLDP packet out of all of its ports. The switch receives the LLDP packet and sends it through all of its ports to the devices connected to it. If a device adjacent to the OpenFlow switch is also an OpenFlow switch, then, because that switch has no dedicated flow entry for processing LLDP messages, it sends the packet to the controller in a Packet-in message. After receiving the Packet-in message, the controller parses the packet and creates a link record between the two switches in the link discovery table stored in the controller. After collecting the link information in its own management area, the controller can construct the topology of the network from this information.
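The controller-side bookkeeping in this link discovery process can be sketched as follows. The `LldpPacketIn` record and the `LinkDiscoveryTable` class are hypothetical; in a real controller the sender fields would be parsed from the LLDP payload and the receiver fields taken from the Packet-in metadata.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple

Endpoint = Tuple[str, int]  # (switch name, port number)


@dataclass(frozen=True)
class LldpPacketIn:
    src_switch: str  # switch that emitted the LLDP packet
    src_port: int
    dst_switch: str  # switch that forwarded it to the controller
    dst_port: int


class LinkDiscoveryTable:
    """Stores one directed link record per LLDP Packet-in received."""

    def __init__(self) -> None:
        self.links: Set[Tuple[Endpoint, Endpoint]] = set()

    def on_packet_in(self, event: LldpPacketIn) -> None:
        src = (event.src_switch, event.src_port)
        dst = (event.dst_switch, event.dst_port)
        self.links.add((src, dst))

    def topology(self) -> Dict[str, Set[str]]:
        # Collapse port-level records into a switch-level adjacency map.
        adjacency: Dict[str, Set[str]] = {}
        for (src_switch, _), (dst_switch, _) in self.links:
            adjacency.setdefault(src_switch, set()).add(dst_switch)
            adjacency.setdefault(dst_switch, set()).add(src_switch)
        return adjacency
```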
If the controller finds that switches under its control cannot communicate with each other, it issues a command to all the switches it controls, requiring each switch to send a broadcast packet message out of every port except the port connected to the controller. If a non-OpenFlow switch exists in the network, the broadcast packet enters it at one end and passes through it to reach the other switches connected to it. Because a switch receiving the broadcast packet has no flow table entry matching it, that switch uploads the packet to the controller as an exception, which informs the controller that a non-OpenFlow domain exists in the network. If the controller does not receive a broadcast packet uploaded by the OpenFlow switches, the two corresponding OpenFlow switches are disconnected from each other, and the link failure is thereby discovered.
The controller detects the various causes of link failure, such as switch port failure, flow table entry errors, link overload, and heavy packet loss caused by link disconnection, through different methods. First, when a forwarding port of a switch fails, the state of the port changes, the switch sends a message about the state change (e.g., a port-status message) to the controller, and the controller can determine from the message that a certain forwarding port has failed. Secondly, the controller stores the forwarding flow tables of all switches and can therefore examine the forwarding behavior of all switches at the same time. By inspecting and analyzing the issued flow tables, the controller can judge whether a certain section of link is blocked, and when a link fault is caused by a logic error in a flow table, the controller can quickly repair the flow table error and resolve the fault. Thirdly, for the case of link disconnection, the controller regularly sends Packet-out messages carrying LLDP information and, according to the Packet-in messages fed back, continuously detects the state of the switches and updates the network topology. A global resource view of the whole network is thus formed in the controller, which enables centralized management and control of the network.
When recovering from a link failure, the controller assigns the failure to a first priority recovery mode, a second priority recovery mode or a third priority recovery mode according to the failure type, urgency and failure grade, so that optimized automatic recovery of services in the SDN network is realized.
In the first priority recovery mode, if the data stream affected by the fault does not have high requirements on quality of service assurance but may have high requirements on fault recovery response time, the data stream is switched directly to the backup path for recovery. The main path is generally the optimal path, however, and when the main path fails, the backup path is not necessarily the optimal path in the current environment. This mode is also the most common way in which SDN link recovery is currently performed.
In the second priority recovery mode, if the data stream affected by the fault has high requirements on quality of service assurance but not on recovery response time, an optimal path can be recalculated first, and the switch then switches to the new optimal path.
In the third priority recovery mode, if the data stream affected by the failure has high requirements on both quality of service assurance and recovery response time (for example, a real-time high-bandwidth network data stream service with strict requirements on delay and packet loss rate), a parallel execution mode is used: the switch is switched directly to the backup path and, while the service is being switched, a new optimal path is calculated; the data stream is then switched from the backup path to the new optimal path.
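The third priority recovery mode can be sketched as "use the backup path at once, then move to a recomputed optimal path". The example below uses Dijkstra's algorithm as one possible way to compute the optimal path; the patent does not name an algorithm. The link costs, function names and return format are assumptions, and the two steps run sequentially here, whereas the text describes them as executing in parallel.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, Dict[str, float]]  # node -> neighbour -> link cost


def shortest_path(
    graph: Graph, src: str, dst: str, failed_link: Optional[Tuple[str, str]] = None
) -> Optional[List[str]]:
    """Dijkstra's algorithm that ignores the failed link in both directions."""
    blocked = set()
    if failed_link is not None:
        a, b = failed_link
        blocked = {(a, b), (b, a)}
    dist: Dict[str, float] = {src: 0.0}
    prev: Dict[str, str] = {}
    heap: List[Tuple[float, str]] = [(0.0, src)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == dst:
            break
        for nxt, cost in graph.get(node, {}).items():
            if (node, nxt) in blocked:
                continue
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    path.reverse()
    return path


def recover_third_mode(
    graph: Graph,
    src: str,
    dst: str,
    backup_path: List[str],
    failed_link: Tuple[str, str],
) -> List[Tuple[str, List[str]]]:
    # Step 1: restore service immediately on the preconfigured backup path.
    steps = [("backup", backup_path)]
    # Step 2: move to the recomputed optimal path if it differs from the backup.
    optimal = shortest_path(graph, src, dst, failed_link)
    if optimal is not None and optimal != backup_path:
        steps.append(("optimal", optimal))
    return steps
```

Returning the ordered list of paths keeps the sketch testable; a controller would instead push the corresponding flow entries to the affected switches at each step.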
For the situation that the main path and the backup path have a lot of faults, only one path can be recalculated and issued to the affected switch, and the controller can calculate that a plurality of backup paths are issued to the switch for processing the situation that the faults are more, but a large amount of storage space of the switch is occupied and the load of the switch is increased, so that the situation that the second priority recovery mode is adopted in the situation is more.
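The choice among the three recovery modes can be illustrated with a minimal sketch. The names `RecoveryMode` and `select_recovery_mode`, and the reduction of the service attributes to two booleans plus a backup-availability flag, are assumptions made for illustration and are not taken from the patent.

```python
from enum import Enum


class RecoveryMode(Enum):
    FIRST = "switch directly to the backup path"
    SECOND = "recalculate the optimal path, then switch"
    THIRD = "switch to backup and recalculate in parallel"


def select_recovery_mode(high_qos: bool, fast_recovery: bool,
                         backup_usable: bool = True) -> RecoveryMode:
    """Pick a recovery mode from the flow's service attributes (hypothetical helper)."""
    # Without a usable backup path (main and backup both failed),
    # the only option is to recalculate a path.
    if not backup_usable:
        return RecoveryMode.SECOND
    if high_qos and fast_recovery:
        return RecoveryMode.THIRD
    if high_qos:
        return RecoveryMode.SECOND
    # Low QoS requirement: the backup path is good enough.
    return RecoveryMode.FIRST
```

For example, a real-time high-bandwidth flow would be passed as `select_recovery_mode(True, True)` and handled in the third priority mode.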
A system according to an embodiment of the present invention is described below.
The embodiment of the invention mainly addresses link fault discovery, link fault recovery, controller fault discovery, and controller fault recovery in an SDN network. A control monitoring layer is added to the conventional SDN network architecture. The controller supervision discovers controller faults by polling detection and reports them to the application layer, issues flow rules to each controller through an API such as a REST API, uploads network events such as controller fault events and controller state queries to the application layer, and, when the main controller fails, reselects a controller to operate as the main controller. The controllers and the switches communicate through the OpenFlow protocol, and the various faults in the links, including optical fiber faults, switch faults, and the like, are discovered in two ways: real-time fault reporting by the switches and polling detection by the controller. When a controller fault is found, the controller supervision on the one hand reports the fault to the application layer and on the other hand reselects a backup controller as the main controller and copies the configuration information of the original controller, such as topology information, switch information, and log information, to the new main controller, thereby recovering from the controller fault.
Various link faults are detected at the controller layer and the data forwarding layer, a fault recovery mode is defined for each kind of link fault, and different fault recovery modes are applied according to different service attributes, so that the quality of service of the service flows is ensured during a link failure.
In a specific implementation process, a schematic diagram of a system structure for automatically recovering an SDN network service is shown in fig. 2, and is divided into 5 layers: an application service layer 1, an interface monitoring layer 2, a control monitoring layer 3, a control detection layer 4 and a data forwarding layer 5.
The application service layer comprises applications for managing the network to be accessed and applications that access the network.
The interface monitoring layer is used for displaying in real time the connection state of the controllers and links in the network: it obtains data from the application service layer, displays alarm information and recovery paths, provides the user with windows for settings and selections, and sends the setting results to the application layer. If a link fails, it displays the connection as abnormal or disconnected, and after the system recovers automatically it displays the recovered path.
The control monitoring layer hosts the controller supervision, which is composed of one or more independent servers (or devices) and the software running on them.
The controller supervision provides the following functions: 1) It manages the controllers connected to it, selecting one controller as the main controller and using the other connected controllers as backup controllers; the main controller is the controller currently controlling the switches of the data forwarding layer, and a backup controller is a controller in a standby state. 2) It detects the running state of each controller, either by polling scheduling or by the controllers actively reporting to it. It uploads network events such as controller fault events and controller state queries to the application layer and issues flow rules to each controller through an API such as a REST API. If a backup controller is found to be faulty, the backup controller is maintained or restarted and the fault is reported to the application layer. If the main controller is found to be faulty, the controller supervision on the one hand reports the fault to the application layer and on the other hand reselects a backup controller as the main controller, copies the configuration information of the original controller (such as topology information, switch information, log information, and database connection information) to the new main controller, and changes the main controller label so that the switches know which controller is now the main controller, thereby recovering the SDN network service. 3) It receives commands from the application layer. If the application layer requires the main controller to be replaced and the designated backup controller is operating normally, it switches directly to that backup controller; otherwise it returns the fault information of the backup controller to the application layer. If the application layer requires a backup controller to be restarted or maintained, the backup controller is restarted or maintained directly. 4) It stores the fault information, running state, and the like of each controller in a database so that the application layer can analyze faults and thus find and resolve them in time.
The controller supervision keeps in communication with the application layer. If the application layer finds that the controller supervision has failed, it restarts the controller supervision software, or restarts the server hosting the controller supervision and then starts the controller supervision.
In the operation process of the controller, the backup controller keeps data synchronization with the main controller, for example, the configuration information of the main controller, such as topology information, switch information, log information, database connection information and the like, is copied to the backup controller in time, so that when the main controller breaks down, the backup controller can switch in time and data service is not influenced.
The control detection layer is used for managing the switches in the SDN network, implementing link state detection, link fault discovery, link fault recovery, flow table caching, switch connection, and the like for the switches, and storing topology information, switch information, log information, and the like. On the one hand, it monitors the network state; on the other hand, it communicates with the control monitoring layer through an API port and provides the running state of the controller to the control monitoring layer. Meanwhile, the control detection layer reports the network link state, the link recovery state, and the like to the application service layer and the interface monitoring layer.
The data forwarding layer is used for managing the switches, links, and other devices and for communicating with the controller. It sends broadcast packets to other switches and analyzes the flow table entries in the switches that receive the broadcast packets to determine where the network is disconnected; it sends this disconnection information to the control detection layer, which locates the link fault.
The application service layer 1 is similar to the application layer in the existing SDN network architecture. Its functions include: receiving topology information, alarm information, link state information, and the like sent by the control detection layer 4, processing this information, and sending it to the interface monitoring layer 2 for display; receiving information reported by the control monitoring layer 3, such as controller state information, controller switching results, and controller upgrade information, and sending the processed information to the interface monitoring layer 2 for display; and issuing the user's configurations to the controller supervision or to the controllers.
The interface monitoring layer 2 is used for displaying the controller running state, the controller supervision running state, the link connection status, the switch state, link failures, controller recovery, link recovery, and the like in the network, and for providing a user configuration window. Configuration information is sent to the application service layer 1 through interface messages such as REST API messages; after processing it, the application service layer 1 sends the resulting messages to the control monitoring layer 3 or the control detection layer 4, as appropriate.
The control monitoring layer 3 is used for managing the controllers in the SDN network, including single-domain controllers and multi-domain controllers, and selects one of them as the main controller. It discovers main controller faults by polling scheduling detection or by active reporting from the controllers, and reports the faults to the application service layer 1. It issues flow rules to each controller through an API such as a REST API and uploads network events, such as controller fault events and controller states, to the application service layer 1. When the main controller is found to be faulty, the controller supervision on the one hand reports the fault to the application service layer 1 and on the other hand reselects a backup controller as the main controller and copies the configuration information of the original controller, such as topology information, switch information, log information, and database connection information, to the new main controller, thereby achieving automatic recovery of SDN network services.
The control detection layer 4 is used for managing the switches in the SDN network, implementing link state detection, link fault discovery, link fault recovery, flow table caching, switch connection, and the like for the switches, and storing topology information, switch information, log information, and the like. On the one hand, it communicates with the switches through the OpenFlow protocol or other protocols, monitors the network state, and receives the network states reported by the switches, such as link fault states and network topology information. On the other hand, it communicates with the control monitoring layer 3 through an API port to provide the running state of the controller. Meanwhile, the control detection layer 4 reports the network link state, the link recovery state, and the like to the application service layer 1 and the interface monitoring layer 2.
The data forwarding layer 5 is used for link fault discovery, switch data forwarding, link fault recovery, and the like. It receives messages sent by the control detection layer 4, such as Packet-out messages, manages devices such as switches and links, and communicates with the control detection layer 4. It sends broadcast packets to other switches and analyzes the flow table entries in the switches that receive the broadcast packets to determine where the network is disconnected; it sends this disconnection information to the control detection layer 4, which locates the link fault.
The applications of the application service layer 1 may include, but are not limited to, the following types according to source and function: command line applications 11, network management applications 12, security applications 13 and other applications 14.
The command line application 11 is an application accessed by a controller manager, and implements operations such as configuration and query of the controller through a command line (non-open source) reserved by the controller, thereby implementing some functions of verification and debugging.
The network management application 12 is used by a network administrator to apply network configurations to the controller and to check network states, such as alarms and topology.
The security application 13 refers to an interface for cases in which human intervention is required when a link or a controller in the network is recovered, for example, when a must-pass node needs to be set manually during link recovery. It may also refer to a third-party security service cloud provider that offers security services and guarantees to users.
Other applications 14 refer to various reserved processing applications, such as upgrading the controller and controller supervision software, startup logs, memory leak detection, restarting, maintaining, and upgrading the controller supervision, upgrading and maintaining the controllers, and manual designation of the main controller.
The interface monitoring layer 2 includes, but is not limited to, two parts: a user interface 21 and a message processing module 22. The user interface 21 is used for acquiring data from the message processing module 22, converting the data into a graphical interface, providing a configuration window for network management personnel, and sending the configuration to the message processing module 22 via the REST API or the HTTP protocol.
The message processing module 22 receives information from the feedback module 49, sends the response result to the user interface 21 via the REST API or the HTTP protocol, caches the instructions of the user interface 21, and sends the cached instructions to the feedback module 49.
The control monitoring layer 3 comprises 3 modules: a state detection module 31, a controller management module 32, and a message forwarding module 33.
The state detection module 31 is used for detecting the running state of the controllers, finding controller fault states in time, and sending them to the interface monitoring layer 2.
The controller management module 32 is configured to connect to the multiple controllers under its control, for example through REST API interfaces, and to obtain the connection status of each controller by time-shared polling queries or by active reporting from the controllers. If a failure of the main controller is found, it automatically selects another controller as the main controller; alternatively, a user may request a main controller switch. In either case it sends a message requesting that the configuration information, topology information, database connection information, and the like of the original controller be copied to the new main controller. The controller management module 32 has the following capabilities: (1) it responds quickly to controller failures, so that after a controller fails the switches respond quickly to the controller failure and failure events; (2) the running of existing services is not affected during controller switching; (3) the service delay incurred during switching is within a range acceptable to the user, and no noticeable service interruption occurs.
The message forwarding module 33 sends messages to the application service layer 1, receives messages from the application layer, such as messages for the user to actively request to switch the main controller, and sends messages to the controller management module 32 for processing.
The control detection layer 4 comprises 10 modules: the system comprises a link discovery module 41, a link detection module 42, a link recovery module 43, a topology management module 44, a switch connection module 45, a data synchronization module 46, a forwarding rule management module 47, an information statistics module 48, a feedback module 49 and a data storage module 410.
The link discovery module 41 monitors the status of the links between the switches and updates the link information in real time. The controller executes a link discovery process: it sends Packet-out messages carrying LLDP (Link Layer Discovery Protocol) information to the OpenFlow switches connected to it, instructing each switch to send LLDP data packets out of all its ports. If the device adjacent to an OpenFlow switch is also an OpenFlow switch, that switch sends a Packet-in message to the controller. After collecting the link information within its own management area, the controller can construct the network topology from this information. As shown in the SDN network link discovery process diagram of fig. 7, after this step the main controller knows only that switch 2 and switch 4 are not directly connected and that switch 3 and switch 4 are not directly connected; it does not know whether there is a disconnection or a non-OpenFlow switch between them.
Having found through the above steps that the switches it controls are not all pairwise interconnected, the main controller sends a command to all the switches it controls, requesting each switch to send a broadcast packet out of its ports other than those connected to the switches and the controller. If a non-OpenFlow switch exists in the network, the broadcast packet enters it at one end, passes through, and reaches the other OpenFlow switches connected to it. Because a switch receiving the broadcast packet has no flow table entry matching it, the switch uploads the packet to the controller as an exception, thereby informing the controller that a non-OpenFlow domain exists in the network. If the controller does not receive a broadcast packet uploaded by the OpenFlow switches, the two corresponding OpenFlow switches are disconnected from each other, and the link failure is thus discovered. In this way the controller learns that switch 3 and switch 4 are disconnected and that a non-OpenFlow domain exists between switch 2 and switch 4. Through these steps, the controller achieves link discovery.
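The two-stage decision described above (LLDP first, then a broadcast probe) can be summarized in a short sketch. The function name `classify_pairs`, the set-of-pairs data layout, and the link between switch 1 and switch 2 in the usage example are assumptions for illustration; only the outcomes for the pairs (2, 4) and (3, 4) come from the text.

```python
def classify_pairs(candidate_pairs, lldp_links, broadcast_seen):
    """Classify switch pairs from the two discovery stages (hypothetical helper).

    lldp_links:     pairs confirmed as directly linked by LLDP Packet-in messages.
    broadcast_seen: pairs for which a broadcast packet sent by one switch
                    was uploaded to the controller by the other.
    Both are sets of frozenset({a, b}).
    """
    result = {}
    for a, b in candidate_pairs:
        pair = frozenset((a, b))
        if pair in lldp_links:
            result[(a, b)] = "direct"
        elif pair in broadcast_seen:
            # Broadcast got through without LLDP adjacency:
            # a non-OpenFlow domain lies between the two switches.
            result[(a, b)] = "non-openflow-domain"
        else:
            # Neither LLDP nor broadcast got through: the link is broken.
            result[(a, b)] = "disconnected"
    return result


# Usage: the (1, 2) link is an assumed example, not taken from fig. 7.
status = classify_pairs(
    [(1, 2), (2, 4), (3, 4)],
    lldp_links={frozenset((1, 2))},
    broadcast_seen={frozenset((2, 4))},
)
```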
The link detection module 42 is used for detecting the fault types and fault locations of the switches and links, sending the detection result to the application service layer 1, from which it is finally sent to the interface monitoring layer 2 for display. The controller detects the various link failures, such as switch port failures, flow table entry errors, link overload, and heavy packet loss caused by link disconnection, by different methods. First, when a forwarding port of a switch fails, the state of the port changes; the switch reports the state change to the controller in a message (e.g., a port-status message), and the controller can determine from that message that a particular forwarding port has failed. Second, the controller stores the forwarding flow tables of all switches and can therefore examine the forwarding behavior of all switches at the same time. By inspecting and analyzing the flow tables it has issued, the controller can judge whether a given link segment is blocked, and when a link fault is caused by a logic error in a flow table, the controller can quickly repair the flow table and clear the fault. Third, to handle link disconnection, the controller periodically sends packet-out messages carrying LLDP information and, according to the packet-in messages fed back, continuously monitors the state of the switches and updates the network topology. A global resource view of the whole network is thereby formed in the controller, which enables centralized management and control of the network.
When a link fault is found, the link recovery module 43 automatically selects a fault recovery mode according to the fault type, generates an optimal recovery link, updates the topology, updates the flow table entries for the switches on the new optimal link, and invokes the forwarding rule management module 47 to issue the flow table entries on the new path. The controller selects a first priority recovery mode, a second priority recovery mode, or a third priority recovery mode according to the fault type, urgency, and fault level, thereby achieving optimized automatic recovery of services in the SDN network.
In the first priority recovery mode, if the data flow affected by the fault does not have high requirements on quality of service assurance but may have high requirements on fault recovery response time, the data flow is switched directly to the backup path. The main path is generally the optimal path, however, and when the main path fails the backup path is not necessarily the optimal path in the current environment. This is also the approach most commonly used for link recovery in current SDN networks.
In the second priority recovery mode, if the data flow affected by the fault has high requirements on quality of service assurance but not on recovery response time, an optimal path is recalculated and the switches then switch to the new optimal path.
In the third priority recovery mode, if the data flow affected by the fault has high requirements on both quality of service assurance and recovery response time (for example, a real-time high-bandwidth data flow service with strict requirements on delay and packet loss rate), the two actions are executed in parallel: the switches switch directly to the backup path, and while the service is being switched a new optimal path is calculated; the data flow is then switched from the backup path to the new optimal path.
When faults affect both the main path and the backup path, a path can only be recalculated and issued to the affected switches. To handle cases with many faults, the controller could calculate several backup paths and issue them to the switches, but this occupies a large amount of switch storage space and increases switch load, so the second priority recovery mode is more often adopted in this situation.
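The third priority recovery mode, in which the switch to the backup path and the recalculation of the optimal path proceed in parallel, might be sketched as follows. The patent does not name a path algorithm; Dijkstra's algorithm is used here as a stand-in, and the graph layout, the `install` callback, and the function names are assumptions for illustration.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def shortest_path(graph, src, dst, failed_link):
    """Dijkstra over graph {node: {neighbor: cost}}, skipping the failed link."""
    bad = frozenset(failed_link)
    dist, prev = {src: 0}, {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            if frozenset((u, v)) == bad:
                continue
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None  # destination unreachable without the failed link
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def recover_third_priority(graph, src, dst, failed_link, backup_path, install):
    """Switch to the backup path at once while the new optimal path is computed."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(shortest_path, graph, src, dst, failed_link)
        install(backup_path)      # immediate switchover keeps the service running
        best = future.result()    # recalculation ran in the background
    if best is not None and best != backup_path:
        install(best)             # move the flow to the new optimal path
    return best
```

Here `install` stands for whatever mechanism issues flow table entries for a path; passing `list.append` of a plain list is enough to observe the order of the two switchovers.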
The topology management module 44 establishes the topology of the switch based on the existing link information, and based on the above modules, the device control part maintains the topology, the flow table, the operation state and other information of the bottom layer switch network.
The switch connection module 45 is connected to the underlying switch by, but not limited to, an OpenFlow protocol, and communicates with the underlying switch, and manages an operation state of each switch by the OpenFlow protocol.
The data synchronization module 46 is configured to keep data synchronization with the primary controller during the operation of the controller, for example, to copy configuration information of the primary controller, such as topology information, switch information, log information, database connection information, etc., to the backup controller in time, so that when the primary controller fails, the backup controller can switch in time without affecting data services.
The forwarding rule management module 47 manages flow rules between the controller and the switch, for example, deleting the flow rules on the old link and issuing the flow rules on the new path.
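As a rough illustration of deleting rules on the old link and issuing rules on the new path, the following sketch derives the two rule sets from the old and new paths. Modelling a flow rule as a `(switch, next_hop)` pair and the function names are simplifying assumptions, not the patent's data model.

```python
def path_rules(path):
    """One (switch, next_hop) forwarding rule per hop of a path."""
    return set(zip(path, path[1:]))


def rule_updates(old_path, new_path):
    """Rules to delete from the old path and rules to install for the new path."""
    old, new = path_rules(old_path), path_rules(new_path)
    return {"delete": sorted(old - new), "install": sorted(new - old)}
```

Rules shared by both paths are left untouched, so only the switches that actually change their next hop need to be updated.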
The information statistics module 48 is configured to store the switch information collected through the OpenFlow protocol in the database storage module 410.
The feedback module 49 on the one hand feeds back controller state information, link discovery information, link recovery information, and real-time topology information to the network management personnel, and on the other hand feeds back the communication state of the controller supervision. If a failure of the controller supervision is detected, it sends an alarm to the user in time, prompting the user to restart or maintain the controller supervision.
The database storage module 410 stores the relevant link data, topology information, optimal path, backup path, address and other information of the controller and the switch.
The data forwarding layer 5 comprises 2 modules: a broadcast packet module 51 and a controller message forwarding module 52.
The broadcast packet module 51 manages the switches, links, and other devices and communicates with the controller. It sends broadcast packets to other switches and analyzes the flow table entries in the switches that receive the broadcast packets to determine where the network is disconnected; it sends this disconnection information to the control detection layer, which locates the link fault.
The controller message forwarding module 52 is configured to implement message forwarding with the controller, for example, send a Packet-in message to the controller, feed back a state of the switch, update a topology to the controller, receive a Packet-out message sent by the controller, and send a link discovery protocol sent by the controller to each port of the switch.
Fig. 3 is a general flow chart of SDN network fault discovery, showing step by step how controller faults and link faults are discovered in an SDN network. It includes the following steps:
Step 3-1: Acquire the fault type. According to the fault type determined by the control monitoring layer, the controller, or the application service layer in fig. 2, analyze whether the fault is a controller supervision fault, a controller fault, or a link fault.
If it is a controller supervision fault, go to step 3-2.
Step 3-2: Controller supervision fault. This kind of fault has the lowest probability. If the application layer finds that this module has failed, go to step 3-3 to maintain or restart the controller supervision.
Step 3-3: Maintain or restart the controller supervision. The controller supervision is repaired or restarted through the application service layer in fig. 2.
If it is a controller fault, go to step 3-4.
Step 3-4: Controller fault. If the controller supervision finds that the main controller has failed, for example because the main controller has hung or because its message sending and receiving is too slow and exceeds the specified time, go to step 3-5.
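The failure condition in step 3-4 (main controller hung, or replies slower than the specified time) can be expressed as a simple timeout check. The `ProbeResult` record and the function name are hypothetical; the text does not specify how the probe is implemented or what the time limit is.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    sent_at: float                 # time the polling request was sent (seconds)
    replied_at: Optional[float]    # time the reply arrived; None if no reply


def main_controller_failed(probe: ProbeResult, timeout_s: float) -> bool:
    """True if the controller hung (no reply) or replied later than timeout_s."""
    if probe.replied_at is None:
        return True
    return (probe.replied_at - probe.sent_at) > timeout_s
```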
Step 3-5: Acquire the information of the failed main controller. The controller supervision obtains the IP address of the main controller, the location information of the controller, and the like, so that the backup controller can be notified to prepare for recovery.
Step 3-6: Report the information of the failed main controller to the application layer. The IP address of the failed controller, the address of the controller, the fault type, and the like are reported to the application layer and displayed through the interface monitoring layer in fig. 2.
Step 3-7: Start the controller fault recovery mode, that is, enter controller fault recovery.
If it is a link fault, go to step 3-8.
Step 3-8: Link fault. When a link fault occurs in the SDN network, packet data on the network cannot be transmitted or a large number of packets are lost, and the network transmission rate drops. The type of the link fault is determined in step 3-9.
Step 3-9: Determine the link fault type. According to the reported link error or by other methods, determine whether the link fault is a port fault, link blockage, or link disconnection, and go to step 3-10 (port fault), step 3-11 (link blockage), or step 3-12 (link disconnection) accordingly.
Step 3-10: Port fault. When a forwarding port of a switch fails, the state of the port changes; the switch reports the state change to the controller (e.g., in a port-status message), and the controller can determine from that message that a particular forwarding port has failed.
Step 3-11: Link blockage. The controller stores the forwarding flow tables of all switches and can examine the forwarding behavior of all switches. By inspecting and analyzing the flow tables it has issued, the controller can judge whether a given link segment is blocked, and when a link fault is caused by a logic error in a flow table, the controller can quickly repair the flow table and clear the fault.
Step 3-12: Link disconnection. To handle link disconnection, the controller periodically sends packet-out messages carrying LLDP information and, according to the packet-in messages fed back, continuously monitors the state of the switches and updates the network topology. A global resource view of the whole network is thereby formed in the controller, which enables centralized management and control of the network.
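The branching in steps 3-9 to 3-12 amounts to mapping an observed event to one of three link fault types. In the sketch below the event dictionary and its keys (`kind`, `link_down`, `consistent`) are invented for illustration; real OpenFlow messages carry different fields.

```python
def classify_link_fault(event):
    """Map a hypothetical fault event to the fault type of steps 3-10 to 3-12."""
    kind = event.get("kind")
    if kind == "port_status" and event.get("link_down"):
        return "port_fault"           # step 3-10: switch reported a port state change
    if kind == "flow_table_check" and not event.get("consistent", True):
        return "link_blocked"         # step 3-11: issued flow tables contain a logic error
    if kind == "lldp_timeout":
        return "link_disconnected"    # step 3-12: no packet-in came back for an LLDP probe
    return None                       # not recognized as a link fault
```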
Step 3-13: Store the link faults. The set of link faults is stored in memory or in a database, associated by ID; once a fault is resolved, it is cleared from the memory or database.
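A minimal in-memory version of such an ID-keyed fault set could look like the following. The class name `FaultStore` and the stored fields are assumptions; a database-backed variant would expose the same operations.

```python
class FaultStore:
    """In-memory fault set keyed by fault ID (hypothetical sketch)."""

    def __init__(self):
        self._faults = {}

    def record(self, fault_id, switch_id, fault_type):
        """Store or overwrite the fault with the given ID."""
        self._faults[fault_id] = {"switch": switch_id, "type": fault_type}

    def resolve(self, fault_id):
        """Clear a fault once it has been resolved; unknown IDs are ignored."""
        self._faults.pop(fault_id, None)

    def active(self):
        """Return a copy of the faults that are still unresolved."""
        return dict(self._faults)
```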
Step 3-14: Traverse the fault set. Read the entries of each switch unit and its ID, and take the values of the faulty entries.
Step 3-15: Take out the flow table entries whose output action is a group table ID, and find those whose group table ID is the same as the faulty group table ID from step 3-14.
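The matching in step 3-15 is essentially a filter over flow table entries. The sketch below assumes each entry is a dictionary with an `action` of the form `{"type": "group", "group_id": ...}`; this layout and the function name are illustrative only.

```python
def affected_flow_entries(flow_entries, faulty_group_ids):
    """Select entries whose output action points at a faulty group table ID."""
    faulty = set(faulty_group_ids)
    return [
        entry for entry in flow_entries
        if entry.get("action", {}).get("type") == "group"
        and entry["action"].get("group_id") in faulty
    ]
```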
Step 3-16: Determine the type of data flow. According to the content of the flow table entry, determine which type of data flow it is, so as to select the first, second, or third priority recovery mode.
Step 3-17: Store the fault flow table information by fault mode. The information of the faulty flow table entries is stored in memory according to the fault mode, and link recovery is performed on the basis of this table.
Step 3-18: Start link fault recovery; for an embodiment, refer to fig. 6.
Fig. 4 is a schematic structural diagram of a fault recovery system for SDN network controllers. As can be seen from fig. 4, the controller supervision manages controller 1 (i.e., the main controller) and the backup controllers, of which there may be several. Controller 1 and the backup controllers are physically connected to switches 1 to 6, but only controller 1 communicates with the switches. Meanwhile, the controller supervision communicates with controller 1, for example by WebSocket. If controller 1 fails, the controller is switched according to the SDN network controller fault recovery flow chart in fig. 5.
Figure 5 is an SDN network controller failure recovery flow diagram.
Step 5-1: Discover a controller fault. The controller supervisor monitors the controllers; when it finds a controller fault that causes the connection with that controller to fail, it goes to step 5-2 to examine the fault.
Step 5-2: Judge which controller has failed. If the failed controller is the main controller, go to step 5-3 to start the switchover; otherwise go to step 5-7 to restart the backup controller.
Step 5-3: Select a controller as the main controller. If the current main controller is found to have failed, the controller supervisor selects a normally operating controller from the controllers connected to it as the main controller and sets the main controller label.
Step 5-4: Restore the configuration information. The configuration information, topology information, database connection information and so on of the original main controller server are copied to the new main controller.
Step 5-5: Report events to the selected controller. All switches connected to the original main controller are switched to report their events to the new main controller, so that the new main controller controls the switches.
Step 5-6: Synchronize the controllers. The new main controller keeps its state synchronized based on the reported events, and communication between the new main controller and the controller supervisor is maintained.
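The switchover in steps 5-2 to 5-5 might look roughly like the following Python sketch. The `Controller` class, the `fail_over` function and the `switch_targets` mapping are hypothetical names introduced for illustration; picking the first healthy backup is an assumption, since the description does not say how the supervisor chooses among several candidates.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Controller:
    name: str
    healthy: bool = True
    is_primary: bool = False
    config: dict = field(default_factory=dict)


def fail_over(controllers: list, switch_targets: dict) -> Controller:
    """Promote a healthy backup if the current main controller has failed."""
    old = next(c for c in controllers if c.is_primary)
    if old.healthy:
        return old  # the main controller is fine, nothing to switch

    # Step 5-3: select a normally operating controller (assumed: first one found).
    candidates = [c for c in controllers if c.healthy and not c.is_primary]
    if not candidates:
        raise RuntimeError("no healthy backup controller available")
    new = candidates[0]
    old.is_primary = False
    new.is_primary = True

    # Step 5-4: copy configuration, topology and similar state to the new main controller.
    new.config = copy.deepcopy(old.config)

    # Step 5-5: switches that reported to the old main controller now report to the new one.
    for switch, target in switch_targets.items():
        if target == old.name:
            switch_targets[switch] = new.name
    return new
```

A deep copy is used so that later changes on the new main controller do not alter the state recorded for the failed one.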
Fig. 6 is a flowchart of SDN link failure recovery according to an embodiment of the present invention.
Step 6-1: Acquire the source and destination switches connected to the hosts. The host tracking module is called to obtain the source switch and the destination switch according to the source host IP address and the destination host IP address of the flow table entry.
Step 6-2: Acquire the flow rules of the faulty path. The flow rules on the old path affected by the fault are obtained by matching the IP addresses of the source host and the destination host against the flow table entries in the SDN controller database.
Step 6-3: Delete the flow rules on the old path. The controller's forwarding rule management module 4-7 (shown in the structural diagram of the SDN network controller fault recovery system in Fig. 4) is called to delete the flow rules on the old path.
Step 6-4: Confirm the fault recovery mode through fault diagnosis, by classifying and comparing the link faults.
Step 6-5: Judge the fault recovery mode. If it is the first fault recovery mode, go to step 6-6 to obtain the backup path; if it is the second fault recovery mode, go to step 6-9 to recalculate an optimal path; if it is the third fault recovery mode, go to step 6-12.
Step 6-6: Acquire the backup path. If the data flow affected by the fault has a low requirement for service quality assurance and may have a high requirement for fault recovery response time, recovery switches directly to a backup path, so the backup path of the route calculation is acquired first.
Step 6-7: Switch automatically to the backup path through the fast network fault group table.
Step 6-8: Recalculate an optimal path. If the data flow affected by the fault has a high requirement for service quality assurance but not for recovery response time, the optimal path is recalculated.
Step 6-9: Switch automatically to the optimal path through the fast network fault group table.
Step 6-10: Acquire a backup path. If the data flow affected by the fault has high requirements for both service quality assurance and recovery response time, for example the delay and packet loss rate of a real-time high-bandwidth network data flow service, the backup path is acquired in advance in a parallel execution mode.
Step 6-11: Switch automatically to the backup path through the fast network fault group table.
Step 6-12: Recalculate an optimal path according to a route calculation algorithm.
Step 6-13: Switch automatically to the new optimal path through the fast network fault group table.
Step 6-14: Issue the flow table entries of the new path. The forwarding rule management module 4-7 is called to issue the flow table entries on the new path.
Step 6-15: Complete link failure recovery and report the recovery result to the controller.
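The description leaves the "route calculation algorithm" used to recalculate the optimal path unspecified. As one possible illustration, the sketch below uses Dijkstra's algorithm over the controller's topology view while skipping failed links. The function name `shortest_path`, the link-cost mapping and the switch names are all assumptions, not part of the patent.

```python
import heapq


def shortest_path(links: dict, src: str, dst: str, failed: frozenset = frozenset()):
    """Dijkstra over an undirected weighted graph, skipping failed links.

    links maps frozenset({a, b}) -> cost; failed is a set of such keys.
    Returns the list of switches on the cheapest path, or None if unreachable.
    """
    # Build an adjacency list from the links that are still usable.
    adjacency = {}
    for link, cost in links.items():
        if link in failed:
            continue
        a, b = tuple(link)
        adjacency.setdefault(a, []).append((b, cost))
        adjacency.setdefault(b, []).append((a, cost))

    # Each queue item carries the accumulated cost, the node and the path so far.
    queue = [(0, src, [src])]
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == dst:
            return path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, cost in adjacency.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (dist + cost, neighbor, path + [neighbor]))
    return None
```

Calling the function once without `failed` and once with the broken link gives the old path and a candidate new path, whose flow table entries would then be issued as in step 6-14.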
Based on the embodiments of the system, the invention further provides a service recovery method of the SDN network and a switch link fault detection method.
In an embodiment of the present invention, a method for recovering a service of an SDN network includes:
monitoring the running states of all controllers of the SDN network and the connection states of all switch links; the all controllers comprise a main controller and a plurality of backup controllers;
maintaining data synchronization between the primary controller and each backup controller;
if the main controller fails, selecting a backup controller as a new main controller, and configuring the new main controller according to various configuration information of the main controller;
and if one switch link fails, performing switch link failure recovery according to the failure attribute.
The method in the embodiment of the invention effectively ensures the reliability of the SDN and realizes the automatic recovery of the SDN.
Meanwhile, the embodiment of the invention keeps the data of each backup controller synchronized with that of the main controller at all times, so that when a switchover between the main controller and a backup controller occurs, the service can be recovered as quickly as possible.
In an embodiment of the present invention, before monitoring the operation states of all controllers of the SDN network and the connection states of all switch links, the method further includes:
deploying a plurality of controllers in the SDN network in advance, and connecting them respectively to all switches in the same area;
one of the controllers is selected as the primary controller and the remaining controllers are selected as the plurality of backup controllers.
In another embodiment of the present invention, determining that the main controller has failed includes:
judging that the main controller has failed by means of polling detection, active reporting by the controller, or socket communication.
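The description names three detection modes without giving details. One simple way to treat polling and active reporting uniformly is to record the time each controller was last heard from and to flag controllers that stay silent beyond a timeout. The function name, the `last_seen` mapping and the timeout value below are assumptions for illustration only.

```python
def failed_controllers(last_seen: dict, now: float, timeout: float) -> list:
    """Return the names of controllers whose last heartbeat is older than `timeout` seconds.

    last_seen maps controller name -> timestamp of the last successful poll
    or active report (both update the same timestamp in this sketch).
    """
    return sorted(name for name, seen in last_seen.items() if now - seen > timeout)
```

In practice `now` would come from a monotonic clock such as `time.monotonic()`; it is passed in here so the check stays a pure function.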
In another embodiment of the present invention, after monitoring the operation states of all controllers of the SDN network and the connection states of all switch links, the method further includes:
upgrading one or more backup controllers;
and selecting one upgraded backup controller as a new main controller, and configuring the new main controller according to various configuration information of the main controller.
In another embodiment of the present invention, if a switch link fails, performing a switch link failure recovery according to a failure attribute includes:
if a switch link fails, judging the failure attribute;
and recovering the switch link fault according to a preset priority recovery mode corresponding to the fault attribute.
Further, judging the fault attribute and performing the switch link fault recovery according to the preset priority recovery mode corresponding to the fault attribute include:
judging a first set criterion corresponding to the service quality guarantee requirement of the service flow affected by the fault, and judging a second set criterion corresponding to the fault recovery response time requirement of the service flow;
when the first set criterion belongs to a preset low quality threshold range and the second set criterion belongs to a preset high response threshold range, switching the failed switch link to a backup switch link;
when the first set criterion belongs to a preset high quality threshold range and the second set criterion belongs to a preset low response threshold range, calculating an optimal switch link and switching the failed switch link to the optimal switch link;
and when the first set criterion belongs to a preset high quality threshold range and the second set criterion belongs to a preset high response threshold range, first switching the failed switch link to the backup switch link, calculating an optimal switch link, and then switching from the backup switch link to the optimal switch link.
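The three cases above amount to a small decision table. The Python sketch below encodes it, assuming the two criteria have already been reduced to booleans (`qos_high`, `response_high`); the enum and function names are hypothetical. The description does not cover the combination of low quality and low response-time requirements, so the sketch raises an error for it rather than guessing.

```python
from enum import Enum
from typing import Callable, List


class RecoveryMode(Enum):
    BACKUP_ONLY = 1          # low QoS requirement, high response-time requirement
    OPTIMAL_ONLY = 2         # high QoS requirement, low response-time requirement
    BACKUP_THEN_OPTIMAL = 3  # both requirements high


def select_mode(qos_high: bool, response_high: bool) -> RecoveryMode:
    """Map the two criteria to one of the three priority recovery modes."""
    if qos_high and response_high:
        return RecoveryMode.BACKUP_THEN_OPTIMAL
    if qos_high:
        return RecoveryMode.OPTIMAL_ONLY
    if response_high:
        return RecoveryMode.BACKUP_ONLY
    raise ValueError("low QoS and low response-time requirements: case not described")


def recover(mode: RecoveryMode, backup_link: str,
            compute_optimal: Callable[[], str]) -> List[str]:
    """Return the links that traffic is switched to, in order."""
    switched = []
    if mode in (RecoveryMode.BACKUP_ONLY, RecoveryMode.BACKUP_THEN_OPTIMAL):
        switched.append(backup_link)        # fast switch to the backup link
    if mode in (RecoveryMode.OPTIMAL_ONLY, RecoveryMode.BACKUP_THEN_OPTIMAL):
        switched.append(compute_optimal())  # slower, computed on demand
    return switched
```

For a flow with both requirements high, `recover` first yields the backup link and then the computed optimal link, which mirrors the order of the third case.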
Specifically, before the step of determining the failure attribute if a switch link fails, the method further includes:
sending a topological structure instruction to the main controller so that the main controller constructs topological structures of all the switches;
and acquiring the topological structure from the main controller, and detecting the switch link fault according to the topological structure.
The invention provides a method for detecting a link fault of a switch in an SDN network, which comprises the following steps:
the main controller constructs the topological structures of all the switches;
and if the main controller finds, according to the topological structure, that any two switches are not in an intercommunication state, the main controller detects the switch link fault in a preset detection mode.
Wherein, the master controller constructs the topological structure of all the switches, including:
the main controller sends first messages carrying a link discovery protocol to all the switches so that all the switches respond with second messages;
and constructing the topology of all the switches according to each second message of the response.
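As a rough illustration of the topology construction, the sketch below assumes each second message has already been parsed into a tuple of (sending switch, sending port, receiving switch, receiving port). The tuple layout and the function name `build_topology` are assumptions; the description does not define the message format.

```python
def build_topology(responses: list) -> dict:
    """Build an adjacency map from parsed link discovery responses.

    responses: tuples (src_switch, src_port, dst_switch, dst_port).
    Returns switch -> {neighbor switch: local port facing that neighbor}.
    """
    topology = {}
    for src, src_port, dst, dst_port in responses:
        # Record the link in both directions so either end can look it up.
        topology.setdefault(src, {})[dst] = src_port
        topology.setdefault(dst, {})[src] = dst_port
    return topology
```

Two switches are then directly connected in this view exactly when each appears in the other's neighbor map.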
Further, the detecting the switch link failure by the preset detecting method includes:
the main controller acquires port state information of each switch and judges port faults of each switch according to the port state information; and/or,
the main controller acquires forwarding flow table information of each switch and judges link blockage faults according to the forwarding flow table information; and/or,
the main controller sends a packet message instruction to all the switches so that each switch sends a broadcast packet message; when a packet message is received in response from two switches that are not in an intercommunication state, it is judged that the two switches are non-OpenFlow switches and that this is not a link disconnection fault; when no packet message is received in response from the two switches, the link between the two switches is judged to have a link disconnection fault.
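The port check and the broadcast probe can be combined into a small classifier, sketched below in Python. The function name, the result labels and the order of the checks are assumptions; the description joins the detection modes with "and/or" and fixes no order. The blockage check is left out because the description does not say how blockage is derived from the forwarding flow table information.

```python
def classify_link(port_up_a: bool, port_up_b: bool, probe_received: bool) -> str:
    """Classify the state between two switches that are not intercommunicating.

    port_up_a / port_up_b: port state reported by each switch.
    probe_received: whether the broadcast packet message was answered.
    """
    if not (port_up_a and port_up_b):
        return "port_fault"
    if probe_received:
        # A response arrived, so this is not treated as a link disconnection.
        return "non_openflow"
    return "link_disconnection"
```

Only the `link_disconnection` result would feed into the link failure recovery flow; the other two results call for different handling.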
While this application describes specific examples of the invention, those skilled in the art may devise variations of this invention without departing from the inventive concept.
In light of the above teachings, those skilled in the art can make various modifications to the method of the present invention without departing from the scope of the present invention.

Claims (11)

1. A service restoration apparatus for an SDN network, the apparatus comprising:
the state detection module is used for monitoring the running states of all controllers of the SDN network and the connection states of all switch links; the all controllers comprise a main controller and a plurality of backup controllers;
the controller management module is used for maintaining data synchronization between the main controller and each backup controller; if the main controller fails, selecting a backup controller as a new main controller, and configuring the new main controller according to various configuration information of the main controller;
the link management module is used for recovering the link failure of the switch according to the failure attribute if the link failure of the switch occurs;
the link management module includes: the fault detection module is used for judging the fault attribute if a switch link fails; the link recovery module is used for recovering the switch link failure according to a preset priority recovery mode corresponding to the failure attribute;
wherein, if a switch link fails, determining the failure attribute comprises:
the fault detection module is specifically configured to determine a first set criterion corresponding to a service quality assurance requirement of a service flow affected by the fault, and determine a second set criterion corresponding to a fault recovery response time requirement of the service flow;
wherein, the performing the switch link failure recovery according to the preset priority recovery mode corresponding to the failure attribute comprises:
the link recovery module is specifically configured to switch the failed switch link to the backup switch link when the first setting criterion belongs to a preset low quality threshold range and the second setting criterion belongs to a preset high response threshold range;
when the first set standard belongs to a preset high-quality threshold range and the second set standard belongs to a preset low-response threshold range, calculating an optimal switch link and switching the failed switch link to the optimal switch link;
and when the first setting standard belongs to a preset high-quality threshold range and the second setting standard belongs to a preset high-response threshold range, firstly switching the failed switch link to the backup switch link, calculating an optimal switch link, and then switching the switched backup switch link to the optimal switch link.
2. The apparatus of claim 1, wherein the apparatus further comprises:
the system deployment module is used for deploying a plurality of controllers in the SDN network in advance and connecting the controllers with all switches in the same area respectively;
one of the controllers is selected as the primary controller and the remaining controllers are selected as the plurality of backup controllers.
3. The apparatus of claim 1, wherein the controller management module is further configured to determine that the master controller fails by polling detection, controller active reporting, or socket communication.
4. The apparatus of any of claims 1-3, wherein the controller management module is further to upgrade one or more backup controllers;
and selecting one upgraded backup controller as a new main controller, and configuring the new main controller according to various configuration information of the original main controller.
5. The apparatus of claim 1, wherein the link management module further comprises:
a link discovery module, configured to send a topology structure instruction to the master controller, so that the master controller constructs topology structures of all switches;
the fault detection module is specifically configured to obtain the topology structure from the master controller, and perform switch link fault detection according to the topology structure.
6. A service recovery system for an SDN network, the system comprising an apparatus according to any of claims 1-5.
7. A service recovery method for an SDN network, the method comprising:
monitoring the running states of all controllers of the SDN network and the connection states of all switch links; the all controllers comprise a main controller and a plurality of backup controllers;
maintaining data synchronization between the primary controller and each backup controller; if the main controller fails, selecting a backup controller as a new main controller, and configuring the new main controller according to various configuration information of the main controller;
if a switch link fails, the switch link failure recovery is carried out according to the failure attribute, comprising the following steps: if a switch link fails, judging the failure attribute; performing switch link fault recovery according to a preset priority recovery mode corresponding to the fault attribute;
judging the fault attribute; and performing the switch link fault recovery according to the preset priority recovery mode corresponding to the fault attribute, including:
judging a first set standard corresponding to the service quality guarantee requirement of the service flow affected by the fault, and judging a second set standard corresponding to the fault recovery response time requirement of the service flow;
when the first set criterion belongs to a preset low quality threshold range and the second set criterion belongs to a preset high response threshold range, switching the failed switch link to a backup switch link;
when the first set standard belongs to a preset high-quality threshold range and the second set standard belongs to a preset low-response threshold range, calculating an optimal switch link and switching the failed switch link to the optimal switch link;
and when the first setting standard belongs to a preset high-quality threshold range and the second setting standard belongs to a preset high-response threshold range, firstly switching the failed switch link to the backup switch link, calculating an optimal switch link, and then switching the switched backup switch link to the optimal switch link.
8. The method of claim 7, wherein prior to monitoring the operational state of all controllers and the connection state of all switch links of the SDN network, further comprising:
the method comprises the steps that a plurality of controllers are deployed in an SDN network in advance and are respectively connected with all switches in the same area;
one of the controllers is selected as the primary controller and the remaining controllers are selected as the plurality of backup controllers.
9. The method of claim 7, wherein if the master controller fails, comprising:
and judging that the main controller fails in a polling detection mode, an active reporting mode of the controller or a socket communication mode.
10. The method of any one of claims 7-9, wherein after monitoring the operational state of all controllers and the connection state of all switch links of the SDN network, further comprising:
upgrading one or more backup controllers;
and selecting one upgraded backup controller as a new main controller, and configuring the new main controller according to various configuration information of the original main controller.
11. The method of claim 7, wherein said step of determining the failure attribute is preceded by the step of determining the failure attribute if a switch link fails, further comprising:
sending a topological structure instruction to the main controller so that the main controller constructs topological structures of all the switches;
and acquiring the topological structure from the main controller, and detecting the switch link fault according to the topological structure.
CN201611252266.0A 2016-12-30 2016-12-30 Service recovery device, main controller, system and method of SDN network Active CN108270669B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611252266.0A CN108270669B (en) 2016-12-30 2016-12-30 Service recovery device, main controller, system and method of SDN network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611252266.0A CN108270669B (en) 2016-12-30 2016-12-30 Service recovery device, main controller, system and method of SDN network

Publications (2)

Publication Number Publication Date
CN108270669A CN108270669A (en) 2018-07-10
CN108270669B true CN108270669B (en) 2022-08-02

Family

ID=62753996

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611252266.0A Active CN108270669B (en) 2016-12-30 2016-12-30 Service recovery device, main controller, system and method of SDN network

Country Status (1)

Country Link
CN (1) CN108270669B (en)

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109309617A (en) * 2018-08-08 2019-02-05 华为技术有限公司 Disaster tolerance switching method, relevant device and computer storage medium
CN109240608B (en) * 2018-08-22 2021-08-31 郑州云海信息技术有限公司 Configuration information synchronization method and device
CN109361545A (en) * 2018-11-01 2019-02-19 郑州云海信息技术有限公司 A kind of method and device of software defined network SDN controller control link switching
CN110113258B (en) * 2019-04-23 2024-03-26 北京全路通信信号研究设计院集团有限公司 Method and system for automatically protecting data surface link by using control surface link
CN110086666B (en) * 2019-04-25 2022-04-26 深圳前海微众银行股份有限公司 Alarm method, device and system
CN110505086B (en) * 2019-08-16 2023-01-06 苏州浪潮智能科技有限公司 Fault-tolerant method and device for distributed controller
CN110716471A (en) * 2019-10-29 2020-01-21 中车株洲电力机车有限公司 Dual-CPU hot standby redundancy control method and device for brake control unit of brake
CN111030851B (en) * 2019-11-29 2022-12-27 苏州浪潮智能科技有限公司 Management method, equipment and readable medium for network diagnosis recovery
CN113890850B (en) * 2020-07-01 2023-06-06 阿里巴巴集团控股有限公司 Route disaster recovery system and method
CN112187533B (en) * 2020-09-18 2023-04-18 北京浪潮数据技术有限公司 Virtual network equipment defense method, device, electronic equipment and medium
CN112199241B (en) * 2020-09-28 2023-06-06 西南电子技术研究所(中国电子科技集团公司第十研究所) Double-network-port multi-board-card network hot backup device
CN112260971B (en) * 2020-10-21 2021-11-16 湖南大学 Fault tolerance method and device for network equipment system, computer equipment and storage medium
CN112564964B (en) * 2020-12-04 2022-06-24 中国石油大学(华东) Fault link detection and recovery method based on software defined network
CN112800064B (en) * 2021-02-05 2023-06-02 成都延华西部健康医疗信息产业研究院有限公司 Real-time big data application development method and system based on Confluent community open source version
CN113010349A (en) * 2021-02-23 2021-06-22 上海中船船舶设计技术国家工程研究中心有限公司 Soft reset method and system for Ethernet switch
CN113236329B (en) * 2021-05-20 2024-02-20 三一智矿科技有限公司 Electrohydraulic bracket controller and fault recovery method thereof
CN113472644B (en) * 2021-07-12 2023-03-31 武汉绿色网络信息服务有限责任公司 Path addressing method and network service system
CN113612691B (en) * 2021-08-06 2023-04-07 浙江工商大学 Path conversion method, storage medium and terminal equipment
CN114484766B (en) * 2021-12-21 2023-04-07 珠海格力电器股份有限公司 Method for determining master controller and related equipment
CN114222322A (en) * 2021-12-31 2022-03-22 展讯通信(上海)有限公司 Network communication method, device, equipment and storage medium
CN114721321B (en) * 2022-03-01 2023-04-07 大连理工大学 Equipment automatic management method and system based on intelligent industrial switch
CN115348153B (en) * 2022-08-15 2023-07-18 中国联合网络通信集团有限公司 Control method, device, equipment and storage medium of forwarding equipment
CN115361315B (en) * 2022-08-25 2024-04-26 超越科技股份有限公司 Openflow switch reliability test method and storage medium
CN115622907A (en) * 2022-09-07 2023-01-17 国网青海省电力公司信息通信公司 Line detection method and device, nonvolatile storage medium and computer equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104468236A (en) * 2014-12-19 2015-03-25 上海斐讯数据通信技术有限公司 SDN controller cluster, SDN switch and SDN switch connecting control method
CN104811325A (en) * 2014-01-24 2015-07-29 华为技术有限公司 Cluster node controller monitoring method, related device and controller
CN105357046A (en) * 2015-11-23 2016-02-24 北京邮电大学 Network information detection method for software defined networking (SDN)
CN105978741A (en) * 2016-07-15 2016-09-28 清华大学深圳研究生院 Network fault handling method and system

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9253026B2 (en) * 2013-12-18 2016-02-02 International Business Machines Corporation Software-defined networking disaster recovery
WO2015162619A1 (en) * 2014-04-25 2015-10-29 Hewlett-Packard Development Company, L.P. Managing link failures in software defined networks
US10356011B2 (en) * 2014-05-12 2019-07-16 Futurewei Technologies, Inc. Partial software defined network switch replacement in IP networks
CN105933253B (en) * 2016-04-13 2018-09-04 浪潮集团有限公司 Interchanger configuration recovery method under a kind of SDN network
CN106130925A (en) * 2016-08-26 2016-11-16 广州西麦科技股份有限公司 Link scheduling method, equipment and the system of a kind of SDN
CN106130767B (en) * 2016-09-23 2020-04-07 深圳灵动智网科技有限公司 System and method for monitoring and solving service path fault

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104811325A (en) * 2014-01-24 2015-07-29 华为技术有限公司 Cluster node controller monitoring method, related device and controller
CN104468236A (en) * 2014-12-19 2015-03-25 上海斐讯数据通信技术有限公司 SDN controller cluster, SDN switch and SDN switch connecting control method
CN105357046A (en) * 2015-11-23 2016-02-24 北京邮电大学 Network information detection method for software defined networking (SDN)
CN105978741A (en) * 2016-07-15 2016-09-28 清华大学深圳研究生院 Network fault handling method and system

Also Published As

Publication number Publication date
CN108270669A (en) 2018-07-10


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant