WO2015117395A1 - Communication exception processing method and network element device - Google Patents

Communication exception processing method and network element device

Info

Publication number
WO2015117395A1
Authority
WO
WIPO (PCT)
Prior art keywords
communication
abnormality
main control
control board
line card
Prior art date
Application number
PCT/CN2014/089212
Other languages
English (en)
French (fr)
Inventor
赵东林
吴展
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2015117395A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0681 Configuration of triggering conditions
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • H04L43/0817 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking functioning

Definitions

  • the present invention relates to the field of communication exception processing in distributed communication technologies, and in particular, to a communication exception processing method and a network element device.
  • a network element device is a combination of one or more shelves or chassis that can independently perform certain transmission functions.
  • the NE is composed of a main control board and line cards, and the main control board is responsible for managing the line cards.
  • the main control board is the board responsible for control, management, and configuration. Some NEs have only one main control board. Others have two, one active and one standby, and the standby main control board serves as the backup of the active one.
  • a line card is a board that is responsible for switching, forwarding, and processing packets in the system.
  • communication abnormalities of the main control board or a line card occur frequently and interrupt communication between the line card and the main control board. If communication is not restored in time, a series of related faults follows: the main control board cannot manage the other boards, and data cannot be synchronized between the main control board and the other boards.
  • the main purpose of the present invention is to solve the technical problem that communication abnormalities between the main control board and the line cards in a network element device cannot be handled in time.
  • a communication exception processing method is applied to a network element device of a distributed system, and the method includes the following steps:
  • the communication abnormality corresponding to the abnormality count is processed.
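  • the step of accumulating the abnormality count of the abnormality type includes:
  • when abnormal communication between the main control board and a line card is detected within a predetermined first time period, accumulating a first abnormality count of communication abnormalities between the main control board and that line card;
  • when abnormal communication between the line card and the main control board or other line cards is detected within a predetermined second time period, accumulating a second abnormality count of communication abnormalities between the line card and the main control board or other line cards;
  • when abnormal communication between the main control board and all line cards is detected within a predetermined third time period, accumulating an island abnormality count of communication abnormalities between the main control board and all line cards.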
  • the step of processing the communication abnormality corresponding to the abnormality count includes:
  • the main control board is actively restarted.
  • the first time period is greater than the second time period, and the second time period is greater than the third time period.
  • the communication exception processing method further includes:
  • a network element device includes a detection module, an accumulation module, and a processing module, where:
  • the detecting module is configured to: detect a communication status between a line card in the network element device and a main control board and other communication nodes in the network element device;
  • the accumulating module is configured to: when a communication abnormality is detected within a predetermined time period, obtain the abnormality type of the communication abnormality, and accumulate the abnormality count of that abnormality type;
  • the processing module is configured to: when the abnormality count reaches a preset threshold, process a communication abnormality corresponding to the abnormality count.
  • the accumulating module comprises a first accumulating unit, a second accumulating unit and a third accumulating unit, wherein:
  • the first accumulating unit is configured to: when abnormal communication between the main control board and a line card is detected within a predetermined first time period, accumulate a first abnormality count of communication abnormalities between the main control board and that line card;
  • the second accumulating unit is configured to: when abnormal communication between the line card and the main control board or other line cards is detected within a predetermined second time period, accumulate a second abnormality count of communication abnormalities between the line card and the main control board or other line cards;
  • the third accumulating unit is configured to: when abnormal communication between the main control board and all line cards is detected within a predetermined third time period, accumulate an island abnormality count of communication abnormalities between the main control board and all line cards.
  • the processing module includes a first processing unit, a second processing unit, and a third processing unit, where:
  • the first processing unit is configured to: when the first abnormality count reaches a preset first threshold, cause the main control board to send a restart command to the line card;
  • the second processing unit is configured to: when the second abnormality count reaches a preset second threshold, record information about the line card's communication abnormality, upload the information to the main control board, and cause the line card to restart itself;
  • the third processing unit is configured to: when the island abnormality count reaches a preset island threshold, cause the main control board to restart itself.
  • the first time period is greater than the second time period, and the second time period is greater than the third time period.
  • the network element device further includes a clearing module, where:
  • the clearing module is configured to: when the communication is normal, clear the abnormality count.
  • a computer program comprising program instructions that, when executed by a network element device, cause the network element device to perform the communication exception handling method as claimed above.
  • the communication exception processing method and the network element device of the foregoing technical solution monitor communication between the line cards and the main control board in a network element device of a distributed system and the other communication nodes in that device. When a communication abnormality is detected, the abnormality count of the corresponding abnormality type is accumulated.
  • when the abnormality count reaches the preset threshold, the communication abnormality is considered to affect normal communication, and it can be handled by restarting the line card or restarting the main control board.
  • normal communication is thereby restored in time, which ensures normal data transmission between line cards and between line cards and the main control board, and improves the fault tolerance, reliability, and availability of the network element device.
  • FIG. 1 is a schematic flowchart of a communication exception processing method according to a first embodiment of the present invention
  • FIG. 2 is a schematic flowchart of a communication exception processing method according to a second embodiment of the present invention.
  • FIG. 3 is a schematic diagram of a functional module of a network element device according to a first embodiment of the present invention
  • FIG. 4 is a schematic diagram of a function module of a network element device according to a second embodiment of the present invention.
  • the present invention provides a communication exception processing method.
  • the communication exception processing method includes:
  • Step S101: detecting the communication status between the line cards and the main control board in the network element device and the other communication nodes in the network element device;
  • the line cards in the network element device can communicate with other line cards or with the main control board, and a line card mainly performs the conversion function; the main control board can communicate with the line cards to control, manage, or configure them.
  • the main control board of this embodiment is the active main control board.
  • when a line card is physically in place, it may or may not be able to communicate normally with other communication nodes.
  • for communication between a line card and other communication nodes, this embodiment performs detection periodically;
  • for communication between the main control board and other communication nodes, this embodiment likewise performs detection in real time.
  • Step S102: when a communication abnormality is detected within a predetermined time period, obtaining the abnormality type of the communication abnormality, and accumulating the abnormality count of that abnormality type;
  • there are three abnormality types: a communication abnormality between a line card and other communication nodes (including other line cards and the main control board), a communication abnormality between the main control board and one line card, and a communication abnormality between the main control board and all line cards.
  • a communication abnormality between the main control board and all line cards is also called an island abnormality.
  • different time periods are set for detection according to the abnormality type. For example, a time period is preset during which communication between a line card and other communication nodes is monitored.
  • if a communication abnormality between the line card and another communication node is detected within that period, the corresponding abnormality count is accumulated, that is, the total number of communication abnormalities of that type is counted. For example, if a line card has a communication abnormality with another line card and its abnormality count was previously 3, the count is incremented by 1 from 3.
  • Step S103 When the abnormality count reaches a preset threshold, the communication abnormality corresponding to the abnormality count is processed.
  • for each of the three abnormality types (line card with other communication nodes, main control board with one line card, and main control board with all line cards), a different maximum abnormality count is preset as its threshold. When one of these abnormalities occurs, the corresponding count is accumulated and then compared with the preset threshold to decide whether handling is needed.
  • the threshold of the abnormality count is preset according to the actual situation. For example, if normal communication is not affected or normal data transmission can still be guaranteed, the threshold may be set larger.
  • when the abnormality count of a given abnormality type reaches the preset threshold, the abnormality may be handled by restarting the line card or restarting the main control board, so that normal communication is restored in time and data is transmitted normally between line cards and between line cards and the main control board.
  • the manner of processing the communication abnormality can be customized through the command line.
  • for example, the user can set the restart time of a line card or the main control board through the command line, and set the time at which the network element device is turned on or off.
  • compared with the prior art, this embodiment monitors communication between the line cards and the main control board in a network element device of a distributed system and the other communication nodes in that device, and accumulates the abnormality count of the corresponding abnormality type when a communication abnormality is detected.
  • when the abnormality count reaches the preset threshold, the communication abnormality is considered to affect normal communication.
  • the abnormality can then be handled by restarting the line card or restarting the main control board.
  • normal communication is thereby restored in time, which ensures normal data transmission between line cards and between line cards and the main control board, and improves the fault tolerance, reliability, and availability of the network element device.
  • the above step S102 includes the following sub-steps:
  • when abnormal communication between the main control board and a line card is detected within a predetermined first time period, accumulating a first abnormality count of communication abnormalities between the main control board and that line card;
  • when abnormal communication between the line card and the main control board or other line cards is detected within a predetermined second time period, accumulating a second abnormality count of communication abnormalities between the line card and the main control board or other line cards;
  • when abnormal communication between the main control board and all line cards is detected within a predetermined third time period, accumulating an island abnormality count of communication abnormalities between the main control board and all line cards.
  • the first time period is the period for detecting whether communication between the main control board and a line card is abnormal; the second time period is the period for detecting whether communication between a line card and the main control board or other line cards is abnormal;
  • the third time period is the period for detecting whether communication between the main control board and all line cards is abnormal.
  • because a line card is more likely to have a communication abnormality than the main control board, the first time period is set to be greater than the second time period; and because an island abnormality of the main control board is the least likely, the second time period is set to be greater than the third time period.
  • the above step S103 includes the following sub-steps:
  • the main control board is actively restarted.
  • if the first abnormality count reaches the preset first threshold, normal communication is considered affected: the main control board sends a restart command to the line card to restart it, and the reason for the restart is recorded;
  • if the second abnormality count reaches the preset second threshold, normal communication is likewise considered affected:
  • information about the line card's communication abnormality is recorded and uploaded to the main control board, and the line card restarts itself;
  • when the island abnormality count reaches the preset island threshold, normal communication is likewise considered affected: the main control board restarts itself, and the reason for the restart is recorded.
  • the first abnormality count is greater than the second abnormality count
  • the second abnormality count is greater than the island abnormality count
  • the embodiment includes after step S101:
  • Step S104 it is determined whether a communication abnormality is detected within a predetermined period of time, and if so, proceeds to step S102, otherwise proceeds to step S105;
  • Step S102 Acquire an abnormal type of the communication abnormality, and perform an accumulation process on the abnormality count of the abnormal type
  • Step S103 when the abnormality count reaches a preset threshold, processing the communication abnormality corresponding to the abnormality count
  • Step S105 the abnormality count is not processed
  • step S106 when the communication is normal, the abnormality count is cleared.
  • steps S101, S102, and S103 are the same as in the embodiment of FIG. 1.
  • if no communication abnormality is detected within the predetermined time period, the abnormality count is not processed.
  • when communication is normal, the abnormality count is cleared; clearing the abnormality count includes clearing the first abnormality count, the second abnormality count, and the island abnormality count.
  • the network element device includes:
  • the detecting module 101 is configured to: detect the communication status between the line cards and the main control board in the network element device and the other communication nodes in the network element device;
  • the line cards in the network element device can communicate with other line cards or with the main control board, and a line card mainly performs the conversion function; the main control board can communicate with the line cards to control, manage, or configure them.
  • the main control board of this embodiment is the active main control board.
  • when a line card is physically in place, it may or may not be able to communicate normally with other communication nodes.
  • for communication between a line card and other communication nodes, this embodiment performs detection periodically;
  • for communication between the main control board and other communication nodes, this embodiment likewise performs detection in real time.
  • the accumulating module 102 is configured to: when a communication abnormality is detected within a predetermined time period, obtain the abnormality type of the communication abnormality, and accumulate the abnormality count of that abnormality type;
  • there are three abnormality types: a communication abnormality between a line card and other communication nodes (including other line cards and the main control board), a communication abnormality between the main control board and one line card, and a communication abnormality between the main control board and all line cards.
  • a communication abnormality between the main control board and all line cards is also called an island abnormality.
  • different time periods are set for detection according to the abnormality type. For example, a time period is preset during which communication between a line card and other communication nodes is monitored.
  • if a communication abnormality between the line card and another communication node is detected within that period, the corresponding abnormality count is accumulated, that is, the total number of communication abnormalities of that type is counted. For example, if a line card has a communication abnormality with another line card and its abnormality count was previously 3, the count is incremented by 1 from 3.
  • the processing module 103 is configured to: when the abnormality count reaches a preset threshold, process the communication abnormality corresponding to the abnormality count.
  • for each of the three abnormality types (line card with other communication nodes, main control board with one line card, and main control board with all line cards), a different maximum abnormality count is preset as its threshold. When one of these abnormalities occurs, the corresponding count is accumulated and then compared with the preset threshold to decide whether handling is needed.
  • the threshold of the abnormality count is preset according to the actual situation. For example, if normal communication is not affected or normal data transmission can still be guaranteed, the threshold may be set larger.
  • when the abnormality count of a given abnormality type reaches the preset threshold, the abnormality may be handled by restarting the line card or restarting the main control board, so that normal communication is restored in time.
  • the manner of processing the communication abnormality can be customized through the command line.
  • the user can set the line card or the restart time of the main control board through the command line, and set the time when the network element device is turned on or off.
  • the accumulating module includes:
  • the first accumulating unit is configured to: when abnormal communication between the main control board and a line card is detected within a predetermined first time period, accumulate a first abnormality count of communication abnormalities between the main control board and that line card;
  • the second accumulating unit is configured to: when abnormal communication between the line card and the main control board or other line cards is detected within a predetermined second time period, accumulate a second abnormality count of communication abnormalities between the line card and the main control board or other line cards;
  • the third accumulating unit is configured to: when abnormal communication between the main control board and all line cards is detected within a predetermined third time period, accumulate an island abnormality count of communication abnormalities between the main control board and all line cards.
  • the first time period is the period for detecting whether communication between the main control board and a line card is abnormal; the second time period is the period for detecting whether communication between a line card and the main control board or other line cards is abnormal;
  • the third time period is the period for detecting whether communication between the main control board and all line cards is abnormal.
  • because a line card is more likely to have a communication abnormality than the main control board, the first time period is set to be greater than the second time period; and because an island abnormality of the main control board is the least likely, the second time period is set to be greater than the third time period.
  • the processing module includes:
  • a first processing unit configured to: when the first abnormality count reaches a preset first threshold, cause the main control board to send a restart command to the line card;
  • a second processing unit configured to: when the second abnormality count reaches a preset second threshold, record information about the line card's communication abnormality, upload the information to the main control board, and cause the line card to restart itself;
  • a third processing unit configured to: when the island abnormality count reaches a preset island threshold, cause the main control board to restart itself.
  • if the first abnormality count reaches the preset first threshold, normal communication is considered affected: the main control board sends a restart command to the line card to restart it, and the reason for the restart is recorded;
  • if the second abnormality count reaches the preset second threshold, normal communication is likewise considered affected:
  • information about the line card's communication abnormality is recorded and uploaded to the main control board, and the line card restarts itself; when the island abnormality count reaches the preset island threshold, normal communication is likewise considered affected:
  • the main control board actively restarts, and the reason for the restart is recorded.
  • the first abnormality count is greater than the second abnormality count
  • the second abnormality count is greater than the island abnormality count
  • the network element device further includes:
  • the clear module 104 is configured to clear the abnormality count when the communication is normal.
  • if no communication abnormality is detected within the predetermined time period, the abnormality count is not processed.
  • when communication is normal, the abnormality count is cleared; clearing the abnormality count includes clearing the first abnormality count, the second abnormality count, and the island abnormality count.
  • the communication exception processing method and the network element device of the foregoing technical solution monitor communication between the line cards and the main control board in a network element device of a distributed system and the other communication nodes in that device. When a communication abnormality is detected, the abnormality count of the corresponding abnormality type is accumulated.
  • when the abnormality count reaches the preset threshold, the communication abnormality is considered to affect normal communication, and it can be handled by restarting the line card or restarting the main control board.
  • normal communication is thereby restored in time, which ensures normal data transmission between line cards and between line cards and the main control board, and improves the fault tolerance, reliability, and availability of the network element device. Therefore, the present invention has strong industrial applicability.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Small-Scale Networks (AREA)
  • Maintenance And Management Of Digital Transmission (AREA)

Abstract

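A communication exception processing method and a network element device, applied to a network element device of a distributed system. The communication exception processing method includes the following steps: detecting the communication status between the line cards and the main control board in the network element device and the other communication nodes in the network element device; when a communication abnormality is detected within a predetermined time period, obtaining the abnormality type of the communication abnormality and accumulating the abnormality count of that abnormality type; and when the abnormality count reaches a preset threshold, processing the communication abnormality corresponding to that abnormality count. This technical solution can improve the fault tolerance, reliability, and availability of communication in the network element device.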

Description

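Communication exception processing method and network element device
Technical Field
The present invention relates to the field of communication exception processing in distributed communication technologies, and in particular to a communication exception processing method and a network element device.
Background
In a distributed system, a network element device is a combination of one or more shelves or chassis that can independently perform certain transmission functions. A network element consists of a main control board and line cards, and the main control board manages the line cards. In a network element device, the main control board is the board responsible for control, management, and configuration. Some network element devices have only one main control board; others have two, one active and one standby, with the standby main control board serving as the backup of the active one. A line card is a board responsible for switching, forwarding, and processing packets in the system.
Because communication in a distributed system is complex, communication abnormalities of the main control board or a line card occur frequently and interrupt communication between the line card and the main control board. If communication is not restored in time, a series of related faults follows: the main control board cannot manage the other boards, and data cannot be synchronized between the main control board and the other boards.
The above content is only intended to aid understanding of the technical solution of the present invention and is not an admission that it constitutes prior art.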
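Summary of the Invention
The main purpose of the present invention is to solve the technical problem that communication abnormalities between the main control board and the line cards in a network element device cannot be handled in time.
To solve the above technical problem, the following technical solution is adopted:
A communication exception processing method, applied to a network element device of a distributed system, the method including the following steps:
detecting the communication status between the line cards and the main control board in the network element device and the other communication nodes in the network element device;
when a communication abnormality is detected within a predetermined time period, obtaining the abnormality type of the communication abnormality, and accumulating the abnormality count of that abnormality type;
when the abnormality count reaches a preset threshold, processing the communication abnormality corresponding to that abnormality count.
Optionally, the step of obtaining the abnormality type of the communication abnormality and accumulating the abnormality count of that abnormality type when a communication abnormality is detected within a predetermined time period includes:
when abnormal communication between the main control board and a line card is detected within a predetermined first time period, accumulating a first abnormality count of communication abnormalities between the main control board and that line card;
when abnormal communication between the line card and the main control board or other line cards is detected within a predetermined second time period, accumulating a second abnormality count of communication abnormalities between the line card and the main control board or other line cards;
when abnormal communication between the main control board and all line cards is detected within a predetermined third time period, accumulating an island abnormality count of communication abnormalities between the main control board and all line cards.
Optionally, the step of processing the communication abnormality corresponding to the abnormality count when the abnormality count reaches the preset threshold includes:
when the first abnormality count reaches a preset first threshold, causing the main control board to send a restart command to the line card;
when the second abnormality count reaches a preset second threshold, recording information about the line card's communication abnormality, uploading the information to the main control board, and causing the line card to restart itself;
when the island abnormality count reaches a preset island threshold, causing the main control board to restart itself.
Optionally, the first time period is greater than the second time period, and the second time period is greater than the third time period.
Optionally, the communication exception processing method further includes:
when communication is normal, clearing the abnormality count.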
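A network element device, including a detecting module, an accumulating module, and a processing module, where:
the detecting module is configured to: detect the communication status between the line cards and the main control board in the network element device and the other communication nodes in the network element device;
the accumulating module is configured to: when a communication abnormality is detected within a predetermined time period, obtain the abnormality type of the communication abnormality, and accumulate the abnormality count of that abnormality type;
the processing module is configured to: when the abnormality count reaches a preset threshold, process the communication abnormality corresponding to that abnormality count.
Optionally, the accumulating module includes a first accumulating unit, a second accumulating unit, and a third accumulating unit, where:
the first accumulating unit is configured to: when abnormal communication between the main control board and a line card is detected within a predetermined first time period, accumulate a first abnormality count of communication abnormalities between the main control board and that line card;
the second accumulating unit is configured to: when abnormal communication between the line card and the main control board or other line cards is detected within a predetermined second time period, accumulate a second abnormality count of communication abnormalities between the line card and the main control board or other line cards;
the third accumulating unit is configured to: when abnormal communication between the main control board and all line cards is detected within a predetermined third time period, accumulate an island abnormality count of communication abnormalities between the main control board and all line cards.
Optionally, the processing module includes a first processing unit, a second processing unit, and a third processing unit, where:
the first processing unit is configured to: when the first abnormality count reaches a preset first threshold, cause the main control board to send a restart command to the line card;
the second processing unit is configured to: when the second abnormality count reaches a preset second threshold, record information about the line card's communication abnormality, upload the information to the main control board, and cause the line card to restart itself;
the third processing unit is configured to: when the island abnormality count reaches a preset island threshold, cause the main control board to restart itself.
Optionally, the first time period is greater than the second time period, and the second time period is greater than the third time period.
Optionally, the network element device further includes a clearing module, where:
the clearing module is configured to: when communication is normal, clear the abnormality count.
A computer program including program instructions which, when executed by a network element device, enable the network element device to perform the communication exception processing method described above.
A carrier carrying the computer program described above.
The communication exception processing method and the network element device of the foregoing technical solution monitor communication between the line cards and the main control board in a network element device of a distributed system and the other communication nodes in that device. When a communication abnormality is detected, the abnormality count of the corresponding abnormality type is accumulated. When the abnormality count reaches the preset threshold, the communication abnormality is considered to affect normal communication, and it can be handled by restarting the line card or restarting the main control board. Normal communication is thereby restored in time, which ensures normal data transmission between line cards and between line cards and the main control board, and improves the fault tolerance, reliability, and availability of the network element device.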
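Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a communication exception processing method according to a first embodiment of the present invention;
FIG. 2 is a schematic flowchart of a communication exception processing method according to a second embodiment of the present invention;
FIG. 3 is a schematic diagram of the functional modules of a network element device according to a first embodiment of the present invention;
FIG. 4 is a schematic diagram of the functional modules of a network element device according to a second embodiment of the present invention.
The realization of the purpose, the functional features, and the advantages of the present invention are further described with reference to the embodiments and the accompanying drawings.
Preferred Embodiments of the Invention
It should be understood that the specific embodiments described here are only used to explain the present invention and are not intended to limit it.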
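The present invention provides a communication exception processing method. Referring to FIG. 1, in one embodiment, the communication exception processing method includes:
Step S101: detecting the communication status between the line cards and the main control board in the network element device and the other communication nodes in the network element device;
In this embodiment, the line cards in the network element device can communicate with other line cards or with the main control board, and a line card mainly performs the conversion function; the main control board can communicate with the line cards to control, manage, or configure them.
The main control board of this embodiment is the active main control board.
When a line card is physically in place, it may or may not be able to communicate normally with other communication nodes. For communication between a line card and other communication nodes, this embodiment performs detection periodically; for communication between the main control board and other communication nodes, this embodiment likewise performs detection in real time.
Step S102: when a communication abnormality is detected within a predetermined time period, obtaining the abnormality type of the communication abnormality, and accumulating the abnormality count of that abnormality type;
In this embodiment, there are three abnormality types: a communication abnormality between a line card and other communication nodes (including other line cards and the main control board), a communication abnormality between the main control board and one line card, and a communication abnormality between the main control board and all line cards. A communication abnormality between the main control board and all line cards is also called an island abnormality.
In this embodiment, different time periods are set for detection according to the abnormality type. For example, a time period is preset during which communication between a line card and other communication nodes is monitored. If a communication abnormality between the line card and another communication node is detected within that period, the corresponding abnormality count is accumulated, that is, the total number of communication abnormalities of that type is counted. For example, if a line card has a communication abnormality with another line card and its abnormality count was previously 3, the count is incremented by 1 from 3.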
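The per-type counting described in step S102 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the names `AbnormalityType`, `AbnormalityCounters`, and the node identifier `"line_card_2"` are assumptions introduced here, and the sketch keeps one counter per (node, type) pair.

```python
from collections import Counter
from enum import Enum


class AbnormalityType(Enum):
    """The three abnormality types named in the text (identifiers are hypothetical)."""
    MAIN_TO_LINE_CARD = "first"    # main control board <-> one line card
    LINE_CARD_TO_PEER = "second"   # line card <-> main control board or another line card
    ISLAND = "island"              # main control board <-> all line cards


class AbnormalityCounters:
    """Keeps one abnormality count per (node, abnormality type)."""

    def __init__(self):
        self._counts = Counter()

    def record(self, node_id, kind):
        """Accumulate the count for this node and type; return the new value."""
        self._counts[(node_id, kind)] += 1
        return self._counts[(node_id, kind)]

    def get(self, node_id, kind):
        """Return the current count (0 if nothing was recorded)."""
        return self._counts[(node_id, kind)]


# Reproduce the example from the text: the count was 3, one more abnormality occurs.
counters = AbnormalityCounters()
for _ in range(3):
    counters.record("line_card_2", AbnormalityType.LINE_CARD_TO_PEER)
new_count = counters.record("line_card_2", AbnormalityType.LINE_CARD_TO_PEER)
print(new_count)  # 4
```

Keeping the counts keyed by type lets each type be compared against its own threshold later.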
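Step S103: when the abnormality count reaches a preset threshold, processing the communication abnormality corresponding to that abnormality count.
In this embodiment, for each of the three abnormality types (line card with other communication nodes, main control board with one line card, and main control board with all line cards), a different maximum abnormality count is preset as its threshold. When one of these abnormalities occurs, the corresponding count is accumulated and then compared with the preset threshold to decide whether handling is needed.
The threshold of the abnormality count is preset according to the actual situation. For example, if normal communication is not affected or normal data transmission can still be guaranteed, the threshold may be set larger.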
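The accumulate-then-compare check can be sketched as below. The threshold values in `THRESHOLDS` are placeholders chosen for illustration; the source only states that each abnormality type has its own preset threshold, set according to the actual situation. The function names are likewise assumptions.

```python
# Hypothetical per-type thresholds; the source does not give concrete values.
THRESHOLDS = {"first": 5, "second": 4, "island": 3}


def reached_threshold(kind, count, thresholds=THRESHOLDS):
    """Return True when the count for this abnormality type has reached its threshold."""
    return count >= thresholds[kind]


def accumulate_and_check(counts, kind, thresholds=THRESHOLDS):
    """Accumulate the count for `kind`, then compare it with the preset threshold."""
    counts[kind] = counts.get(kind, 0) + 1
    return reached_threshold(kind, counts[kind], thresholds)


counts = {"island": 2}
island_needs_handling = accumulate_and_check(counts, "island")  # count becomes 3
first_needs_handling = accumulate_and_check(counts, "first")    # count becomes 1
```

Raising a value in `THRESHOLDS` corresponds to tolerating more abnormalities before any restart is triggered.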
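In this embodiment, when the abnormality count of a given abnormality type reaches the preset threshold, the abnormality may be handled by restarting the line card or restarting the main control board, so that normal communication is restored in time and data is transmitted normally between line cards and between line cards and the main control board.
In addition, in this embodiment, the manner of processing the communication abnormality can be customized through the command line. For example, the user can set the restart time of a line card or the main control board through the command line, and set the time at which the network element device is turned on or off.
Compared with the prior art, this embodiment monitors communication between the line cards and the main control board in a network element device of a distributed system and the other communication nodes in that device. When a communication abnormality is detected, the abnormality count of the corresponding abnormality type is accumulated. When the abnormality count reaches the preset threshold, the communication abnormality is considered to affect normal communication, and it can be handled by restarting the line card or restarting the main control board. Normal communication is thereby restored in time, which ensures normal data transmission between line cards and between line cards and the main control board, and improves the fault tolerance, reliability, and availability of the network element device.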
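In a preferred embodiment, on the basis of the embodiment of FIG. 1, step S102 includes the following sub-steps:
when abnormal communication between the main control board and a line card is detected within a predetermined first time period, accumulating a first abnormality count of communication abnormalities between the main control board and that line card;
when abnormal communication between the line card and the main control board or other line cards is detected within a predetermined second time period, accumulating a second abnormality count of communication abnormalities between the line card and the main control board or other line cards;
when abnormal communication between the main control board and all line cards is detected within a predetermined third time period, accumulating an island abnormality count of communication abnormalities between the main control board and all line cards.
The first time period is the period for detecting whether communication between the main control board and a line card is abnormal; the second time period is the period for detecting whether communication between a line card and the main control board or other line cards is abnormal; the third time period is the period for detecting whether communication between the main control board and all line cards is abnormal.
In this embodiment, because a line card is more likely to have a communication abnormality than the main control board, the first time period is set to be greater than the second time period; and because an island abnormality of the main control board is the least likely, the second time period is set to be greater than the third time period.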
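The ordering constraint on the three detection periods can be enforced when the configuration is built. The sketch below assumes the periods are expressed in seconds and uses a hypothetical `DetectionPeriods` type; the source specifies only the ordering, not units or values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionPeriods:
    """Detection periods in seconds (unit and field names are assumptions)."""
    first_s: float   # main control board <-> one line card
    second_s: float  # line card <-> main control board or other line cards
    third_s: float   # main control board <-> all line cards (island)

    def __post_init__(self):
        # The text requires: first period > second period > third period.
        if not (self.first_s > self.second_s > self.third_s > 0):
            raise ValueError(
                "periods must satisfy first > second > third > 0, got "
                f"{self.first_s}, {self.second_s}, {self.third_s}"
            )


periods = DetectionPeriods(first_s=30.0, second_s=20.0, third_s=10.0)
```

Validating at construction time means a misordered configuration fails immediately rather than silently changing which abnormalities are detected first.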
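In a preferred embodiment, on the basis of the embodiment of FIG. 1, step S103 includes the following sub-steps:
when the first abnormality count reaches a preset first threshold, causing the main control board to send a restart command to the line card;
when the second abnormality count reaches a preset second threshold, recording information about the line card's communication abnormality, uploading the information to the main control board, and causing the line card to restart itself;
when the island abnormality count reaches a preset island threshold, causing the main control board to restart itself.
In this embodiment, if the first abnormality count reaches the preset first threshold, normal communication is considered affected: the main control board sends a restart command to the line card to restart it, and the reason for the restart is recorded. If the second abnormality count reaches the preset second threshold, normal communication is likewise considered affected: information about the line card's communication abnormality is recorded and uploaded to the main control board, and the line card restarts itself. When the island abnormality count reaches the preset island threshold, normal communication is likewise considered affected: the main control board restarts itself, and the reason for the restart is recorded.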
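The three handling paths can be sketched as a dispatch on the abnormality type. The function below only returns a description of the actions in order; it does not restart anything. The function name, the type keys, and the action strings are all illustrative assumptions, not an interface defined by the source.

```python
def plan_handling(kind, count, thresholds):
    """Return the ordered actions for an abnormality type, or [] if below its threshold."""
    if count < thresholds[kind]:
        return []
    if kind == "first":
        # The main control board restarts the affected line card remotely.
        return [
            "main control board: send restart command to line card",
            "main control board: record restart reason",
        ]
    if kind == "second":
        # The line card reports the abnormality, then restarts itself.
        return [
            "line card: record communication abnormality information",
            "line card: upload information to main control board",
            "line card: restart itself",
        ]
    if kind == "island":
        # The main control board is isolated from all line cards and restarts itself.
        return [
            "main control board: restart itself",
            "main control board: record restart reason",
        ]
    raise ValueError(f"unknown abnormality type: {kind}")


thresholds = {"first": 5, "second": 4, "island": 3}  # hypothetical values
island_plan = plan_handling("island", 3, thresholds)
below_plan = plan_handling("first", 2, thresholds)
```

Returning a plan instead of acting directly keeps the decision logic testable without hardware.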
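It is worth noting that if both the active main control board and the standby main control board have affected normal communication, the network element device itself is restarted.
In addition, in this embodiment, the first abnormality count is greater than the second abnormality count, and the second abnormality count is greater than the island abnormality count.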
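In a preferred embodiment, as shown in FIG. 2, on the basis of the embodiment of FIG. 1, this embodiment includes the following after step S101:
Step S104: determining whether a communication abnormality is detected within the predetermined time period; if so, proceeding to step S102, otherwise proceeding to step S105;
Step S102: obtaining the abnormality type of the communication abnormality, and accumulating the abnormality count of that abnormality type;
Step S103: when the abnormality count reaches a preset threshold, processing the communication abnormality corresponding to that abnormality count;
Step S105: leaving the abnormality count unprocessed;
Step S106: when communication is normal, clearing the abnormality count.
In this embodiment, steps S101, S102, and S103 are the same as in the embodiment of FIG. 1.
In this embodiment, if no communication abnormality is detected within the predetermined time period, the abnormality count is not processed. When communication is normal, the abnormality count is cleared; clearing the abnormality count includes clearing the first abnormality count, the second abnormality count, and the island abnormality count.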
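The branch structure of FIG. 2 can be sketched as a single update function. The source does not spell out how "no abnormality detected within the period" (step S105) differs from "communication is normal" (step S106), so this sketch assumes three distinct observations, `"abnormal"`, `"undetermined"`, and `"normal"`; those labels and the function name are assumptions made for illustration.

```python
def update_counts(counts, observation, kind=None):
    """Apply one detection result to the abnormality counts and return them.

    observation:
      "abnormal"      an abnormality of type `kind` was detected (S104 -> S102)
      "undetermined"  nothing detected in this period (S104 -> S105)
      "normal"        communication confirmed normal (S106)
    """
    if observation == "abnormal":
        counts[kind] = counts.get(kind, 0) + 1  # S102: accumulate
    elif observation == "normal":
        for key in counts:                      # S106: clear all three counts
            counts[key] = 0
    elif observation != "undetermined":         # S105 leaves counts untouched
        raise ValueError(f"unknown observation: {observation}")
    return counts


counts = {"first": 0, "second": 0, "island": 0}
update_counts(counts, "abnormal", "second")
update_counts(counts, "abnormal", "second")
after_idle = dict(update_counts(counts, "undetermined"))
after_normal = dict(update_counts(counts, "normal"))
```

Clearing on confirmed normal communication means only abnormalities that persist without recovery can reach a threshold.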
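The present invention provides a network element device. Referring to FIG. 3, in one embodiment, the network element device includes:
a detecting module 101, configured to: detect the communication status between the line cards and the main control board in the network element device and the other communication nodes in the network element device;
In this embodiment, the line cards in the network element device can communicate with other line cards or with the main control board, and a line card mainly performs the conversion function; the main control board can communicate with the line cards to control, manage, or configure them.
The main control board of this embodiment is the active main control board.
When a line card is physically in place, it may or may not be able to communicate normally with other communication nodes. For communication between a line card and other communication nodes, this embodiment performs detection periodically; for communication between the main control board and other communication nodes, this embodiment likewise performs detection in real time.
an accumulating module 102, configured to: when a communication abnormality is detected within a predetermined time period, obtain the abnormality type of the communication abnormality, and accumulate the abnormality count of that abnormality type;
In this embodiment, there are three abnormality types: a communication abnormality between a line card and other communication nodes (including other line cards and the main control board), a communication abnormality between the main control board and one line card, and a communication abnormality between the main control board and all line cards. A communication abnormality between the main control board and all line cards is also called an island abnormality.
In this embodiment, different time periods are set for detection according to the abnormality type. For example, a time period is preset during which communication between a line card and other communication nodes is monitored. If a communication abnormality between the line card and another communication node is detected within that period, the corresponding abnormality count is accumulated, that is, the total number of communication abnormalities of that type is counted. For example, if a line card has a communication abnormality with another line card and its abnormality count was previously 3, the count is incremented by 1 from 3.
a processing module 103, configured to: when the abnormality count reaches a preset threshold, process the communication abnormality corresponding to that abnormality count.
In this embodiment, for each of the three abnormality types (line card with other communication nodes, main control board with one line card, and main control board with all line cards), a different maximum abnormality count is preset as its threshold. When one of these abnormalities occurs, the corresponding count is accumulated and then compared with the preset threshold to decide whether handling is needed.
The threshold of the abnormality count is preset according to the actual situation. For example, if normal communication is not affected or normal data transmission can still be guaranteed, the threshold may be set larger.
In this embodiment, when the abnormality count of a given abnormality type reaches the preset threshold, the abnormality may be handled by restarting the line card or restarting the main control board, so that normal communication is restored in time and data is transmitted normally between line cards and between line cards and the main control board.
In addition, in this embodiment, the manner of processing the communication abnormality can be customized through the command line. For example, the user can set the restart time of a line card or the main control board through the command line, and set the time at which the network element device is turned on or off.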
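In a preferred embodiment, building on the embodiment of Fig. 3, the accumulation module includes:
a first accumulation unit, configured to: when a communication abnormality between the main control board and a line card is detected within a predetermined first time period, increment the first abnormality count, which counts communication abnormalities between the main control board and that line card;
a second accumulation unit, configured to: when a communication abnormality between the line card and the main control board or other line cards is detected within a predetermined second time period, increment the second abnormality count, which counts communication abnormalities between the line card and the main control board or other line cards;
a third accumulation unit, configured to: when a communication abnormality between the main control board and all line cards is detected within a predetermined third time period, increment the island abnormality count, which counts communication abnormalities between the main control board and all line cards.
Here, the first time period is the period for detecting whether a communication abnormality occurs between the main control board and a line card; the second time period is the period for detecting whether a communication abnormality occurs between a line card and the main control board or other line cards; and the third time period is the period for detecting whether a communication abnormality occurs between the main control board and all line cards.
In this embodiment, because a line card is more likely to experience a communication abnormality than the main control board, the first time period is set to be longer than the second time period; and because an island abnormality on the main control board is the least likely, the second time period is set to be longer than the third time period.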
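The ordering constraint on the three detection periods (first longer than second, second longer than third) can be checked when the periods are configured. The sketch below is a hypothetical illustration: the class name `DetectionWindows`, the field names and the values in seconds are assumptions, since the source gives no concrete durations or units.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionWindows:
    first_s: float   # main control board <-> one line card
    second_s: float  # line card <-> main control board or other line cards
    third_s: float   # main control board <-> all line cards (island)

    def __post_init__(self):
        # Enforce first > second > third, as stated for this embodiment.
        if not (self.first_s > self.second_s > self.third_s > 0):
            raise ValueError("expected first > second > third > 0")


# Example values only; the source does not specify any durations.
windows = DetectionWindows(first_s=30.0, second_s=20.0, third_s=10.0)
```

Validating in `__post_init__` means an invalid ordering is rejected as soon as the configuration is built, rather than surfacing later during detection.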
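In a preferred embodiment, building on the embodiment of Fig. 3, the processing module includes:
a first processing unit, configured to: when the first abnormality count reaches the preset first threshold, cause the main control board to send a restart command to the line card;
a second processing unit, configured to: when the second abnormality count reaches the preset second threshold, record information about the line card's communication abnormality, upload the information to the main control board, and cause the line card to restart itself;
a third processing unit, configured to: when the island abnormality count reaches the preset island threshold, cause the main control board to restart itself.
In this embodiment, when the first abnormality count reaches the preset first threshold, normal communication is considered to be affected: the main control board sends a restart command to the line card so that the line card restarts, and the reason for the restart is recorded. When the second abnormality count reaches the preset second threshold, normal communication is likewise considered to be affected: information about the line card's communication abnormality is recorded and uploaded to the main control board, and the line card restarts itself. When the island abnormality count reaches the preset island threshold, normal communication is again considered to be affected: the main control board restarts itself, and the reason for the restart is recorded.
It is worth noting that if normal communication is affected on both the active main control board and the standby main control board, the network element device itself is restarted.
In addition, in this embodiment, the first abnormality count is greater than the second abnormality count, and the second abnormality count is greater than the island abnormality count.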
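The mapping from abnormality type to handling action in the three processing units can be sketched as a simple dispatch function. This is a hedged illustration only: `AbnormalityType`, `actions_for` and the action strings are hypothetical names, and the order of the actions within each list is a choice made for the sketch, since the text says the restart and the recording of its reason happen together.

```python
from enum import Enum


class AbnormalityType(Enum):
    FIRST = "main control board <-> one line card"
    SECOND = "line card <-> main control board or other line cards"
    ISLAND = "main control board <-> all line cards"


def actions_for(abnormality_type, count, threshold):
    """Return the handling actions for a count, or [] below the threshold."""
    if count < threshold:
        return []
    if abnormality_type is AbnormalityType.FIRST:
        # First processing unit: the main control board restarts the line card.
        return ["record restart reason",
                "main control board sends restart command to line card"]
    if abnormality_type is AbnormalityType.SECOND:
        # Second processing unit: the line card reports, then restarts itself.
        return ["record line card abnormality information",
                "upload information to main control board",
                "line card restarts itself"]
    if abnormality_type is AbnormalityType.ISLAND:
        # Third processing unit: the main control board restarts itself.
        return ["record restart reason",
                "main control board restarts itself"]
    raise ValueError(f"unknown abnormality type: {abnormality_type!r}")
```

Returning a list of action descriptions keeps the decision logic separate from the hardware-specific restart mechanism, which the source does not describe.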
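In a preferred embodiment, as shown in Fig. 4 and building on the embodiment of Fig. 3, the network element device further includes:
a reset module 104, configured to reset the abnormality count to zero when communication is normal.
In this embodiment, if no communication abnormality is detected within the predetermined time period, the abnormality count is left unchanged. When communication is normal, the abnormality counts are reset to zero; this covers the first abnormality count, the second abnormality count and the island abnormality count.
The above are only preferred embodiments of the present invention and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.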
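Industrial Applicability
With the communication abnormality handling method and network element device of the above technical solution, the communication between the line cards and main control board in a network element device of a distributed system and the other communication nodes in that network element device can be checked. When a communication abnormality is detected, the abnormality count for the corresponding abnormality type is incremented. When the abnormality count reaches the preset threshold, the communication abnormality is considered to affect normal communication, and it can be handled by restarting the line card or restarting the main control board. Normal communication is thereby restored promptly, data is transmitted normally between line cards and between the line cards and the main control board, and the fault tolerance, reliability and availability of the network element device are improved. The present invention therefore has strong industrial applicability.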

Claims (12)

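  1. A communication abnormality handling method, applied in a network element device of a distributed system, the method comprising the following steps:
    detecting the communication status between the line cards and the main control board in the network element device and the other communication nodes in the network element device;
    when a communication abnormality is detected within a predetermined time period, obtaining the abnormality type of the communication abnormality and incrementing the abnormality count for that abnormality type;
    when the abnormality count reaches a preset threshold, handling the communication abnormality corresponding to that abnormality count.
  2. The communication abnormality handling method according to claim 1, wherein the step of, when a communication abnormality is detected within a predetermined time period, obtaining the abnormality type of the communication abnormality and incrementing the abnormality count for that abnormality type comprises:
    when a communication abnormality between the main control board and a line card is detected within a predetermined first time period, incrementing a first abnormality count of communication abnormalities between the main control board and the line card;
    when a communication abnormality between the line card and the main control board or other line cards is detected within a predetermined second time period, incrementing a second abnormality count of communication abnormalities between the line card and the main control board or other line cards;
    when a communication abnormality between the main control board and all line cards is detected within a predetermined third time period, incrementing an island abnormality count of communication abnormalities between the main control board and all line cards.
  3. The communication abnormality handling method according to claim 2, wherein the step of, when the abnormality count reaches a preset threshold, handling the communication abnormality corresponding to that abnormality count comprises:
    when the first abnormality count reaches a preset first threshold, causing the main control board to send a restart command to the line card;
    when the second abnormality count reaches a preset second threshold, recording information about the line card's communication abnormality, uploading the information to the main control board, and causing the line card to restart itself;
    when the island abnormality count reaches a preset island threshold, causing the main control board to restart itself.
  4. The communication abnormality handling method according to claim 2, wherein
    the first time period is longer than the second time period, and the second time period is longer than the third time period.
  5. The communication abnormality handling method according to any one of claims 1 to 4, further comprising:
    when communication is normal, resetting the abnormality count to zero.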
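  6. A network element device, comprising a detection module, an accumulation module and a processing module, wherein:
    the detection module is configured to detect the communication status between the line cards and the main control board in the network element device and the other communication nodes in the network element device;
    the accumulation module is configured to: when a communication abnormality is detected within a predetermined time period, obtain the abnormality type of the communication abnormality and increment the abnormality count for that abnormality type;
    the processing module is configured to: when the abnormality count reaches a preset threshold, handle the communication abnormality corresponding to that abnormality count.
  7. The network element device according to claim 6, wherein the accumulation module comprises a first accumulation unit, a second accumulation unit and a third accumulation unit, wherein:
    the first accumulation unit is configured to: when a communication abnormality between the main control board and a line card is detected within a predetermined first time period, increment a first abnormality count of communication abnormalities between the main control board and the line card;
    the second accumulation unit is configured to: when a communication abnormality between the line card and the main control board or other line cards is detected within a predetermined second time period, increment a second abnormality count of communication abnormalities between the line card and the main control board or other line cards;
    the third accumulation unit is configured to: when a communication abnormality between the main control board and all line cards is detected within a predetermined third time period, increment an island abnormality count of communication abnormalities between the main control board and all line cards.
  8. The network element device according to claim 7, wherein the processing module comprises a first processing unit, a second processing unit and a third processing unit, wherein:
    the first processing unit is configured to: when the first abnormality count reaches a preset first threshold, cause the main control board to send a restart command to the line card;
    the second processing unit is configured to: when the second abnormality count reaches a preset second threshold, record information about the line card's communication abnormality, upload the information to the main control board, and cause the line card to restart itself;
    the third processing unit is configured to: when the island abnormality count reaches a preset island threshold, cause the main control board to restart itself.
  9. The network element device according to claim 7, wherein the first time period is longer than the second time period, and the second time period is longer than the third time period.
  10. The network element device according to any one of claims 6 to 9, further comprising a reset module, wherein:
    the reset module is configured to reset the abnormality count to zero when communication is normal.
  11. A computer program comprising program instructions which, when executed by a network element device, enable the network element device to perform the communication abnormality handling method according to any one of claims 1 to 5.
  12. A carrier carrying the computer program according to claim 11.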
PCT/CN2014/089212 2014-07-18 2014-10-22 通信异常处理方法及网元设备 WO2015117395A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410345157.8 2014-07-18
CN201410345157.8A CN104836679B (zh) 2014-07-18 2014-07-18 通信异常处理方法及网元设备

Publications (1)

Publication Number Publication Date
WO2015117395A1 true WO2015117395A1 (zh) 2015-08-13

Family

ID=53777213

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/089212 WO2015117395A1 (zh) 2014-07-18 2014-10-22 通信异常处理方法及网元设备

Country Status (2)

Country Link
CN (1) CN104836679B (zh)
WO (1) WO2015117395A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113965447A (zh) * 2020-07-20 2022-01-21 广东芬尼克兹节能设备有限公司 一种在线云诊断方法、装置、系统、设备及存储介质
WO2024119777A1 (zh) * 2022-12-06 2024-06-13 中兴通讯股份有限公司 框式设备通信链路异常处理方法、框式设备及介质

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107465453A (zh) * 2016-06-02 2017-12-12 中兴通讯股份有限公司 一种光网络终端及其工作方法、及通信系统
CN107995018A (zh) * 2016-10-27 2018-05-04 迈普通信技术股份有限公司 故障检测方法、lpu和分布式网络通信设备

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1062063A (zh) * 1990-11-30 1992-06-17 株式会社田村电机制作所 公共电话站
CN101110857A (zh) * 2007-08-28 2008-01-23 中兴通讯股份有限公司 单板复位的监测方法
CN101788817A (zh) * 2010-01-29 2010-07-28 航天东方红卫星有限公司 一种基于星载总线的故障识别与处理方法
CN102122843A (zh) * 2011-02-23 2011-07-13 江西省电力科学研究院 关口电能计量装置运行异常预警技术
CN102201934A (zh) * 2011-05-05 2011-09-28 中国联合网络通信集团有限公司 集中管理分布式自动测试巡检方法及系统

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101145986B (zh) * 2007-06-18 2011-04-20 中兴通讯股份有限公司 一种通讯系统中板间通信状态检测方法
CN102025558A (zh) * 2009-09-11 2011-04-20 鸿富锦精密工业(深圳)有限公司 网络侦测设备及其主动侦测网络品质的方法

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113965447A (zh) * 2020-07-20 2022-01-21 广东芬尼克兹节能设备有限公司 一种在线云诊断方法、装置、系统、设备及存储介质
CN113965447B (zh) * 2020-07-20 2023-07-21 广东芬尼克兹节能设备有限公司 一种在线云诊断方法、装置、系统、设备及存储介质
WO2024119777A1 (zh) * 2022-12-06 2024-06-13 中兴通讯股份有限公司 框式设备通信链路异常处理方法、框式设备及介质

Also Published As

Publication number Publication date
CN104836679B (zh) 2018-11-23
CN104836679A (zh) 2015-08-12


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14882082

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14882082

Country of ref document: EP

Kind code of ref document: A1