CN113794595A - IoT (Internet of things) equipment high-availability method based on industrial Internet - Google Patents


Info

Publication number
CN113794595A
Authority
CN
China
Prior art keywords
gateway
cluster
gateways
edge
high availability
Prior art date
Legal status
Pending
Application number
CN202111081578.0A
Other languages
Chinese (zh)
Inventor
胡金丽
畅思衡
岳鹏宇
Current Assignee
Lingyun Youyi Beijing Technology Co ltd
Original Assignee
Lingyun Youyi Beijing Technology Co ltd
Priority date
2021-09-15
Filing date
2021-09-15
Publication date
2021-12-14
Application filed by Lingyun Youyi Beijing Technology Co ltd
Priority to CN202111081578.0A
Publication of CN113794595A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0663 Performing the actions predefined by failover planning, e.g. switching to standby network elements
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/10 Active monitoring, e.g. heartbeat, ping or trace-route

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Cardiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer And Data Communications (AREA)

Abstract

The industrial-Internet-based high-availability method for IoT devices uses a distributed multi-gateway cluster management service built on the Raft algorithm, which keeps the service stable while increasing its data processing capacity. As long as the cluster retains the minimum number of gateways required for stability, new gateways can freely join or leave the cluster, so service processing remains stable even if one or several gateways fail at the same time. When data processing performance becomes a bottleneck, adding a new gateway quickly increases the overall processing and response capacity of the service.

Description

IoT (Internet of things) equipment high-availability method based on industrial Internet
Technical Field
The invention relates to the field of Internet of things, in particular to an IoT (Internet of things) equipment high-availability method based on an industrial Internet.
Background
With the rapid development of 5G and the Internet of Things, intelligent IoT gateway devices are being used in more and more industries. Consumer IoT devices, such as smart-home equipment, place relatively low availability demands on the IoT gateway, because they operate in benign environments where temperature, humidity, current, and voltage are comparatively stable.
In many industrial settings, by contrast, equipment operates in harsh environments (high temperature, high humidity, large fluctuations in current and voltage), and there are explicit requirements on the operating state of the equipment. Downtime is far more critical than for consumer equipment: without highly available equipment, service is interrupted, business data is lost, and industrial operations may even suffer losses. In risk-oriented industrial design, an industrial-grade, industrial-Internet-based high-availability method for IoT devices is therefore very important; its purpose is to improve the availability of IoT devices and guard against service interruptions caused by various risks.
Existing manufacturers of intelligent IoT gateway devices typically focus only on the operating stability of the devices themselves, for example by designing them to tolerate wide temperature and wide voltage ranges. This does not solve the problems that arise once a device fails: data can no longer be collected and sent back, services stop responding, and data is lost. Operators can only wait for maintenance, which is frequent and whose degree of recovery cannot be assessed. In other words, a single device still cannot meet the service stability requirements of the industrial IoT. Meanwhile, as industrial digital transformation accelerates, a single edge gateway often becomes a bottleneck for data computation and processing, which affects prediction and response across the whole service.
Disclosure of Invention
In view of the above, the present invention provides an industrial-Internet-based high-availability method for IoT devices that overcomes, or at least partially solves, the problems described above.
According to one aspect of the present invention, an industrial-Internet-based high-availability method for IoT devices is provided, comprising:
a Raft cluster comprises a plurality of gateways; at any given time, each gateway is in one of three states, namely primary gateway (leader), voter gateway (follower), or candidate gateway, and a gateway can transition between these states;
when a gateway receives no heartbeat from the primary gateway, it changes its state to candidate;
the candidate gateway requests votes from the other gateways to become primary;
after receiving votes from more than half of the gateways, the candidate gateway becomes the primary gateway of the cluster;
the edge gateways collect data from new energy equipment and send it to a data queue, and heartbeat monitoring among the edge gateways determines whether each gateway in the cluster is providing the acquisition service normally; if so, the edge gateways continue to operate normally; otherwise, another gateway takes over the work of the failed gateway.
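The election steps above can be illustrated with a minimal single-process Python sketch. It assumes Raft's standard rules (one vote per gateway per term, and strictly more than half of the cluster's votes to win); the names `State`, `Gateway`, `grant_vote`, and `start_election` are hypothetical, and networking, log replication, and Raft's log up-to-date check are omitted.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    PRIMARY = "primary"      # Raft leader
    VOTER = "voter"          # Raft follower (the "election" state in the text)
    CANDIDATE = "candidate"


class Gateway:
    """Hypothetical gateway that tracks only its Raft election state."""

    def __init__(self, gw_id: str):
        self.gw_id = gw_id
        self.state = State.VOTER              # gateways start as voters
        self.term = 0                         # current election term
        self.voted_for: Optional[str] = None  # vote cast in the current term

    def grant_vote(self, candidate_id: str, candidate_term: int) -> bool:
        """Handle a vote request; at most one vote is granted per term."""
        if candidate_term < self.term:
            return False                      # stale candidate
        if candidate_term > self.term:
            # A newer term: step down to voter and forget any earlier vote.
            self.term = candidate_term
            self.state = State.VOTER
            self.voted_for = None
        if self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False

    def start_election(self, peers: "list[Gateway]") -> State:
        """Called when no heartbeat arrived within the election timeout."""
        self.state = State.CANDIDATE
        self.term += 1
        self.voted_for = self.gw_id           # a candidate votes for itself
        votes = 1
        for peer in peers:
            if peer.grant_vote(self.gw_id, self.term):
                votes += 1
        cluster_size = len(peers) + 1
        if votes > cluster_size // 2:         # strictly more than half
            self.state = State.PRIMARY
        return self.state
```

A candidate that fails to reach a majority simply stays a candidate in this sketch; in Raft it would start a new election after another timeout.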
Optionally, the device high-availability method further comprises:
adding a new edge gateway to the cluster by performing a gateway registration operation through gateway configuration;
after the registration request is initiated, the gateway checks its status every 10 seconds, three times in total; if the status is failed, the gateway returns registration failure information and waits to re-register; if the status is successful, the cluster allocates a data acquisition task to the newly registered gateway, using the current acquisition load as the weight, which completes the allocation.
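The registration check in this optional step could be sketched in Python as follows. The source does not say whether a single successful check is enough, so this sketch assumes registration succeeds on the first successful check and fails only after three failed checks. `check_status` is a hypothetical callback standing in for whatever status query the gateway uses, and `sleep` is injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable

CHECK_INTERVAL_S = 10  # from the description: one check every 10 seconds
MAX_CHECKS = 3         # ... performed three times


def await_registration(
    check_status: Callable[[], bool],
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll a hypothetical registration-status check up to three times.

    Assumption: one successful check is enough; all three must fail
    for registration to be reported as failed.
    """
    for _ in range(MAX_CHECKS):
        sleep(CHECK_INTERVAL_S)  # wait 10 s before each check
        if check_status():
            return True
    return False
```

A `False` result corresponds to returning registration failure information, after which the gateway waits to re-register.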
Optionally, the device high-availability method further comprises edge gateway fault alarming: when an edge gateway has a problem, the other gateways in the cluster detect it through their checks, record the problem, and raise an alarm once the fault is found.
The invention provides an industrial-Internet-based high-availability method for IoT devices in which a Raft cluster comprises a plurality of gateways, each of which is, at any given time, in one of three mutually convertible states (primary, voter, or candidate); when a gateway receives no heartbeat from the primary, it changes its state to candidate; the candidate requests votes from the other gateways to become primary; after receiving votes from more than half of the gateways, it becomes the primary gateway of the cluster; and the edge gateways collect data from new energy equipment and send it to a data queue, while heartbeat monitoring among them determines whether each gateway is providing the acquisition service normally, with another gateway taking over the work of any failed gateway. The method allows new gateways to join or leave the cluster freely, so service processing remains stable even if one or several gateways fail at the same time, and the overall processing and response capacity of the service can be increased quickly.
The foregoing is only an overview of the technical solutions of the present invention. Embodiments are described below so that the technical means of the invention can be understood more clearly, and so that the above and other objects, features, and advantages of the invention become more apparent.
Drawings
To illustrate the technical solutions of the embodiments more clearly, the drawings used in describing the embodiments are briefly introduced below. These drawings show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flow chart of high-availability election based on the Raft algorithm according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of edge gateway failover according to an embodiment of the present invention;
FIG. 3 is a finite state machine diagram of edge gateway failover according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a new edge gateway joining a cluster according to an embodiment of the present invention;
FIG. 5 is a finite state machine diagram of an edge gateway joining a cluster according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of edge gateway fault alarming according to an embodiment of the present invention;
FIG. 7 is a finite state machine diagram of edge gateway fault alarming according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The terms "comprises" and "comprising," and any variations thereof, in the present description and claims and drawings are intended to cover a non-exclusive inclusion, such as a list of steps or elements.
The technical solution of the present invention is further described in detail with reference to the accompanying drawings and embodiments.
As shown in FIG. 1, the Raft cluster contains multiple gateways, allowing the system to tolerate one failure. At any given time, each gateway is in one of three states: primary (leader), voter (follower), or candidate. Gateways can transition between these states.
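The fault-tolerance figure follows from Raft's majority rule: a primary can only be elected while more than half of the gateways are reachable. The two helper names below are illustrative.

```python
def quorum(cluster_size: int) -> int:
    """Votes needed to elect a primary: strictly more than half the cluster."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """Gateways that can fail while the rest can still form a majority."""
    return cluster_size - quorum(cluster_size)
```

For example, three gateways need two votes and tolerate one failure, which matches the text. A fourth gateway raises the quorum to three without tolerating a second failure, which is why Raft clusters usually have an odd number of members.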
Raft uses a heartbeat mechanism to trigger primary gateway election. When gateways start up, they begin as voters. A gateway remains a voter as long as it receives valid requests from a primary gateway or a candidate. The primary gateway sends periodic heartbeats to all voters to maintain its authority. If a voter receives no communication over a period of time called the election timeout, it assumes there is no viable primary gateway and begins an election to choose a new one.
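A voter's election timeout can be sketched as a deadline that is pushed back on every heartbeat. The patent gives no timeout value; the randomized 150-300 ms range is the example used in the Raft paper and serves only as a placeholder here. `ElectionTimer` is a hypothetical helper that takes explicit timestamps instead of reading a clock, which keeps it testable.

```python
import random
from typing import Optional


class ElectionTimer:
    """Hypothetical election-timeout tracker for a voter gateway.

    The default 150-300 ms range is the Raft paper's example, used only as
    a placeholder: the patent does not specify a timeout.
    """

    def __init__(self, now: float, low: float = 0.150, high: float = 0.300,
                 rng: Optional[random.Random] = None):
        self._rng = rng if rng is not None else random.Random()
        self._low, self._high = low, high
        self.reset(now)

    def reset(self, now: float) -> None:
        """Call on every valid heartbeat from the primary or request from a candidate."""
        self._deadline = now + self._rng.uniform(self._low, self._high)

    def expired(self, now: float) -> bool:
        """True means: assume there is no viable primary and start an election."""
        return now >= self._deadline
```

Randomizing the timeout, as Raft does, makes it unlikely that several voters become candidates at the same moment and split the vote.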
As shown in FIG. 2, the edge gateways perform failover. Each edge gateway collects data from new energy devices and pushes it to the data queue, and the edge gateways use heartbeat monitoring to determine whether each gateway in the cluster is providing the acquisition service normally. When the hardware of a running data acquisition gateway in the cluster fails, its communication link to the new energy equipment is broken and data can no longer be collected. Another gateway in the cluster then immediately takes over the work of the failed gateway, so the data acquisition service continues to be provided normally.
FIG. 3 is the finite state machine diagram of edge gateway failover. Devices in the edge gateway cluster have a heartbeat check mechanism: the gateways in the cluster check each other's liveness every 5 seconds, and when a gateway device in the cluster fails, one of the other gateways takes over its data acquisition tasks. After the takeover is complete, the gateways return to status-checking mode, which keeps the data acquisition tasks running normally.
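The 5-second heartbeat check and takeover can be sketched as follows. The failure threshold (three missed checks), the rule for choosing the successor (the surviving gateway with the fewest tasks), and all names are assumptions for illustration; the patent only states that one gateway in the cluster takes over the failed gateway's tasks. Timestamps are passed in explicitly so the logic can be exercised without a real timer.

```python
from dataclasses import dataclass, field

CHECK_INTERVAL_S = 5        # from the description: liveness check every 5 s
MISSED_CHECKS_ALLOWED = 3   # assumption: the source gives no failure threshold


@dataclass
class FailoverMonitor:
    """Hypothetical cluster-wide view of gateway liveness and task ownership."""
    last_seen: dict[str, float] = field(default_factory=dict)
    tasks: dict[str, list[str]] = field(default_factory=dict)
    unassigned: list[str] = field(default_factory=list)

    def heartbeat(self, gw_id: str, now: float) -> None:
        """Record a heartbeat; a previously unknown gateway starts with no tasks."""
        self.last_seen[gw_id] = now
        self.tasks.setdefault(gw_id, [])

    def check(self, now: float) -> list[str]:
        """One periodic check: declare silent gateways failed and hand over their tasks."""
        limit = CHECK_INTERVAL_S * MISSED_CHECKS_ALLOWED
        failed = [gw for gw, seen in self.last_seen.items() if now - seen > limit]
        # Remove every failed gateway first so none is chosen as a successor.
        orphaned = {gw: self.tasks.pop(gw, []) for gw in failed}
        for gw in failed:
            del self.last_seen[gw]
        for gw, gw_tasks in orphaned.items():
            if not self.tasks:
                # No surviving gateway: park the tasks until a gateway registers.
                self.unassigned.extend(gw_tasks)
                continue
            # Assumption: the survivor with the fewest tasks takes over all of them.
            successor = min(self.tasks, key=lambda g: len(self.tasks[g]))
            self.tasks[successor].extend(gw_tasks)
        return failed
```

Waiting for several missed checks rather than one trades slower failover for fewer false takeovers caused by a single lost heartbeat. In a Raft-based cluster it would be natural to let only the current primary apply reassignments, so that two gateways never take over the same task.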
As shown in FIGS. 4 and 5, when a new edge gateway joins the cluster, a gateway registration operation is first performed by configuring the gateway. After the registration request is initiated, the gateway checks its status every 10 seconds, three times in total. If the status is failed, registration failure information is returned and the gateway waits to re-register; if the status is successful, the cluster allocates a data acquisition task to the newly registered gateway, using the current acquisition load as the weight. Once the allocation is complete, the gateway starts the data acquisition task for the new energy equipment.
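The patent does not define how the current acquisition load is used as a weight. One possible reading, sketched here in Python purely as an assumption, is greedy rebalancing: tasks are moved from the most heavily loaded gateways to the new gateway until no gateway holds more than one task more than the new one. Load is approximated by task count, and `assign_to_new_gateway` is a hypothetical name.

```python
def assign_to_new_gateway(loads: dict[str, list[str]], new_gw: str) -> list[str]:
    """Move tasks from the most-loaded gateways to a newly registered gateway.

    Assumption: "current acquisition load as the weight" is read as greedy
    rebalancing by task count; the source does not give a formula.
    Returns the tasks that were moved.
    """
    loads[new_gw] = []
    moved: list[str] = []
    while True:
        donor = max(loads, key=lambda g: len(loads[g]))
        # Stop once moving another task would no longer reduce the imbalance.
        if len(loads[donor]) - len(loads[new_gw]) <= 1:
            break
        task = loads[donor].pop()
        loads[new_gw].append(task)
        moved.append(task)
    return moved
```

For example, with existing gateways holding four and three tasks, a third gateway receives two tasks and the loads become 2, 3, and 2.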
FIG. 6 shows edge gateway fault alarming, and FIG. 7 shows the corresponding finite state machine. When an edge gateway has a problem, the other gateways in the cluster detect it through their checks, record the problem, and raise an alarm once the fault is found.
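The fault alarm flow can be modeled as a small state machine per monitored gateway. This is a hypothetical sketch: the two states, the rule that repeated failures within one fault episode raise only a single alarm, and the in-memory `log` and `alarms` lists (standing in for real logging and notification channels) are assumptions, not details from the patent.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class AlarmState(Enum):
    MONITORING = auto()  # peers are checking the gateway, no open fault
    ALARMED = auto()     # fault recorded and alarm raised


@dataclass
class FaultAlarm:
    """Hypothetical per-gateway alarm state machine driven by peer checks."""
    gw_id: str
    state: AlarmState = AlarmState.MONITORING
    log: list[str] = field(default_factory=list)
    alarms: list[str] = field(default_factory=list)

    def on_check(self, healthy: bool, detail: str = "") -> AlarmState:
        """Feed the result of one peer check into the state machine."""
        if healthy:
            # Assumption: a healthy check closes the fault episode.
            self.state = AlarmState.MONITORING
        elif self.state is AlarmState.MONITORING:
            self.log.append(f"{self.gw_id}: {detail}")           # record the problem
            self.alarms.append(f"ALARM {self.gw_id}: {detail}")  # then raise the alarm
            self.state = AlarmState.ALARMED
        # A further failure while ALARMED raises no duplicate alarm.
        return self.state
```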
Beneficial effects: (1) the technical architecture is designed to control risk and protect company value; (2) a high-availability switchover scheme for edge gateway clusters, implemented on the Raft algorithm; (3) automated notification of faults and switchovers; and (4) a way to increase the computing power of the cluster.
In summary, the industrial-Internet-based high-availability method for IoT devices uses a distributed multi-gateway cluster management service built on the Raft algorithm, which keeps the service stable while increasing its data processing capacity. As long as the cluster retains the minimum number of gateways required for stability, new gateways can freely join or leave the cluster, so service processing remains stable even if one or several gateways fail at the same time. When data processing performance becomes a bottleneck, adding a new gateway quickly increases the overall processing and response capacity of the service.
The above embodiments are provided to further explain the objects, technical solutions and advantages of the present invention in detail, it should be understood that the above embodiments are merely exemplary embodiments of the present invention and are not intended to limit the scope of the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (3)

1. An industrial-Internet-based high-availability method for IoT devices, comprising:
a Raft cluster comprises a plurality of gateways; at any given time, each gateway is in one of three states, namely primary gateway (leader), voter gateway (follower), or candidate gateway, and a gateway can transition between these states;
when a gateway receives no heartbeat from the primary gateway, it changes its state to candidate;
the candidate gateway requests votes from the other gateways to become primary;
after receiving votes from more than half of the gateways, the candidate gateway becomes the primary gateway of the cluster;
the edge gateways collect data from new energy equipment and send it to a data queue, and heartbeat monitoring among the edge gateways determines whether each gateway in the cluster is providing the acquisition service normally; if so, the edge gateways continue to operate normally; otherwise, another gateway takes over the work of the failed gateway.
2. The industrial-Internet-based high-availability method for IoT devices according to claim 1, further comprising:
adding a new edge gateway to the cluster by performing a gateway registration operation through gateway configuration;
after the registration request is initiated, the gateway checks its status every 10 seconds, three times in total; if the status is failed, the gateway returns registration failure information and waits to re-register; if the status is successful, the cluster allocates a data acquisition task to the newly registered gateway, using the current acquisition load as the weight, which completes the allocation.
3. The industrial-Internet-based high-availability method for IoT devices according to claim 1, further comprising edge gateway fault alarming: when an edge gateway has a problem, the other gateways in the cluster detect it through their checks, record the problem, and raise an alarm once the fault is found.
CN202111081578.0A 2021-09-15 2021-09-15 IoT (Internet of things) equipment high-availability method based on industrial Internet Pending CN113794595A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111081578.0A CN113794595A (en) 2021-09-15 2021-09-15 IoT (Internet of things) equipment high-availability method based on industrial Internet

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111081578.0A CN113794595A (en) 2021-09-15 2021-09-15 IoT (Internet of things) equipment high-availability method based on industrial Internet

Publications (1)

Publication Number Publication Date
CN113794595A (en) 2021-12-14

Family

ID=78878479

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111081578.0A Pending CN113794595A (en) 2021-09-15 2021-09-15 IoT (Internet of things) equipment high-availability method based on industrial Internet

Country Status (1)

Country Link
CN (1) CN113794595A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105095008A (en) * 2015-08-25 2015-11-25 国电南瑞科技股份有限公司 Distributed task fault redundancy method suitable for cluster system
CN107453929A (en) * 2017-09-22 2017-12-08 中国联合网络通信集团有限公司 Group system is from construction method, device and group system
CN108847982A (en) * 2018-06-26 2018-11-20 郑州云海信息技术有限公司 A kind of distributed storage cluster and its node failure switching method and apparatus
CN109361532A (en) * 2018-09-11 2019-02-19 上海天旦网络科技发展有限公司 The high-availability system and method and computer readable storage medium of network data analysis
CN112087333A (en) * 2020-09-07 2020-12-15 上海浦东发展银行股份有限公司 Micro-service registration center cluster and information processing method thereof
CN112202601A (en) * 2020-09-23 2021-01-08 湖南麒麟信安科技股份有限公司 Application method of two physical node mongo clusters operated in duplicate set mode

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114500554A (en) * 2022-02-09 2022-05-13 南京戎光软件科技有限公司 Internet of things system management method
CN114500554B (en) * 2022-02-09 2024-04-26 南京戎光软件科技有限公司 Internet of things system management method

Similar Documents

Publication Publication Date Title
US6934880B2 (en) Functional fail-over apparatus and method of operation thereof
CN112181660A (en) High-availability method based on server cluster
CN107508694B (en) Node management method and node equipment in cluster
CN102916825A (en) Management equipment of dual-computer hot standby system, management method and dual-computer hot standby system
CN104717077B (en) A kind of method, apparatus and system for managing data center
US11848889B2 (en) Systems and methods for improved uptime for network devices
CN113794595A (en) IoT (Internet of things) equipment high-availability method based on industrial Internet
CN108445857B (en) Design method for 1+ N redundancy mechanism of SCADA system
CN111309515A (en) Disaster recovery control method, device and system
CN107491344B (en) Method and device for realizing high availability of virtual machine
CN117435405A (en) Dual hot standby and failover system and method
CN112260893A (en) Ethernet redundancy device of VxWorks operating system based on network heartbeat
CN110677288A (en) Edge computing system and method generally used for multi-scene deployment
US11153769B2 (en) Network fault discovery
CN110675614A (en) Transmission method of power monitoring data
JP3910967B2 (en) Duplex system and multiplexing control method
CN101958925A (en) Method and device for controlling remote equipment
CN111654401B (en) Network segment switching method, device, terminal and storage medium of monitoring system
CN115499294A (en) Distributed storage environment network sub-health detection and fault automatic processing method
CN111586110B (en) Optimization processing method for raft in point-to-point fault
US11954509B2 (en) Service continuation system and service continuation method between active and standby virtual servers
KR20130042438A (en) Method and apparatus for managing rfid resource
CN111064608A (en) Master-slave switching method and device of message system, electronic equipment and storage medium
CN107547257B (en) Server cluster implementation method and device
KR102517831B1 (en) Method and system for managing software in mission critical system environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination