CN113794595A

CN113794595A - High-availability method for IoT (Internet of Things) devices based on the industrial Internet

Info

Publication number: CN113794595A
Application number: CN202111081578.0A
Authority: CN
Inventors: 胡金丽; 畅思衡; 岳鹏宇
Original assignee: Lingyun Youyi Beijing Technology Co ltd
Current assignee: Lingyun Youyi Beijing Technology Co ltd
Priority date: 2021-09-15
Filing date: 2021-09-15
Publication date: 2021-12-14

Abstract

The high-availability method for IoT devices based on the industrial Internet uses a distributed multi-gateway cluster management service built on the Raft algorithm, which ensures service stability while increasing the data-processing capacity of the service. As long as the cluster keeps the minimum number of gateways required for stability, gateways can freely join or leave the cluster, so service processing remains stable even if one node or several nodes fail at the same time. In addition, when data-processing performance becomes a bottleneck, adding a new gateway quickly raises the overall processing and response capacity of the service.

Description

High-availability method for IoT (Internet of Things) devices based on the industrial Internet

Technical Field

The invention relates to the field of the Internet of Things, and in particular to a high-availability method for IoT (Internet of Things) devices based on the industrial Internet.

Background

With the rapid development of 5G and the Internet of Things, intelligent IoT gateway devices are being applied in more and more industries. For general consumer IoT devices such as smart-home products, the availability requirements on the IoT gateway are relatively low, because consumer equipment operates in a benign environment where temperature, humidity, current, and voltage are comparatively stable.

In many industrial fields, by contrast, the operating environment is harsh, with high temperature, high humidity, and large current and voltage fluctuations, and there are explicit requirements on the operating state of the equipment. Downtime is far less tolerable than it is for consumer equipment. Without highly available equipment, services are interrupted, business data is lost, and industrial operations may even suffer losses. In risk-oriented industrial design, an industrial-grade high-availability method for IoT devices based on the industrial Internet is therefore very important; its purpose is to improve the availability of IoT devices and to withstand service interruptions caused by various risks.

Existing manufacturers of intelligent IoT gateway devices typically focus only on the operational stability of a single device. For example, they improve stability by making the device tolerate a wide range of operating temperatures and voltages. This does not solve the problems that arise after a device fails: data can no longer be collected or transmitted back, services cannot respond, and data is lost. The only option is to wait for maintenance, which is frequent and whose degree of recovery cannot be evaluated. In other words, a single device cannot meet the stability requirements of industrial IoT services. Meanwhile, as industrial digital transformation accelerates, a single edge gateway often becomes a bottleneck for data computation and processing, which affects the prediction and response of the whole service.

Disclosure of Invention

In view of the above, the present invention is proposed to provide a high-availability method for IoT devices based on the industrial Internet that overcomes, or at least partially solves, the problems mentioned above.

According to an aspect of the present invention, there is provided a high-availability method for IoT devices based on the industrial Internet, including:

a Raft cluster comprises a plurality of gateways; at any given time each gateway is in one of three states, namely primary gateway, voter gateway (the follower role in Raft terminology), or candidate gateway, and the three states can be converted into one another;

when no heartbeat from the primary gateway is received within the cluster, one gateway changes its state to candidate;

the candidate gateway sends a request to the other gateways to vote for its promotion to primary gateway;

after receiving votes from more than half of the gateways, the candidate gateway is promoted to primary gateway of the cluster;

the edge gateways are responsible for acquiring data from the new energy equipment and sending the data to a data queue; heartbeat monitoring among the edge gateways is used to judge whether each gateway in the cluster is providing the acquisition service normally; if so, the edge gateways continue to work normally; otherwise, other gateways take over the work of the failed gateway.

Optionally, the device high-availability method further includes:

adding a new edge gateway to the cluster, for which a gateway registration operation is performed by configuring the gateway;

after the registration request is initiated, the gateway performs a status check once every 10 seconds, three times in total; if the status check fails, registration failure information is returned and the gateway waits to be registered again; if the status check succeeds, the cluster allocates data acquisition tasks to the newly registered gateway, using the current acquisition load as the weight, and the allocation operation is completed.

Optionally, the device high-availability method further includes: edge gateway fault alarming, in which, when an edge gateway has a problem, the other gateways in the cluster detect it through their checks, record the problem, and raise an alarm once the fault is found.

The invention provides a high-availability method for IoT (Internet of Things) devices based on the industrial Internet, which comprises the following steps: a Raft cluster comprises a plurality of gateways, each of which is at any given time in one of three states, namely primary gateway, voter gateway, or candidate gateway, and the three states can be converted into one another; when no heartbeat from the primary gateway is received within the cluster, one gateway changes its state to candidate; the candidate gateway sends a request to the other gateways to vote for its promotion to primary gateway; after receiving votes from more than half of the gateways, the candidate gateway is promoted to primary gateway of the cluster; the edge gateways are responsible for acquiring data from the new energy equipment and sending the data to a data queue, and heartbeat monitoring among the edge gateways is used to judge whether each gateway in the cluster is providing the acquisition service normally; if so, the edge gateways continue to work normally; otherwise, other gateways take over the work of the failed gateway. The method allows gateways to join or leave the cluster freely, so that service processing remains stable even if one node or several nodes fail at the same time, and the overall processing and response capacity of the service can be raised quickly.

The foregoing is only an overview of the technical solutions of the present invention. Embodiments of the present invention are described below so that its technical means can be understood more clearly and so that the above and other objects, features, and advantages of the present invention become more apparent.

Drawings

In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flow chart of a high-availability election based on the Raft algorithm according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of edge gateway failover according to an embodiment of the present invention;

FIG. 3 is a finite state machine diagram of edge gateway failover according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of an edge gateway newly added to a cluster according to an embodiment of the present invention;

FIG. 5 is a finite state machine diagram of adding an edge gateway to a cluster according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of an edge gateway fault alarm according to an embodiment of the present invention;

FIG. 7 is a finite state machine diagram of the edge gateway fault alarm according to an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

The terms "comprises" and "comprising," and any variations thereof, in the present description and claims and drawings are intended to cover a non-exclusive inclusion, such as a list of steps or elements.

The technical solution of the present invention is further described in detail with reference to the accompanying drawings and embodiments.

As shown in FIG. 1, the Raft cluster contains multiple gateways, which allows the system to tolerate a failure. At any given time, each gateway is in one of three states: primary, voter (the follower role in Raft terminology), or candidate. A gateway can switch between these states.
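As a non-limiting illustration, the three gateway states and the switches between them may be sketched as follows. The embodiment only states that the states can be converted into one another; the specific transition table below follows the standard Raft state machine and is an assumption, as are the names `GatewayState`, `ALLOWED_TRANSITIONS`, and `transition`.

```python
from enum import Enum


class GatewayState(Enum):
    VOTER = "voter"          # "follower" in Raft terminology
    CANDIDATE = "candidate"
    PRIMARY = "primary"      # "leader" in Raft terminology


# Assumption: transitions follow the standard Raft state machine.
ALLOWED_TRANSITIONS = {
    # A voter that hears no heartbeat becomes a candidate.
    GatewayState.VOTER: {GatewayState.CANDIDATE},
    # A candidate either wins the election or steps back to voter.
    GatewayState.CANDIDATE: {GatewayState.PRIMARY, GatewayState.VOTER},
    # A primary that discovers a newer primary steps down to voter.
    GatewayState.PRIMARY: {GatewayState.VOTER},
}


def transition(current: GatewayState, target: GatewayState) -> GatewayState:
    """Return the new state, or raise if the switch is not permitted."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Under this table a voter cannot become primary directly; it must first stand as a candidate and win a vote.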

Raft uses a heartbeat mechanism to trigger the election of the primary gateway. When gateways start up, they begin as voters. A gateway remains in the voter state as long as it receives valid requests from the primary gateway or from a candidate. The primary gateway sends periodic heartbeats to all voters to maintain its authority. If a voter receives no communication for a period of time called the election timeout, it assumes that there is no viable primary gateway and starts an election to choose a new one.
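As a non-limiting illustration, the election timeout and the majority vote may be sketched as follows. This is a simplified, single-process model, not a full Raft implementation: it omits log comparison, networking, and randomized timeouts. The `Gateway` class, the 1.5-second default timeout, and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gateway:
    gateway_id: str
    state: str = "voter"            # "voter", "candidate", or "primary"
    term: int = 0                   # election round counter, as in Raft
    last_heartbeat: float = 0.0     # time the last valid message arrived
    election_timeout: float = 1.5   # seconds; hypothetical value


def majority(cluster_size: int) -> int:
    """Smallest number of votes that is more than half of the cluster."""
    return cluster_size // 2 + 1


def election_timed_out(gw: Gateway, now: float) -> bool:
    """A non-primary gateway times out after hearing nothing for too long."""
    return gw.state != "primary" and now - gw.last_heartbeat >= gw.election_timeout


def run_election(candidate: Gateway, reachable_peers: List[Gateway],
                 cluster_size: int, now: float) -> bool:
    """Ask reachable peers for votes; promote the candidate on a majority."""
    candidate.state = "candidate"
    candidate.term += 1
    votes = 1  # the candidate votes for itself
    for peer in reachable_peers:
        # Simplification: a peer grants its vote if the candidate's term is newer.
        if peer.term < candidate.term:
            peer.term = candidate.term
            peer.state = "voter"
            peer.last_heartbeat = now  # granting a vote resets the peer's timer
            votes += 1
    if votes >= majority(cluster_size):
        candidate.state = "primary"
        return True
    candidate.state = "voter"
    return False
```

In a three-gateway cluster whose primary has failed, a candidate that reaches one other gateway collects two votes, which is more than half, and becomes the new primary.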

FIG. 2 shows edge gateway failover. The edge gateways are responsible for acquiring data from the new energy equipment and pushing the data to the data queue. Heartbeat monitoring among the edge gateways is used to judge whether each gateway in the cluster is providing the acquisition service normally. When the hardware of a running data acquisition gateway in the cluster fails, its communication link with the new energy equipment is broken and data can no longer be acquired. In that case, other gateways in the cluster immediately take over the work of the failed gateway, which ensures that the data acquisition service continues to be provided normally.

FIG. 3 is the finite state machine diagram of edge gateway failover. The devices in the edge gateway cluster have a heartbeat check mechanism: the gateways in the cluster check one another's liveness every 5 seconds, and when a gateway device in the cluster has a problem, one gateway in the cluster takes over the data acquisition tasks of the failed gateway. After the takeover is complete, the gateway device returns to the status-check mode to ensure that the data acquisition tasks keep running normally.
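As a non-limiting illustration, the 5-second liveness check and the takeover may be sketched as follows. The embodiment does not specify which gateway takes over or how long a gateway may stay silent before it is treated as failed; choosing the least-loaded surviving gateway and using a single 5-second interval as the silence limit are assumptions, as are the function names and the device names used in the example.

```python
from typing import Dict, List

CHECK_INTERVAL_S = 5.0  # liveness check period stated in the embodiment


def find_failed(last_seen: Dict[str, float], now: float,
                max_silence: float = CHECK_INTERVAL_S) -> List[str]:
    """Return gateways whose last heartbeat is older than max_silence seconds."""
    return sorted(g for g, seen in last_seen.items() if now - seen > max_silence)


def fail_over(tasks: Dict[str, List[str]], failed: str) -> str:
    """Move the failed gateway's acquisition tasks to one surviving gateway.

    Assumption: the successor is the surviving gateway with the fewest tasks.
    """
    orphaned = tasks.pop(failed, [])
    if not tasks:
        raise RuntimeError("no surviving gateway can take over")
    successor = min(tasks, key=lambda g: (len(tasks[g]), g))
    tasks[successor].extend(orphaned)
    return successor


# Example with hypothetical gateway and device names.
last_seen = {"gw-a": 100.0, "gw-b": 92.0, "gw-c": 99.0}
tasks = {
    "gw-a": ["inverter-1", "inverter-2"],
    "gw-b": ["inverter-3"],
    "gw-c": ["inverter-4"],
}
for failed_gateway in find_failed(last_seen, now=100.0):
    fail_over(tasks, failed_gateway)
```

After the loop, the tasks of `gw-b` have moved to `gw-c`, and the cluster would resume its periodic checks.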

As shown in FIG. 4 and FIG. 5, when a new edge gateway joins the cluster, a gateway registration operation is first performed by configuring the gateway. After the registration request is initiated, the gateway performs a status check once every 10 seconds, three times in total. If the status check fails, registration failure information is returned and the gateway waits to be registered again. If the status check succeeds, the cluster allocates data acquisition tasks to the newly registered gateway, using the current acquisition load as the weight, and completes the allocation operation; that is, the new gateway starts its data acquisition tasks for the new energy equipment.
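As a non-limiting illustration, the registration check and the load-weighted allocation may be sketched as follows. Two points are assumptions because the embodiment leaves them open: registration is treated as successful only if all three status checks pass, and "using the current acquisition load as the weight" is interpreted as moving tasks from the most heavily loaded gateways until the new gateway holds roughly an equal share. The function names and the `check_status` callback are hypothetical.

```python
import time
from typing import Callable, Dict, List

CHECK_INTERVAL_S = 10.0  # interval between status checks, from the embodiment
CHECK_COUNT = 3          # number of status checks, from the embodiment


def register_gateway(check_status: Callable[[], bool],
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Run the status checks; assumption: every check must pass."""
    for attempt in range(CHECK_COUNT):
        if attempt > 0:
            sleep(CHECK_INTERVAL_S)
        if not check_status():
            return False  # caller reports failure and waits for re-registration
    return True


def assign_tasks(tasks: Dict[str, List[str]], new_gateway: str) -> List[str]:
    """Give the new gateway tasks taken from the most loaded gateways."""
    tasks[new_gateway] = []
    total = sum(len(assigned) for assigned in tasks.values())
    target = total // len(tasks)  # roughly an equal share per gateway
    while len(tasks[new_gateway]) < target:
        donor = max((g for g in tasks if g != new_gateway),
                    key=lambda g: (len(tasks[g]), g))
        tasks[new_gateway].append(tasks[donor].pop())
    return tasks[new_gateway]
```

Passing `sleep` as a parameter keeps the sketch testable without waiting 20 seconds; a deployment would use the default `time.sleep`.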

FIG. 6 is a schematic diagram of the edge gateway fault alarm, and FIG. 7 is the corresponding finite state machine diagram. When an edge gateway has a problem, the other gateways in the cluster detect it through their checks, record the problem, and raise an alarm once the fault is found.

Beneficial effects: a technical architecture designed to control risk and protect the company's value; a high-availability switching scheme for the edge gateway cluster based on the Raft algorithm; automated notification of faults and switchovers; and a way of increasing the computing power of the cluster.

The above embodiments further explain the objects, technical solutions, and advantages of the present invention in detail. It should be understood that they are merely exemplary embodiments of the present invention and are not intended to limit its scope; any modification, equivalent substitution, or improvement made within the spirit and principle of the present invention shall fall within the scope of protection of the present invention.
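As a non-limiting illustration, the check, record, and alarm sequence may be sketched as follows. The record fields, the `notify` callback, and the rule that a gateway is alarmed only once while its fault record exists are assumptions; the embodiment does not specify the alarm channel or the record format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FaultRecord:
    gateway_id: str     # the gateway found to be faulty
    detected_by: str    # the peer gateway that ran the check
    detected_at: float  # timestamp of detection
    reason: str


def check_and_alarm(checker_id: str, peers_alive: Dict[str, bool], now: float,
                    fault_log: List[FaultRecord],
                    notify: Callable[[FaultRecord], None]) -> List[FaultRecord]:
    """Record and alarm each peer that failed its check and is not yet logged."""
    already_logged = {record.gateway_id for record in fault_log}
    new_records = []
    for gateway_id, alive in sorted(peers_alive.items()):
        if alive or gateway_id in already_logged:
            continue
        record = FaultRecord(gateway_id, checker_id, now, "heartbeat check failed")
        fault_log.append(record)  # record the problem
        notify(record)            # raise the alarm
        new_records.append(record)
    return new_records
```

The `notify` callback could write to a log, send a message, or call an operations system; which of these is used is left open here.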

Claims

1. A high-availability method for IoT devices based on the industrial Internet, wherein the device high-availability method comprises:

the candidate gateway sends a request to the other gateways to vote for its promotion to primary gateway;

2. The industrial internet-based IoT device high availability method in claim 1, wherein the device high availability method further comprises:

3. The high-availability method for IoT devices based on the industrial Internet according to claim 1, wherein the device high-availability method further comprises: edge gateway fault alarming, in which, when an edge gateway has a problem, the other gateways in the cluster detect it through their checks, record the problem, and raise an alarm once the fault is found.