CN115098303A

CN115098303A - Node scheduling method and device, electronic equipment and storage medium

Info

Publication number: CN115098303A
Application number: CN202210668481.8A
Authority: CN
Inventors: 刘俊杰; 曾琳铖曦; 孙磊; 吴海英; 蒋宁
Original assignee: Mashang Consumer Finance Co Ltd
Current assignee: Mashang Consumer Finance Co Ltd
Priority date: 2022-06-14
Filing date: 2022-06-14
Publication date: 2022-09-23

Abstract

The present disclosure provides a node scheduling method and apparatus based on a target cluster, an electronic device, and a storage medium. The method comprises: determining a fault domain matrix of a target cluster based on distribution information of each node of the target cluster across fault domains; determining a node switching scheme of the target cluster based on the fault domain matrix and a node scheduling strategy; and switching the nodes in the target cluster according to the node switching scheme so that the switched cluster meets the fault domain high availability condition. Converting the fault domain problem into a matrix problem suited to computer processing speeds up determination of the node switching scheme and improves the efficiency of handling the fault domain problem. The method and apparatus can achieve fault domain high availability for the target cluster with little impact on the continuity of its data processing.

Description

Node scheduling method and device, electronic equipment and storage medium

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a node scheduling method and apparatus, an electronic device, and a storage medium.

Background

Fault domain high availability scheduling for a Redis (Remote Dictionary Server) cluster generally focuses on how to distribute nodes when the cluster is created so that fault domain high availability is met. For an existing Redis cluster that does not meet fault domain high availability, a common approach is to migrate its data to another Redis cluster that does. This has a large impact on the continuity of data processing. Therefore, how to switch nodes within an existing Redis cluster while preserving cluster service continuity has become a research focus in cluster fault domain high availability.

Disclosure of Invention

The present disclosure provides a node scheduling method, a node scheduling apparatus, an electronic device, and a storage medium, which can perform node switching on a cluster under the condition of ensuring cluster service continuity.

In a first aspect, the present disclosure provides a node scheduling method, including:

determining a fault domain matrix of a target cluster based on distribution information of each node in the target cluster in a fault domain; the fault domain matrix comprises the corresponding relation between each node and a fault domain;

determining a node switching scheme of the target cluster based on the fault domain matrix and a node scheduling strategy;

and switching the nodes in the target cluster according to the node switching scheme so that the switched target cluster meets the high availability condition of the fault domain.

In a second aspect, the present disclosure provides a node scheduling apparatus, including:

the processing unit is used for determining a fault domain matrix of the target cluster based on the distribution information of each node in the target cluster in the fault domain; the fault domain matrix comprises the corresponding relation between each node and a fault domain;

the processing unit is further configured to determine a node switching scheme of the target cluster based on the fault domain matrix and a node scheduling policy;

and the scheduling unit is used for switching the nodes in the target cluster according to the node switching scheme so as to enable the switched target cluster to meet the high availability condition of the fault domain.

In a third aspect, the present disclosure provides an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores one or more computer programs executable by the at least one processor, the one or more computer programs being executable by the at least one processor to enable the at least one processor to perform the above-described node scheduling method.

In a fourth aspect, the present disclosure provides a computer readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor/processing core, implements the above-described node scheduling method.

The node scheduling method provided by the disclosure determines a fault domain matrix of a target cluster based on the distribution information of each node in the target cluster across fault domains, and determines a node switching scheme of the target cluster based on the fault domain matrix and a node scheduling strategy. This converts the fault domain problem into a matrix problem suited to computer processing, which speeds up determination of the node switching scheme and improves the efficiency of handling the fault domain problem. The nodes in the target cluster are then switched according to the node switching scheme so that the switched cluster meets the fault domain high availability condition. Fault domain high availability of the target cluster can thus be achieved without extensive changes to the cluster, with little impact on the continuity of its data processing, which effectively improves the continuity of services built on the cluster.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The accompanying drawings are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the principles of the disclosure and not to limit the disclosure. The above and other features and advantages will become more apparent to those skilled in the art by describing in detail exemplary embodiments thereof with reference to the attached drawings, in which:

fig. 1 is a flowchart of a node scheduling method according to an embodiment of the present disclosure;

fig. 2 is a schematic diagram illustrating a fault domain matrix of a target cluster determined based on distribution information of nodes in the target cluster in a fault domain according to an embodiment of the present disclosure;

fig. 3 is a flowchart of a method for determining a node switching scheme of a target cluster based on a fault domain matrix and a node scheduling policy according to an embodiment of the present disclosure;

fig. 4 is a schematic diagram illustrating switching of nodes in a target cluster according to a node switching scheme according to an embodiment of the present disclosure;

fig. 5 is a flowchart of a node scheduling method based on a target cluster according to an embodiment of the present disclosure;

fig. 6 is a flowchart of a node scheduling method based on a target cluster according to an embodiment of the present disclosure;

fig. 7 is a schematic structural diagram of a node scheduling apparatus based on a target cluster according to an embodiment of the present disclosure;

fig. 8 is a block diagram of an electronic device provided in an embodiment of the present disclosure.

Detailed Description

To enable those skilled in the art to better understand the technical aspects of the present disclosure, exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the present disclosure are included to assist understanding, and they should be considered as being merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

Embodiments of the disclosure and features of the embodiments may be combined with each other without conflict.

As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect.

Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and the present disclosure, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

The embodiments of the disclosure provide a node scheduling method for performing node scheduling on a target cluster, where the target cluster is a Redis cluster that currently requires node scheduling. Redis is a key-value storage system. It supports a relatively large number of value types, including strings (string), linked lists (list), sets (set), sorted sets (sorted set or zset), and hashes (hash). These data types support rich operations such as push/pop, add/remove, and intersection, union, and difference, and these operations are atomic.

A Redis cluster is a facility for sharing data among multiple Redis nodes. It provides a degree of availability through partitioning, so that the cluster can continue to process command requests even if some of its nodes fail or cannot communicate.

Redis clusters are implemented with data sharding rather than consistent hashing: a Redis cluster contains 16384 hash slots, each key in the database belongs to one of these slots, and the cluster computes the slot of a key with the formula CRC16(key) % 16384. CRC16 is one of the error-checking codes most commonly used in data communication, and the lengths of its information field and check field can be chosen arbitrarily.
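As a hedged illustration of the slot formula above, the following Python sketch computes CRC16(key) % 16384. It assumes the CRC16 variant is XMODEM (polynomial 0x1021, initial value 0), which is the variant named in the Redis Cluster specification. The function names are hypothetical, and Redis hash tags ({...}) are intentionally not handled, so this is not a drop-in replacement for the server's own slot calculation.

```python
# Sketch of slot = CRC16(key) % 16384, assuming the CRC16-XMODEM variant.
# Hash tags ({...}) are NOT handled here.

NUM_SLOTS = 16384


def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC16-XMODEM: polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Map a key to one of the 16384 hash slots (no hash-tag handling)."""
    return crc16_xmodem(key.encode("utf-8")) % NUM_SLOTS


if __name__ == "__main__":
    print(hex(crc16_xmodem(b"123456789")))  # 0x31c3, the standard check value
    print(key_slot("user:1000"))
```

The bitwise loop is chosen for readability; a production implementation would normally use a 256-entry lookup table.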

Each node in the Redis cluster is responsible for a portion of the hash slots. Generally, the nodes of a Redis cluster are divided into a plurality of master nodes and at least one slave node corresponding to each master node. A master node processes command requests (for example, serving its hash slots), and a slave node replicates its master node and takes over processing command requests when that master node goes offline. For example, a cluster has 3 master nodes (node A, node B, and node C) and 3 slave nodes (node D, node E, and node F). Node A serves hash slots 0 to 5460 and its slave node is D; node B serves hash slots 5461 to 10922 and its slave node is E; node C serves hash slots 10923 to 16383 and its slave node is F.
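The slot assignment in this example can be sketched as a sorted list of range starts searched with binary search. This is a minimal illustration only; the table layout and the function name owner_of_slot are hypothetical and are not part of Redis or of this disclosure.

```python
from bisect import bisect_right

# Slot ranges from the example: (first slot of range, master, slave).
SLOT_RANGES = [
    (0, "A", "D"),       # A serves 0..5460, replicated by D
    (5461, "B", "E"),    # B serves 5461..10922, replicated by E
    (10923, "C", "F"),   # C serves 10923..16383, replicated by F
]
_STARTS = [start for start, _, _ in SLOT_RANGES]


def owner_of_slot(slot: int) -> tuple[str, str]:
    """Return (master, slave) responsible for a hash slot in 0..16383."""
    if not 0 <= slot < 16384:
        raise ValueError(f"slot out of range: {slot}")
    # bisect_right finds the last range whose first slot is <= slot.
    idx = bisect_right(_STARTS, slot) - 1
    _, master, slave = SLOT_RANGES[idx]
    return master, slave


if __name__ == "__main__":
    print(owner_of_slot(5460))   # ('A', 'D')
    print(owner_of_slot(5461))   # ('B', 'E')
```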

In any Redis cluster, if a master node and its slave nodes holding the same slots go down at the same time (e.g., master node A and its slave node D in the cluster described above), the entire cluster becomes unavailable. Therefore, the master nodes of the cluster should be distributed in different fault domains, and each master node and at least one of its slave nodes should also be in different fault domains, so as to meet fault domain high availability.

Fault domain and high availability are closely related concepts. A fault domain is a group of hardware components that share a single point of failure. Achieving a given level of fault tolerance (i.e., high availability) requires a corresponding number of fault domains. For a Redis cluster, no two master nodes may be in the same fault domain, and each master node and at least one of its slave nodes may not be in the same fault domain; that is, the cluster needs at least as many fault domains as it has master nodes.

Fault domain high availability scheduling for Redis clusters generally focuses on how nodes are distributed when a cluster is created, and for an existing Redis cluster there is no established method for detecting whether fault domain high availability is met. Moreover, when a Redis cluster does not meet fault domain high availability, its data is often migrated to another Redis cluster that does, which greatly affects the continuity of data processing.

The node scheduling method provided in this embodiment can quickly determine a node switching scheme for an existing Redis cluster and meet fault domain high availability without migrating the cluster. The method may be executed by an electronic device such as a terminal device or a server. The terminal device may be, for example, a mobile phone, a notebook computer, an intelligent voice interaction device, or a vehicle-mounted terminal; the server may be an independent physical server, a server cluster, or a cloud server capable of cloud computing.

Fig. 1 is a flowchart of a node scheduling method according to an embodiment of the present disclosure. As shown in Fig. 1, the method includes steps S11 to S13.

Step S11: determine a fault domain matrix of the target cluster based on the distribution information of each node in the target cluster across fault domains.

The target cluster is a Redis cluster that currently needs node scheduling. Each node in the target cluster is configured to process command requests (e.g., serving hash slots), and the nodes comprise a plurality of master nodes (masters) and at least one slave node (slave) corresponding to each master node. A master node processes command requests, and a slave node replicates its master node and takes over processing command requests when that master node goes offline. The nodes of the target cluster are distributed across a plurality of fault domains, and the distribution information describes how they are distributed. For example, master node 1 is in fault domain 1 and its slave nodes are in fault domains 2 and 3; or master node 2 is in fault domain 2, while master node 3 and its slave nodes are all in fault domain 3. The fault domain matrix reflects the correspondence between each node and its fault domain.

In an embodiment, the fault domain matrix determined from the distribution information includes at least one matrix row, and each matrix row corresponds to one master node and the slave nodes of that master node. In any matrix row, the element at the preset bit comprises the node identifier of the master node and its fault domain identifier, and each other element comprises the node identifier of one of that master node's slave nodes and the fault domain identifier of that slave node.

The preset bit is a predetermined position in the matrix row; for example, it may be the first, second, or third position. Since each matrix row corresponds to exactly one master node but may contain several slave nodes, it is preferable for convenience of data processing to set the preset bit to the first position in the row, that is, to place the master node's identifier first and the identifiers of its slave nodes in the remaining positions. For example, Fig. 2 is a schematic diagram of determining the fault domain matrix of a target cluster based on the distribution information of its nodes across fault domains, according to an embodiment of the present disclosure.

As shown in Fig. 2, the target cluster 21 includes a plurality of nodes: master node A and its slave node A1; master node B and its slave node B1; and master node C and its slave node C1. These nodes are distributed in three fault domains whose fault domain identifiers are 0, 1, and 2. The distribution information of the nodes is as follows: master nodes A and B are in fault domain 0, master node C and slave node A1 are in fault domain 1, and slave nodes B1 and C1 are in fault domain 2.

As shown in Fig. 2, the preset bit in this embodiment is the first position of the matrix row. The fault domain matrix 22 determined from the distribution information records the correspondence between nodes and fault domains. In the fault domain matrix 22, the first row is 0(A), 1(A1), where 0(A) indicates that master node A is in fault domain 0 and 1(A1) indicates that its slave node A1 is in fault domain 1; the second row is 0(B), 2(B1), where 0(B) indicates that master node B is in fault domain 0 and 2(B1) indicates that its slave node B1 is in fault domain 2; the third row is 1(C), 2(C1), where 1(C) indicates that master node C is in fault domain 1 and 2(C1) indicates that its slave node C1 is in fault domain 2.
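A minimal Python sketch of step S11 for the Fig. 2 example follows. It assumes the distribution information is available as two hypothetical mappings (node to fault domain identifier, and master node to its slave nodes) and represents each matrix element as a (fault domain id, node id) pair, with the preset bit fixed at index 0. The disclosure does not prescribe this representation.

```python
# Fig. 2 distribution information (hypothetical data layout).
DOMAIN_OF = {"A": 0, "B": 0, "C": 1, "A1": 1, "B1": 2, "C1": 2}
SLAVES_OF = {"A": ["A1"], "B": ["B1"], "C": ["C1"]}


def build_fault_domain_matrix(domain_of, slaves_of):
    """One row per master node: index 0 (the preset bit) holds the master,
    the remaining positions hold its slaves. Elements are (domain, node)."""
    matrix = []
    for master, slaves in slaves_of.items():
        row = [(domain_of[master], master)]
        row += [(domain_of[slave], slave) for slave in slaves]
        matrix.append(row)
    return matrix


def fmt(element):
    """Render an element in the document's notation, e.g. 0(A)."""
    domain, node = element
    return f"{domain}({node})"


if __name__ == "__main__":
    for row in build_fault_domain_matrix(DOMAIN_OF, SLAVES_OF):
        print(" ".join(fmt(e) for e in row))
    # 0(A) 1(A1)
    # 0(B) 2(B1)
    # 1(C) 2(C1)
```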

In the embodiments of the disclosure, determining the fault domain matrix of the target cluster from the distribution information of its nodes converts the fault domain problem into a matrix problem suited to computer processing, which speeds up determination of the node switching scheme and improves the efficiency of handling the fault domain problem.

Step S12: determine a node switching scheme of the target cluster based on the fault domain matrix and the node scheduling strategy.

The node scheduling policy is the criterion that specifies how a node switching scheme is derived from the fault domain matrix. It stipulates that no two master nodes are in the same fault domain and that each master node and at least one of its slave nodes are not in the same fault domain. The node switching scheme specifies how nodes in the target cluster are to be switched so that the target cluster meets the fault domain high availability condition.

Because the node switching scheme is derived from the fault domain matrix under these constraints of the node scheduling strategy, the target cluster is guaranteed to meet fault domain high availability once its nodes have been switched according to the scheme. This helps ensure the accuracy and continuity of services executed on the target cluster.

Step S13: switch the nodes in the target cluster according to the node switching scheme so that the scheduled target cluster meets the fault domain high availability condition.

The fault domain high availability condition means that no two master nodes are in the same fault domain, and each master node and at least one of its slave nodes are not in the same fault domain.
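This condition can be checked mechanically. The sketch below is an illustration of the two rules, not code from the disclosure; it assumes a simple hypothetical representation of the cluster as a node-to-fault-domain mapping plus a master-to-slaves mapping.

```python
def meets_fault_domain_ha(domain_of, slaves_of):
    """Check the fault domain high availability condition:
    (1) no two master nodes share a fault domain, and
    (2) every master has at least one slave in a different fault domain."""
    master_domains = [domain_of[m] for m in slaves_of]
    if len(set(master_domains)) != len(master_domains):
        return False  # rule (1) violated: two masters share a domain
    for master, slaves in slaves_of.items():
        if not any(domain_of[s] != domain_of[master] for s in slaves):
            return False  # rule (2) violated: no slave outside the master's domain
    return True


if __name__ == "__main__":
    domain_of = {"A": 0, "B": 0, "C": 1, "A1": 1, "B1": 2, "C1": 2}
    # Fig. 2 before switching: masters A and B are both in fault domain 0.
    print(meets_fault_domain_ha(domain_of, {"A": ["A1"], "B": ["B1"], "C": ["C1"]}))  # False
    # After B1 and B swap roles, B1 (domain 2) is the master of that pair.
    print(meets_fault_domain_ha(domain_of, {"A": ["A1"], "B1": ["B"], "C": ["C1"]}))  # True
```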

In this embodiment, the target cluster does not need to be migrated; switching nodes within it is sufficient for the scheduled cluster to meet the fault domain high availability condition. Fault domain high availability is therefore achieved with little impact on the continuity of the target cluster's data processing.

In summary, the node scheduling method provided by the embodiments of the disclosure first determines a fault domain matrix of the target cluster based on the distribution information of its nodes across fault domains, and then determines a node switching scheme based on the fault domain matrix and a node scheduling strategy. Converting the fault domain problem into a matrix problem suited to computer processing speeds up determination of the node switching scheme and improves the efficiency of handling the fault domain problem. Finally, the nodes in the target cluster are switched according to the node switching scheme so that the scheduled cluster meets the fault domain high availability condition. Fault domain high availability is thus achieved with little impact on the continuity of the target cluster's data processing, which effectively improves the continuity of services built on the target cluster.

Fig. 3 is a flowchart of a method for determining a node switching scheme of a target cluster based on a fault domain matrix and a node scheduling policy according to an embodiment of the present disclosure. In one embodiment, as shown in Fig. 3, the method includes steps S31 to S34.

Step S31: acquire at least one initial path corresponding to the fault domain matrix.

Each initial path contains exactly one element from each matrix row of the fault domain matrix. As an optional implementation, acquiring the initial paths may include: combining each element of the first matrix row, in turn, with one element from each of the other matrix rows until every element of the other rows has been combined with it; and taking every combination obtained by this traversal as an initial path. Taking the fault domain matrix in Fig. 2 as an example, the first element of the first row is 0(A); combining 0(A) with 0(B) from the second row and 1(C) from the third row gives the initial path 0(A)-0(B)-1(C). Because the second and third rows still contain elements not yet combined with 0(A), those elements are then combined with 0(A) as well. Following this process, the initial paths determined from the fault domain matrix of Fig. 2 are:

0(A)-0(B)-1(C);

0(A)-0(B)-2(C1);

0(A)-2(B1)-1(C);

0(A)-2(B1)-2(C1);

1(A1)-0(B)-1(C);

1(A1)-0(B)-2(C1);

1(A1)-2(B1)-1(C);

1(A1)-2(B1)-2(C1).
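The traversal in step S31 amounts to taking one element from every row, i.e. the Cartesian product of the matrix rows. The following hedged Python sketch applies itertools.product to the Fig. 2 matrix (elements as (fault domain id, node id) pairs, an assumed representation) and reproduces the eight paths above in the same order. Note that the number of initial paths equals the product of the row lengths, so full enumeration grows quickly for large clusters.

```python
from itertools import product

# Fig. 2 fault domain matrix; index 0 of each row is the master (preset bit).
MATRIX = [
    [(0, "A"), (1, "A1")],
    [(0, "B"), (2, "B1")],
    [(1, "C"), (2, "C1")],
]


def initial_paths(matrix):
    """Every combination that takes exactly one element from each matrix row."""
    return [list(path) for path in product(*matrix)]


def fmt_path(path):
    """Render a path in the document's notation, e.g. 0(A)-0(B)-1(C)."""
    return "-".join(f"{domain}({node})" for domain, node in path)


if __name__ == "__main__":
    for path in initial_paths(MATRIX):
        print(fmt_path(path))
```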

Step S32: determine at least one candidate path from the at least one initial path, where the elements of each candidate path all have different fault domain identifiers.

Each initial path is traversed, and every initial path whose elements all have different fault domain identifiers is taken as a candidate path. For the initial paths determined from the fault domain matrix of Fig. 2, the candidate paths determined in this step are 0(A)-2(B1)-1(C) and 1(A1)-0(B)-2(C1).
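Step S32 is a simple filter over the initial paths. A minimal sketch, assuming the same (fault domain id, node id) element representation for the Fig. 2 matrix:

```python
from itertools import product

MATRIX = [
    [(0, "A"), (1, "A1")],
    [(0, "B"), (2, "B1")],
    [(1, "C"), (2, "C1")],
]


def candidate_paths(matrix):
    """Keep only initial paths whose elements all have distinct fault domain ids."""
    result = []
    for path in product(*matrix):
        domains = [domain for domain, _ in path]
        if len(set(domains)) == len(domains):
            result.append(list(path))
    return result


def fmt_path(path):
    return "-".join(f"{domain}({node})" for domain, node in path)


if __name__ == "__main__":
    print([fmt_path(p) for p in candidate_paths(MATRIX)])
    # ['0(A)-2(B1)-1(C)', '1(A1)-0(B)-2(C1)']
```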

In some embodiments, to increase processing speed, candidate paths may be determined directly from the fault domain matrix instead of first enumerating the initial paths and then filtering them. In this case, once the element for one step of a path has been selected, the element selected for the next step must have a fault domain identifier different from those of all previous steps; this continues until all candidate paths have been selected. For example, if the element selected in the first step is 0(A), no element with fault domain identifier 0 is selected in subsequent steps, so the second step selects 2(B1) and the third step selects 1(C).
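The direct selection described above behaves like a depth-first search with pruning. The following sketch is one possible reading of it, not the disclosure's own implementation: the recursion picks one element per row and skips any element whose fault domain id is already on the current path, so branches that cannot become candidate paths are never expanded. An empty result corresponds to the case, described next, in which scheduling can be ended.

```python
MATRIX = [
    [(0, "A"), (1, "A1")],
    [(0, "B"), (2, "B1")],
    [(1, "C"), (2, "C1")],
]


def candidate_paths_pruned(matrix):
    """Depth-first search choosing one element per row, pruning any element
    whose fault domain id already appears earlier on the path."""
    results = []
    path = []
    used = set()  # fault domain ids already on the current path

    def dfs(row):
        if row == len(matrix):
            results.append(list(path))
            return
        for domain, node in matrix[row]:
            if domain in used:
                continue  # prune: this fault domain is already taken
            used.add(domain)
            path.append((domain, node))
            dfs(row + 1)
            path.pop()
            used.remove(domain)

    dfs(0)
    return results


def fmt_path(path):
    return "-".join(f"{domain}({node})" for domain, node in path)


if __name__ == "__main__":
    print([fmt_path(p) for p in candidate_paths_pruned(MATRIX)])
    # ['0(A)-2(B1)-1(C)', '1(A1)-0(B)-2(C1)']
```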

In some embodiments, if no path whose elements all have different fault domain identifiers can be selected from the initial paths, the target cluster has no suitable node to switch, and the scheduling process for the target cluster may be ended.

Step S33: take, as the target path, the candidate path that contains the largest number of elements located at the preset bit of their matrix rows.

Taking the fault domain matrix shown in Fig. 2 as an example, the preset bit is the first position. Candidate path 0(A)-2(B1)-1(C) contains two preset-bit elements (0(A) and 1(C)), whereas 1(A1)-0(B)-2(C1) contains only one (0(B)); therefore, 0(A)-2(B1)-1(C) is determined as the target path.

It should be noted that the fault domain high availability condition requires that no two master nodes be in the same fault domain, and that the preset-bit elements of the fault domain matrix determined in the foregoing steps correspond to master nodes. Taking the candidate path with the most preset-bit elements as the target path therefore identifies the master nodes that already occupy distinct fault domains, so that when the nodes to be scheduled are later determined from the target path, as few master nodes as possible need to be switched, reducing the impact on the target cluster.
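Step S33 can be sketched as a maximum over the candidate paths. This sketch assumes the preset bit is index 0 and that the i-th element of each path comes from the i-th matrix row. The disclosure does not say how ties are broken; here max() simply keeps the first candidate with the highest count.

```python
PRESET_BIT = 0  # assumed position of the master node in every matrix row

MATRIX = [
    [(0, "A"), (1, "A1")],
    [(0, "B"), (2, "B1")],
    [(1, "C"), (2, "C1")],
]
CANDIDATES = [
    [(0, "A"), (2, "B1"), (1, "C")],
    [(1, "A1"), (0, "B"), (2, "C1")],
]


def preset_bit_count(matrix, path):
    """Count path elements that sit at the preset bit (i.e. are masters) of their row."""
    return sum(1 for row, element in zip(matrix, path) if element == row[PRESET_BIT])


def select_target_path(matrix, candidates):
    """Return the candidate with the most preset-bit elements (first one on ties)."""
    if not candidates:
        raise ValueError("no candidate path: scheduling should end")
    return max(candidates, key=lambda path: preset_bit_count(matrix, path))


if __name__ == "__main__":
    print(select_target_path(MATRIX, CANDIDATES))
    # [(0, 'A'), (2, 'B1'), (1, 'C')]
```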

Step S34: determine the nodes to be scheduled based on the target path and the node scheduling strategy, and generate a node switching scheme from the nodes to be scheduled.

The node to be scheduled refers to the determined node needing to be scheduled. The node switching scheme includes switching of nodes to be scheduled.

Because the node scheduling policy requires that no two master nodes be in the same fault domain and that each master node and at least one of its slave nodes not be in the same fault domain, this embodiment determines at least one node to be scheduled according to the target path and the node scheduling policy, and generates the node switching scheme from those nodes. The resulting scheme can therefore effectively make the scheduled target cluster meet the fault domain high availability condition.

Taking the target path 0(A)-2(B1)-1(C) determined from the fault domain matrix shown in Fig. 2 as an example, the target path indicates that master node A is already in fault domain 0 and master node C is already in fault domain 1, but no master node is in fault domain 2. Therefore, slave node B1 in fault domain 2 is determined as the node to be scheduled, and the node switching scheme generated from it includes the switching of node B1.

In one embodiment, the node switching scheme includes the switching of the nodes to be scheduled, and switching the nodes in the target cluster according to the node switching scheme (step S13) includes: switching each node to be scheduled with its corresponding master node.
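Steps S34 and S13 can be illustrated on the matrix form of the switching scheme. In this hypothetical sketch, every target-path element that is not at the preset bit is a slave node to be scheduled, and switching it with its row's master node is modeled as swapping the two elements within the row. The function names are illustrative only. In a live Redis cluster, promoting a replica is typically done by running the CLUSTER FAILOVER command on that replica, but the disclosure does not name a specific mechanism.

```python
MATRIX = [
    [(0, "A"), (1, "A1")],
    [(0, "B"), (2, "B1")],
    [(1, "C"), (2, "C1")],
]
TARGET_PATH = [(0, "A"), (2, "B1"), (1, "C")]


def nodes_to_schedule(matrix, target_path):
    """Pair each slave on the target path with the master of its row."""
    pairs = []
    for row, element in zip(matrix, target_path):
        if element != row[0]:  # not at the preset bit, so it is a slave
            pairs.append((element[1], row[0][1]))  # (slave to promote, master to demote)
    return pairs


def apply_switching(matrix, target_path):
    """Return a new matrix where each row's target-path element is moved to
    the preset bit (becoming master) and the old master takes its position."""
    switched = []
    for row, element in zip(matrix, target_path):
        new_row = list(row)
        i = new_row.index(element)
        new_row[0], new_row[i] = new_row[i], new_row[0]  # no-op when i == 0
        switched.append(new_row)
    return switched


if __name__ == "__main__":
    print(nodes_to_schedule(MATRIX, TARGET_PATH))  # [('B1', 'B')]
    print(apply_switching(MATRIX, TARGET_PATH)[1])  # [(2, 'B1'), (0, 'B')]
```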

In some embodiments, the node switching scheme may take matrix form, for example by interchanging, in the fault domain matrix described above, the node identifiers of each node to be scheduled and its corresponding master node. The node switching scheme may also take other forms, such as text or code.

Fig. 4 is a schematic diagram of switching nodes in a target cluster according to a node switching scheme according to an embodiment of the present disclosure. As shown in Fig. 4, the node switching scheme 41 takes matrix form. Based on the node switching scheme 41, each node to be scheduled is switched with its corresponding master node. In the scheduled target cluster 42, no two master nodes are in the same fault domain, and each master node and its slave node (pairs A and A1, B and B1, C and C1) are in different fault domains, so the scheduled target cluster 42 meets the fault domain high availability condition.

Fig. 5 is a flowchart of a node scheduling method based on a target cluster according to an embodiment of the present disclosure. In one embodiment, as shown in Fig. 5, before the fault domain matrix of the target cluster is determined based on the distribution information of its nodes across fault domains (step S11 above), the method includes step S10a.

Step S10a: determine whether the target cluster meets the scheduling condition based on the attribute parameters of the target cluster.

The attribute parameters of the target cluster describe its configuration and include, for example, one or more of the number of fault domains, the number of master nodes, the number of slave nodes corresponding to each master node, and the distribution information of each node across fault domains. The scheduling condition is a basic condition that the target cluster must satisfy for node switching to be able to achieve fault domain high availability.

In some embodiments, the attribute parameters of the target cluster include the number of fault domains and the number of master nodes. Determining whether the target cluster meets the scheduling condition based on the attribute parameters includes: determining that the target cluster meets the scheduling condition when the number of fault domains is greater than or equal to the number of master nodes; and determining that the target cluster does not meet the scheduling condition when the number of fault domains is less than the number of master nodes.

It should be noted that when the number of fault domains is smaller than the number of master nodes, at least two master nodes must share a fault domain no matter how the nodes of the target cluster are switched; that is, the scheduled target cluster cannot meet the fault domain high availability requirement. Therefore, in this embodiment, the scheduling condition is set according to the number of fault domains and the number of master nodes, so that target clusters of this type are filtered out and only target clusters meeting the scheduling condition are processed, which avoids wasting resources and improves processing efficiency.

In some embodiments, the attribute parameters of the target cluster further include the distribution information of each node in the target cluster in the fault domains. Determining whether the target cluster meets the scheduling condition based on the attribute parameters includes: determining that the target cluster meets the scheduling condition when, for every master node, the master node and at least one of its slave nodes are in different fault domains; and determining that the target cluster does not meet the scheduling condition when any master node and all of its slave nodes are in the same fault domain.

It should be noted that if a master node and all of its slave nodes in the target cluster are located in the same fault domain, then no matter how the nodes of the target cluster are switched, the master node and at least one of its slave nodes cannot be placed in different fault domains; that is, the scheduled target cluster cannot meet the fault domain high availability requirement. Therefore, in this embodiment, the scheduling condition is set according to the distribution information of each node in the target cluster in the fault domains, so that target clusters of this type are filtered out and only target clusters meeting the scheduling condition are processed, which avoids wasting resources and improves processing efficiency.
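The two scheduling-condition embodiments above can be combined into one pre-check. This is a minimal sketch under stated assumptions: the fault domain matrix is taken to have one row per master, with the master at column 0 (the preset bit) and `(node_id, fault_domain_id)` pairs as elements, and the number of available fault domains is passed in separately. The function name and data layout are hypothetical.

```python
from typing import List, Tuple

# Hypothetical layout: (node_id, fault_domain_id); column 0 of each row is the master.
Element = Tuple[str, str]


def meets_scheduling_condition(matrix: List[List[Element]], num_fault_domains: int) -> bool:
    """Return True if node switching can, in principle, achieve fault domain HA."""
    # Count check: with fewer fault domains than masters (one row per master),
    # at least two masters would always have to share a fault domain.
    if num_fault_domains < len(matrix):
        return False
    # Distribution check: switching only exchanges nodes inside a row, so the set of
    # fault domains a row occupies never changes. If a row occupies a single domain
    # (which also covers a master with no slaves), its master can never be in a
    # different fault domain from at least one of its slaves.
    for row in matrix:
        if len({domain for _, domain in row}) < 2:
            return False
    return True
```

Checking "at least two distinct fault domains per row" is equivalent to "the master differs from at least one slave", because any second domain in the row must belong to a slave.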

In one embodiment, if the target cluster does not meet the scheduling condition, the scheduling process for the target cluster ends.

As shown in Fig. 5, determining the fault domain matrix of the target cluster based on the distribution information of the nodes in the target cluster in the fault domains (step S11) includes step S11a.

Step S11a: when the target cluster meets the scheduling condition, determine the fault domain matrix of the target cluster based on the distribution information of each node in the target cluster in the fault domains.

For the process of determining the fault domain matrix of the target cluster based on the distribution information of each node in the target cluster in the fault domains, refer to the description of the foregoing embodiments; details are not repeated here.

In this embodiment, setting the scheduling condition allows target clusters that do not meet it to be filtered out so that only those that do are processed, which effectively avoids wasting resources and improves processing efficiency.

Fig. 6 is a flowchart of a node scheduling method based on a target cluster according to an embodiment of the present disclosure.

In one embodiment, before determining whether the target cluster meets the scheduling condition based on the attribute parameters of the target cluster (step S10a), the method further includes step S00.

Step S00: determine whether the target cluster meets the fault domain high availability condition.

The fault domain high availability condition means that the master nodes in the target cluster are all in different fault domains, and each master node is in a different fault domain from at least one of its slave nodes.

In one embodiment, determining whether the target cluster meets the fault domain high availability condition covers the following two cases.

Case one: if it is determined, based on the distribution information of each node in the target cluster in the fault domains, that any master node is in the same fault domain as at least one other master node, or that any master node is in the same fault domain as all of its slave nodes, the target cluster is determined not to meet the fault domain high availability condition.

In this embodiment, as shown in Fig. 6, determining whether the target cluster meets the scheduling condition based on the attribute parameters of the target cluster (step S10a) includes step S10b.

Step S10b: when the target cluster does not meet the fault domain high availability condition, determine whether the target cluster meets the scheduling condition based on the attribute parameters of the target cluster.

For a detailed process of determining whether the target cluster meets the scheduling condition based on the attribute parameter of the target cluster, reference is made to the description of the foregoing embodiment, and details are not repeated here.

Case two: if it is determined, based on the distribution information of each node in the target cluster in the fault domains, that every master node is in a different fault domain from all other master nodes, and that every master node is in a different fault domain from at least one of its slave nodes, the target cluster is determined to meet the fault domain high availability condition.

In this embodiment, when the target cluster meets the fault domain high availability condition, the target cluster is determined not to need scheduling, and the scheduling process for the target cluster ends.
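The two cases above amount to a pair of checks on the fault domain matrix. This is a minimal sketch, assuming a hypothetical layout of one row per master with the master at column 0 and `(node_id, fault_domain_id)` pairs as elements; the function name and sample data are illustrative only.

```python
from typing import List, Tuple

# Hypothetical layout: (node_id, fault_domain_id); column 0 of each row is the master.
Element = Tuple[str, str]


def is_fault_domain_ha(matrix: List[List[Element]]) -> bool:
    """Check the fault domain high availability condition on a fault domain matrix."""
    master_domains = [row[0][1] for row in matrix]
    # Case one (a): two masters share a fault domain.
    if len(set(master_domains)) != len(master_domains):
        return False
    for row in matrix:
        master_domain = row[0][1]
        # Case one (b): the master shares its fault domain with all of its slaves.
        # all() is True for an empty slave list, so a master without slaves also fails.
        if all(domain == master_domain for _, domain in row[1:]):
            return False
    # Case two: masters are pairwise separated, each with at least one slave elsewhere.
    return True


if __name__ == "__main__":
    before = [[("A", "z1"), ("A1", "z2")], [("B", "z1"), ("B1", "z3")], [("C", "z2"), ("C1", "z3")]]
    after = [[("A", "z1"), ("A1", "z2")], [("B1", "z3"), ("B", "z1")], [("C", "z2"), ("C1", "z3")]]
    print(is_fault_domain_ha(before), is_fault_domain_ha(after))
```

In the sample, `before` fails because masters A and B share `z1`, while `after` (B1 promoted) satisfies both requirements.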

It should be noted that, because the present disclosure is directed at target clusters that do not satisfy the fault domain high availability condition, checking this condition first in this embodiment filters out target clusters that already satisfy it, so that only target clusters that do not satisfy it are processed, which avoids wasting resources and improves processing efficiency.

It is understood that the above method embodiments of the present disclosure can be combined with one another to form combined embodiments without departing from their underlying principles and logic; owing to space limitations, such combinations are not described in detail. Those skilled in the art will appreciate that, in the above methods of the specific embodiments, the specific execution order of the steps should be determined by their functions and possible inherent logic.

In addition, the embodiments of the present disclosure further provide a node scheduling apparatus based on a target cluster, an electronic device, and a computer-readable storage medium, each of which can be used to implement any node scheduling method based on a target cluster provided by the present disclosure. For the corresponding technical solutions and descriptions, refer to the corresponding descriptions in the method section; details are not repeated here.

Fig. 7 is a schematic structural diagram of a node scheduling apparatus based on a target cluster according to an embodiment of the present disclosure. As shown in fig. 7, the node scheduling apparatus includes: a processing unit 71 and a scheduling unit 72.

The processing unit 71 is configured to determine a fault domain matrix of the target cluster based on distribution information of each node in the target cluster in the fault domains; the fault domain matrix comprises the correspondence between each node and its fault domain.

The processing unit 71 is further configured to determine a node switching scheme of the target cluster based on the fault domain matrix and the node scheduling policy.

The scheduling unit 72 is configured to switch nodes in the target cluster according to the node switching scheme, so that the switched target cluster meets a high availability condition of a fault domain.

In one embodiment, the nodes comprise a plurality of master nodes and at least one slave node corresponding to each master node; the node scheduling policy specifies that the master nodes are in different fault domains from one another, and that each master node and at least one of its slave nodes are in different fault domains.

In one embodiment, the fault domain matrix comprises at least one matrix row, each matrix row corresponding to one master node and the at least one slave node corresponding to that master node; the element at the preset bit of each matrix row contains the node identifier and fault domain identifier of that row's master node, and every other element of the row contains the node identifier and fault domain identifier of one of that master node's slave nodes.
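The matrix structure described above can be built directly from the cluster's topology. This is a minimal sketch under stated assumptions: the distribution information is supplied as two hypothetical dictionaries (master to slave list, and node to fault domain), the preset bit is taken to be column 0, and the function name and sample identifiers are illustrative only.

```python
from typing import Dict, List, Tuple

Element = Tuple[str, str]  # (node_id, fault_domain_id)


def build_fault_domain_matrix(
    replicas: Dict[str, List[str]],  # hypothetical input: master id -> its slave ids
    domain_of: Dict[str, str],       # hypothetical input: node id -> fault domain id
) -> List[List[Element]]:
    """Build one matrix row per master; column 0 (the assumed preset bit) holds the master."""
    matrix: List[List[Element]] = []
    for master_id, slave_ids in replicas.items():  # dicts preserve insertion order
        row = [(master_id, domain_of[master_id])]
        row.extend((slave_id, domain_of[slave_id]) for slave_id in slave_ids)
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    matrix = build_fault_domain_matrix(
        {"A": ["A1"], "B": ["B1", "B2"]},
        {"A": "z1", "A1": "z2", "B": "z1", "B1": "z3", "B2": "z2"},
    )
    for row in matrix:
        print(row)
```

Rows may have different lengths, since each master can have a different number of slave nodes.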

In one embodiment, when determining the node switching scheme of the target cluster based on the fault domain matrix and the node scheduling policy, the processing unit 71 performs the following steps:

obtaining at least one initial path corresponding to the fault domain matrix, where an initial path comprises one element from each matrix row of the fault domain matrix;

determining at least one candidate path from the at least one initial path, where the elements in each candidate path all have different fault domain identifiers;

taking, as the target path, the candidate path among the at least one candidate path that contains the largest number of elements at the preset bit;

and determining a node to be scheduled based on the target path and the node scheduling strategy, and generating a node switching scheme according to the node to be scheduled.
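The path steps above can be sketched as a brute-force search. This is an illustration under stated assumptions, not the disclosed algorithm: it enumerates every initial path with `itertools.product`, keeps candidates whose fault domain identifiers are all distinct, and picks the one with the most elements at the preset bit (assumed to be column 0), keeping the first path found on ties. The tie rule, function names, and sample data are hypothetical. Enumeration grows as the product of the row lengths, so a real implementation might need pruning or a matching-based search.

```python
from itertools import product
from typing import List, Optional, Tuple

Element = Tuple[str, str]  # (node_id, fault_domain_id)
PRESET = 0  # assumed preset bit: the column holding each row's master


def find_target_path(matrix: List[List[Element]]) -> Optional[List[int]]:
    """Return the chosen column index per row for the target path, or None."""
    best: Optional[List[int]] = None
    best_score = -1
    # Every initial path picks exactly one element (column index) from each row.
    for cols in product(*(range(len(row)) for row in matrix)):
        domains = [matrix[r][c][1] for r, c in enumerate(cols)]
        if len(set(domains)) != len(domains):
            continue  # not a candidate: two elements share a fault domain identifier
        score = sum(1 for c in cols if c == PRESET)  # masters kept in place
        if score > best_score:  # strict comparison keeps the first path on ties
            best, best_score = list(cols), score
    return best


def nodes_to_schedule(matrix: List[List[Element]]) -> Optional[List[str]]:
    """Slaves on the target path are the nodes to be scheduled (promoted)."""
    path = find_target_path(matrix)
    if path is None:
        return None  # no candidate path exists
    return [matrix[r][c][0] for r, c in enumerate(path) if c != PRESET]


if __name__ == "__main__":
    m = [[("A", "z1"), ("A1", "z2")], [("B", "z1"), ("B1", "z3")], [("C", "z2"), ("C1", "z3")]]
    print(find_target_path(m), nodes_to_schedule(m))
```

Under this reading, when the scheduling condition holds (every row spans at least two fault domains), a promoted node always keeps at least one slave in a different fault domain, so the second requirement of the node scheduling policy is met automatically.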

In one embodiment, the node switching scheme includes switching of the node to be scheduled; when switching the nodes in the target cluster according to the node switching scheme, the scheduling unit 72 switches the node to be scheduled with its corresponding master node.

In one embodiment, the processing unit 71 is further configured to determine whether the target cluster meets the scheduling condition based on the attribute parameter of the target cluster;

and, when the target cluster meets the scheduling condition, determine the fault domain matrix of the target cluster based on the distribution information of each node in the target cluster in the fault domains.

In one embodiment, the attribute parameters include the number of fault domains and the number of master nodes. When determining whether the target cluster meets the scheduling condition based on the attribute parameters of the target cluster, the processing unit 71 determines that the target cluster meets the scheduling condition when the number of fault domains is greater than or equal to the number of master nodes.

In one embodiment, the processing unit 71 is further configured to determine whether the target cluster meets a fault domain high availability condition; when determining whether the target cluster meets the scheduling condition based on the attribute parameter of the target cluster, the processing unit 71 performs the following steps:

determining whether the target cluster meets the scheduling condition based on the attribute parameters of the target cluster when the target cluster does not meet the fault domain high availability condition.

In one embodiment, the processing unit 71, when determining whether the target cluster meets the fault domain high availability condition, performs the following steps:

determining that the target cluster does not meet the fault domain high availability condition if, based on the distribution information of each node in the target cluster in the fault domains, any master node is in the same fault domain as at least one other master node, or any master node is in the same fault domain as all of its slave nodes;

determining that the target cluster meets the fault domain high availability condition if, based on the distribution information of each node in the target cluster in the fault domains, every master node is in a different fault domain from all other master nodes, and every master node is in a different fault domain from at least one of its slave nodes.

In the node scheduling apparatus based on a target cluster provided by the embodiments of the present disclosure, the processing unit 71 determines the fault domain matrix of the target cluster based on the distribution information of each node in the target cluster in the fault domains, and determines the node switching scheme of the target cluster based on the fault domain matrix and the node scheduling policy. This converts the fault domain problem into a matrix problem suited to computer processing, which speeds up determination of the node switching scheme and improves the efficiency of handling the fault domain problem. The scheduling unit 72 switches the nodes in the target cluster according to the node switching scheme so that the scheduled cluster meets the fault domain high availability condition. Fault domain high availability of the target cluster can thus be achieved with little impact on the continuity of its data processing, which effectively improves the continuity of services based on the target cluster.

Fig. 8 is a block diagram of an electronic device provided in an embodiment of the present disclosure. Referring to fig. 8, an embodiment of the present disclosure provides an electronic device including: at least one processor 801; at least one memory 802, and one or more I/O interfaces 803 coupled between the processor 801 and the memory 802; wherein the memory 802 stores one or more computer programs executable by the at least one processor 801, the one or more computer programs being executable by the at least one processor 801 to:

determine a fault domain matrix of the target cluster based on distribution information of each node in the target cluster in the fault domains, the fault domain matrix comprising the correspondence between each node and its fault domain; determine a node switching scheme of the target cluster based on the fault domain matrix and the node scheduling policy; and switch the nodes in the target cluster according to the node switching scheme so that the switched target cluster meets the fault domain high availability condition.

In some embodiments, the nodes comprise a plurality of master nodes and at least one slave node corresponding to each master node; the node scheduling policy is used to ensure that the master nodes are in different fault domains from one another, and that each master node and at least one of its slave nodes are in different fault domains.

In some embodiments, the fault domain matrix comprises at least one matrix row, each matrix row corresponding to one master node and its at least one slave node; the element at the preset bit of each matrix row contains the node identifier and fault domain identifier of that row's master node, and each other element of the row contains the node identifier and fault domain identifier of one of that master node's slave nodes.

In some embodiments, when determining the node switching scheme of the target cluster based on the fault domain matrix and the node scheduling policy, the processor 801 performs the following steps: acquiring at least one initial path corresponding to the fault domain matrix, where an initial path includes one element from each matrix row of the fault domain matrix; determining a candidate path from the at least one initial path, where the elements of the candidate path all have different fault domain identifiers; taking, as the target path, the candidate path containing the largest number of elements at the preset bit of their matrix rows; and determining at least one node to be scheduled based on the target path and the node scheduling policy, and generating the node switching scheme according to the at least one node to be scheduled.

In some embodiments, the node switching scheme comprises switching of at least one node to be scheduled. When switching the nodes in the target cluster according to the node switching scheme, the processor 801 performs the following step: switching each node to be scheduled with its corresponding master node.

In some embodiments, the processor 801 is further configured to determine whether the target cluster meets the scheduling condition based on the attribute parameters of the target cluster before determining the fault domain matrix of the target cluster based on the distribution information of the nodes in the target cluster in the fault domain.

When determining the fault domain matrix of the target cluster based on the distribution information of the nodes in the target cluster in the fault domains, the processor 801 performs the following step: determining the fault domain matrix of the target cluster based on the distribution information of each node in the target cluster in the fault domains when the target cluster meets the scheduling condition.

In some embodiments, the attribute parameters include the number of fault domains and the number of master nodes. When determining whether the target cluster meets the scheduling condition based on the attribute parameters of the target cluster, the processor 801 performs the following step: determining that the target cluster meets the scheduling condition when the number of fault domains is greater than or equal to the number of master nodes.

In some embodiments, the processor 801 is further configured to determine whether the target cluster meets the fault domain high availability condition before determining whether the target cluster meets the scheduling condition based on the attribute parameters of the target cluster.

When determining whether the target cluster meets the scheduling condition based on the attribute parameters of the target cluster, the processor 801 performs the following step: determining whether the target cluster meets the scheduling condition based on its attribute parameters when the target cluster does not meet the fault domain high availability condition.

In some embodiments, when determining whether the target cluster meets the fault domain high availability condition, the processor 801 performs the following steps: determining that the target cluster does not meet the fault domain high availability condition if, based on the distribution information of each node in the target cluster in the fault domains, any master node is in the same fault domain as at least one other master node, or any master node is in the same fault domain as all of its slave nodes; and determining that the target cluster meets the fault domain high availability condition if every master node is in a different fault domain from all other master nodes and every master node is in a different fault domain from at least one of its slave nodes.

In the embodiments of the present disclosure, the fault domain matrix of the target cluster is determined based on the distribution information of each node in the target cluster in the fault domains, and the node switching scheme of the target cluster is determined based on the fault domain matrix and the node scheduling policy. Converting the fault domain problem into a matrix problem suited to computer processing speeds up determination of the node switching scheme and improves the efficiency of handling the fault domain problem. The nodes in the target cluster are then switched according to the node switching scheme so that the switched cluster meets the fault domain high availability condition. Fault domain high availability of the target cluster can thus be achieved without major changes to the target cluster and with little impact on the continuity of the cluster's data processing, which effectively improves the continuity of cluster-based services.

The disclosed embodiments also provide a computer-readable storage medium having a computer program stored thereon, where the computer program, when executed by a processor/processing core, implements the above-mentioned node scheduling method. The computer readable storage medium may be a volatile or non-volatile computer readable storage medium.

The disclosed embodiments also provide a computer program product, which includes computer readable code or a non-volatile computer readable storage medium carrying computer readable code, and when the computer readable code is executed in a processor of an electronic device, the processor in the electronic device executes the above node scheduling method.

It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable storage media, which may include computer storage media (or non-transitory media) and communication media (or transitory media).

The term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable program instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, Random Access Memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), Static Random Access Memory (SRAM), flash memory or other memory technology, portable compact disc read-only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer. In addition, communication media typically embodies computer readable program instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as is well known to those skilled in the art.

The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.

The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), is personalized by using state information of the computer-readable program instructions, and the electronic circuit may execute the computer-readable program instructions to implement aspects of the present disclosure.

The computer program product described herein may be implemented in hardware, software, or a combination thereof. In one alternative embodiment, the computer program product is embodied as a computer storage medium; in another alternative embodiment, it is embodied as a software product, such as a Software Development Kit (SDK).

Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.

These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic and descriptive sense only and not for purposes of limitation. In some instances, features, characteristics and/or elements described in connection with a particular embodiment may be used alone or in combination with features, characteristics and/or elements described in connection with other embodiments, unless expressly stated otherwise, as would be apparent to one skilled in the art. Accordingly, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the disclosure as set forth in the appended claims.

Claims

1. A node scheduling method, comprising:

2. The node scheduling method of claim 1, wherein the nodes comprise a plurality of master nodes and at least one slave node corresponding to each master node;

the node scheduling strategy specifies that the plurality of master nodes are in different fault domains from one another, and that each master node and at least one slave node corresponding to that master node are in different fault domains.

3. The node scheduling method of claim 2, wherein the fault domain matrix comprises at least one matrix row, each matrix row corresponding to one master node and at least one slave node corresponding to the master node; the element at the preset bit of any matrix row contains the node identifier and the fault domain identifier of the master node of that row, and each other element of the row contains the node identifier and the fault domain identifier of a slave node corresponding to that master node.

4. The node scheduling method of claim 2, wherein the determining the node switching scheme of the target cluster based on the fault domain matrix and the node scheduling policy comprises:

acquiring at least one initial path corresponding to the fault domain matrix, wherein each initial path comprises one element from each matrix row of the fault domain matrix;

determining at least one candidate path from the at least one initial path, wherein the fault domain identifiers included in the elements of each candidate path are pairwise different;

taking, as a target path, the candidate path that has the largest number of elements at the preset bit among the at least one candidate path; and

determining a node to be scheduled based on the target path and the node scheduling policy, and generating the node switching scheme according to the node to be scheduled.

5. The node scheduling method according to claim 4, wherein the node switching scheme comprises switching of the node to be scheduled;

the switching the nodes in the target cluster according to the node switching scheme includes:

switching the node to be scheduled with the master node corresponding to the node to be scheduled.

6. The node scheduling method according to claim 2, wherein before determining the fault domain matrix of the target cluster based on the distribution information of the nodes in the target cluster in the fault domain, the method further comprises:

determining whether the target cluster meets a scheduling condition based on the attribute parameters of the target cluster;

the determining the fault domain matrix of the target cluster based on the distribution information of each node in the target cluster in the fault domain includes:

determining, when the target cluster meets the scheduling condition, the fault domain matrix of the target cluster based on the distribution information of each node in the target cluster in the fault domain.

7. The node scheduling method of claim 6, wherein the attribute parameters comprise a number of fault domains and a number of master nodes;

the determining whether the target cluster meets the scheduling condition based on the attribute parameters of the target cluster comprises:

determining that the target cluster meets the scheduling condition when the number of fault domains is greater than or equal to the number of master nodes.

8. The node scheduling method of claim 6, wherein before the determining whether the target cluster meets the scheduling condition based on the attribute parameters of the target cluster, the method further comprises:

determining whether the target cluster meets a fault domain high availability condition;

9. The node scheduling method of claim 8, wherein the determining whether the target cluster meets a fault domain high availability condition comprises:

10. A node scheduling apparatus, comprising:

11. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores one or more computer programs executable by the at least one processor to enable the at least one processor to perform the node scheduling method of any of claims 1-9.

12. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the node scheduling method of any one of claims 1-9.
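The matrix, policy check, scheduling condition and path search recited in claims 2 to 7 can be illustrated with a minimal Python sketch. It is not the applicant's implementation, and it rests on several assumptions. It assumes that the "preset bit" of each matrix row is index 0 (the master). It reads the policy of claim 2 as "masters in pairwise distinct fault domains, and no slave in its own master's fault domain". It counts fault domains as those that appear in the matrix. It finds the target path by exhaustive enumeration, keeping the first path found when two candidates tie. The names `Element`, `meets_ha_condition`, `meets_scheduling_condition`, `switching_scheme` and `apply_scheme` are hypothetical. `apply_scheme` only simulates the switch in memory and does not issue any Redis command.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Optional, Tuple

# Assumption: the "preset bit" of every matrix row is index 0 (the master).
PRESET_POS = 0


@dataclass(frozen=True)
class Element:
    node_id: str  # node identifier
    domain: str   # fault domain identifier


# One row per master: [master, slave_1, ..., slave_k]
FaultDomainMatrix = List[List[Element]]


def meets_ha_condition(matrix: FaultDomainMatrix) -> bool:
    """Assumed reading of claim 2: masters occupy pairwise distinct fault
    domains and no slave shares its own master's fault domain."""
    masters = [row[PRESET_POS].domain for row in matrix]
    if len(set(masters)) != len(masters):
        return False
    return all(
        e.domain != row[PRESET_POS].domain
        for row in matrix
        for i, e in enumerate(row)
        if i != PRESET_POS
    )


def meets_scheduling_condition(matrix: FaultDomainMatrix) -> bool:
    """Claim 7: #fault domains >= #master nodes (domains counted from the matrix)."""
    domains = {e.domain for row in matrix for e in row}
    return len(domains) >= len(matrix)


def switching_scheme(matrix: FaultDomainMatrix) -> Optional[List[Tuple[str, str]]]:
    """Claims 4-5 by exhaustive search; returns (slave_id, master_id) pairs
    to switch, or None when no candidate path exists."""
    best: Optional[Tuple[int, ...]] = None
    best_score = -1
    # Initial paths: one column index per row (Cartesian product of rows).
    for path in product(*(range(len(row)) for row in matrix)):
        domains = [matrix[r][c].domain for r, c in enumerate(path)]
        if len(set(domains)) != len(domains):
            continue  # not a candidate: two elements share a fault domain
        score = sum(1 for c in path if c == PRESET_POS)  # elements at the preset bit
        if score > best_score:  # strict '>' keeps the first path on ties
            best, best_score = path, score
    if best is None:
        return None
    # Each non-master element on the target path is a node to be scheduled.
    return [
        (matrix[r][c].node_id, matrix[r][PRESET_POS].node_id)
        for r, c in enumerate(best)
        if c != PRESET_POS
    ]


def apply_scheme(matrix: FaultDomainMatrix,
                 scheme: List[Tuple[str, str]]) -> FaultDomainMatrix:
    """Simulate the switch in memory: swap each scheduled slave with its master."""
    promote = {slave for slave, _ in scheme}
    result = []
    for row in matrix:
        new_row = list(row)
        for i, e in enumerate(row):
            if i != PRESET_POS and e.node_id in promote:
                new_row[PRESET_POS], new_row[i] = row[i], row[PRESET_POS]
        result.append(new_row)
    return result


if __name__ == "__main__":
    # Masters m1 and m2 share fault domain "A", so the cluster is not highly available.
    cluster = [
        [Element("m1", "A"), Element("s1", "B")],
        [Element("m2", "A"), Element("s2", "C")],
        [Element("m3", "B"), Element("s3", "C")],
    ]
    print(meets_ha_condition(cluster))          # False
    print(meets_scheduling_condition(cluster))  # True
    scheme = switching_scheme(cluster)
    print(scheme)                               # [('s2', 'm2')]
    print(meets_ha_condition(apply_scheme(cluster, scheme)))  # True
```

In the example, the sketch enumerates 2^3 = 8 initial paths and finds two candidates: (s1, m2, s3) and (m1, s2, m3). It picks (m1, s2, m3) because it keeps two elements at the preset bit, so only s2 is switched with m2. Exhaustive enumeration grows with the product of the row lengths, so a larger cluster would need a pruned search, which the claims do not specify. In a real Redis Cluster, the switch might map to promoting the replica (for example, with `CLUSTER FAILOVER` issued on it); that mapping is an assumption and is not stated in the claims.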