CN107783851B

CN107783851B - Markov modeling method for steady-state availability of server cluster

Info

Publication number: CN107783851B
Application number: CN201710867338.0A
Authority: CN
Inventors: 郭霖瀚; 冯晓; 孔丹丹; 杨懿
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2017-09-22
Filing date: 2017-09-22
Publication date: 2020-06-02
Anticipated expiration: 2037-09-22
Also published as: CN107783851A

Abstract

The invention provides a Markov modeling method for the steady-state availability of a server cluster, comprising the following steps: 1. determining the relevant parameters of the server cluster; 2. selecting parameters that describe the state of a component, and defining the states of the component; 3. determining the transition rates among the states of the component, and constructing a continuous-time Markov chain (CTMC) for the component; 4. calculating the transition rate matrix of the component; 5. calculating the steady-state probability of each state of the component; 6. calculating the expected spare part shortage number of the component; 7. executing steps 2 to 6 for each type of component, and constructing a CTMC family model of the server cluster; 8. calculating the steady-state availability of the server cluster. Through these steps, a CTMC model is established for each component type, a CTMC family model is established for the server cluster, and the availability of the system is calculated. The modeling method reduces the number of states in the Markov process, lowers the difficulty of calculating the steady-state availability of the server cluster, and provides reference information for the design and improvement of server clusters.

Description

Markov modeling method for steady-state availability of server cluster

Technical Field

The invention provides a Markov modeling method for steady-state availability of a server cluster, and belongs to the technical field of maintenance support.

Background

With the continuous development of internet technology, users increasingly expect fast and accurate service, which requires information providers to deliver service around the clock, all year round. Statistical reports show that the economic loss of U.S. enterprises due to computer system failures is nearly $40 billion annually, and the resulting loss of reputation is harder still to measure. To avoid unplanned service interruptions and provide a better service experience, service providers often build server clusters (multi-server hot standby) in important data processing centers to achieve high reliability and uninterrupted service.

The availability of a server cluster is affected mainly by software and hardware. Data show that hardware faults account for more than 20% of all server cluster faults, so the performance and maintenance of the hardware have a significant impact on the availability of the cluster. From the hardware perspective, the methods currently used in China and abroad to determine server cluster availability mainly include reliability block diagrams, fault trees, Petri nets, and Markov processes. However, these methods often do not consider the influence of spare parts on the availability of the cluster system when a failed server is repaired. Spare part supply is a key element of maintenance support activities, and it affects the service condition of a server cluster mainly through the repair turnaround time of failed parts. It is therefore necessary to take spare part supply into account when calculating the steady-state availability of a server cluster. Accurate calculation of the steady-state availability has clear theoretical significance and practical value for the design, operation, and management of server clusters.

To solve this problem, the invention considers the influence of spare parts on the number of available servers in the cluster system, determines the state space of the cluster system with three typical parameters, and combines the continuous-time Markov chains (CTMCs) of all key components into a CTMC family model that represents the state transitions of the server cluster system, from which the steady-state availability of the server cluster system is calculated.

Disclosure of Invention

(1) Objects of the invention

To provide an efficient and reliable level of information service, the reliability and availability of server clusters must be increased. Service providers usually increase the number of redundant servers on the one hand and stock corresponding spare parts on the other, so server clusters tend to grow ever larger. Calculating the availability of such a cluster accurately is a difficult problem. In view of this, the invention provides a Markov modeling method for the steady-state availability of a server cluster, that is, a method for calculating the steady-state availability of the server cluster with a Markov process. The method is widely applicable and is suited to calculating the steady-state availability of server clusters that work continuously for long periods.

(2) Technical scheme

The server cluster addressed by the invention consists of a number of identical servers operating in multi-machine hot standby mode. When the cluster executes tasks, the servers work independently. The following assumptions are made: the key components of a server form a series structure, so the server stops running as soon as any one key component fails; the failure time and repair time of each key component are independent and exponentially distributed; the time needed to replace a failed part is neglected; the spare part inventory follows a single-base model; and repair capacity at the repair level is unlimited and repairs always succeed, with all repair resources other than spare parts in adequate supply.

When the time to failure and the time to repair of the servers are both exponentially distributed, the system can be described by a Markov process, provided the states are properly defined. According to the classical inventory balance formula, the spare part shortage number can be calculated from the spare part inventory. In the invention, a server is in a failed state whenever any one of its key components is short of a spare part. For a server cluster, the granularity of the Markov model can therefore be refined to the key component level: changes in the number of available servers are reflected indirectly by changes in the states of the key components, and the sum of the expected shortage numbers of all spare part types is then used to calculate the steady-state availability of the server cluster.

Based on the above conditions, the method for establishing the steady-state availability of the server cluster by using the Markov model provided by the invention comprises the following steps:

step 1, determining relevant parameters of a server cluster according to the architectural characteristics of the server cluster;

step 2, selecting parameters for describing the state of the component, and defining various states of the component;

step 3, determining the transition rates among the states of the component, and constructing a continuous-time Markov chain (CTMC) of the component;

step 4, calculating the transition rate matrix of the component;

step 5, calculating the steady-state probability of each state of the component by solving a specific non-homogeneous linear equation set;

step 6, calculating the expected spare part shortage number of the component;

step 7, respectively executing the step 2 to the step 6 on each type of component, combining CTMCs of each type of component, constructing a CTMC family model of the server cluster, and calculating the expected spare part shortage number of the server cluster;

step 8, calculating the steady-state availability of the server cluster.

Determining the relevant parameters of the server cluster in step 1 means determining the number N of servers in the cluster, the number L of key component types in a server, the failure rate and repair rate of each component, and the number Z_i of units of component i installed in a single server; a typical server cluster architecture is shown in FIG. 1.

Wherein, the "component" referred to in steps 2 to 7 is a specific key component of type i, where i may be any of 1, 2, …, L, i.e. i ∈ {1, 2, …, L}.

Wherein, "selecting parameters describing the state of the component" in step 2 means using three parameters: the number of available servers O_i affected by component i, the inventory G_i of component i, and the spare part shortage number BO_i, thereby determining the state space of the component; at the initial moment, the inventory quantity of component i is S_i.

The "defining various states of the component" in step 2 means that the specific defining method is as follows for various states of the component i:

state 0: (N, S_i, 0), representing N available servers, a stock of S_i, and 0 spare part shortages;

state 1: (N, S_i-1, 0), representing N available servers, a stock of S_i-1, and 0 spare part shortages;

state S_i: (N, 0, 0), representing N available servers, a stock of 0, and 0 spare part shortages;

state S_i+1: (N-1, 0, 1), representing N-1 available servers, a stock of 0, and 1 spare part shortage;

state S_i+k: (N-k, 0, k), representing N-k available servers, a stock of 0, and k spare part shortages;

state S_i+N: (0, 0, N), representing 0 available servers, a stock of 0, and N spare part shortages;

so that component i has E_i = S_i+N+1 states, numbered 0, 1, 2, …, S_i+N.
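The state numbering above can be made concrete with a short sketch. The function name `component_states` and its argument names are illustrative choices, not part of the patent; the example values N = 10 and S_i = 1 match the worked example given later in the description.

```python
def component_states(n_servers, initial_stock):
    """Return the states (O_i, G_i, BO_i) of one component type, in index order."""
    # States 0..S_i: all N servers are up while the stock falls from S_i to 0.
    stocked = [(n_servers, initial_stock - q, 0) for q in range(initial_stock + 1)]
    # States S_i+1..S_i+N: the stock is empty and k servers wait for a spare.
    short = [(n_servers - k, 0, k) for k in range(1, n_servers + 1)]
    return stocked + short


states = component_states(n_servers=10, initial_stock=1)
for index, state in enumerate(states):
    print(index, state)
```

The list length equals E_i = S_i + N + 1, so the index of each tuple is the state number used in the text.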

Wherein, determining the transition rates among the states of the component in step 3 means determining the Markov process transition rates of component i according to the relationships among its states; λ_i and μ_i denote the failure rate and the repair rate of component i, respectively;

when the spare parts of component i are sufficient: if one component i fails, the failed unit is replaced with a spare, the number of spares decreases by 1, and the number of available servers remains N; component i transitions from state (N, S_i-q, 0) to the adjacent state (N, S_i-q-1, 0) at rate N·Z_i·λ_i; when one component i is repaired and returned to the spare part store, the number of spares of component i increases by 1, and component i transitions from state (N, S_i-q-1, 0) to the adjacent state (N, S_i-q, 0) at rate (q+1)·μ_i; here 0 ≤ q ≤ S_i-1;

when component i is short of spare parts: if one component i fails, one server suspends operation, i.e. the number of available servers O_i decreases by 1 and the spare part shortage number BO_i increases by 1; component i transitions from state (N-k, 0, k) to state (N-k-1, 0, k+1) at rate (N-k)·Z_i·λ_i; when a failed component i is repaired, one server returns to normal operation, i.e. the number of available servers O_i increases by 1 and the spare part shortage number BO_i decreases by 1; component i transitions from state (N-k-1, 0, k+1) to state (N-k, 0, k) at rate (S_i+k+1)·μ_i; here 1 ≤ k ≤ N-1;

the state (N, 0, 0) and the state (N-1, 0, 1) are the two states that connect the no-shortage and shortage regimes of spare part i; analysis shows that when component i is in state (N, 0, 0) and a failure occurs, the number of available servers O_i decreases by 1 and the spare part shortage number BO_i changes from 0 to 1, so the transition rate from state (N, 0, 0) to state (N-1, 0, 1) is N·Z_i·λ_i; when component i is in state (N-1, 0, 1) and a failed component i is repaired, the number of available servers O_i increases by 1 and the spare part shortage number BO_i changes from 1 to 0, so the transition rate from state (N-1, 0, 1) to state (N, 0, 0) is (S_i+1)·μ_i;
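The three cases above share one pattern when the states are indexed 0, 1, …, S_i+N: the failure rate out of state j depends on how many servers are still running, and the repair rate out of state j+1 is (j+1)·μ_i. The following sketch collects the rates in a dictionary; the function name, argument names, and the numeric parameters are hypothetical and chosen only for illustration.

```python
def transition_rates(n_servers, initial_stock, installed_qty, fail_rate, repair_rate):
    """Return {(from_index, to_index): rate} for one component type."""
    rates = {}
    last_state = initial_stock + n_servers
    for j in range(last_state):
        # All N servers run while j <= S_i; beyond that, k = j - S_i are down.
        servers_up = n_servers - max(0, j - initial_stock)
        rates[(j, j + 1)] = servers_up * installed_qty * fail_rate  # one more failure
        rates[(j + 1, j)] = (j + 1) * repair_rate                   # one repair completes
    return rates


# Hypothetical parameters: N = 10, S_i = 1, Z_i = 1, lambda_i = 0.001, mu_i = 0.5.
rates = transition_rates(10, 1, 1, fail_rate=0.001, repair_rate=0.5)
for (src, dst), rate in sorted(rates.items()):
    print(src, "->", dst, rate)
```

With S_i = 1, the pair (1, 2) is the boundary transition between (N, 0, 0) and (N-1, 0, 1) described in the last paragraph above.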

The initial state of component i is (N, S_i, 0), and its transition rate matrix is denoted A_i. A_i is an E_i × E_i matrix. For m ≠ n, the entry in row m and column n is the rate of transition from state m to state n; for m = n, the entry is the negative of the sum of the other entries in row m; m and n are simply the state numbers of the component, with 0 ≤ m, n ≤ S_i+N. This forms the continuous-time Markov chain {X_i(t) = (O_i(t), G_i(t), BO_i(t)); t ≥ 0} of component i, which describes the state transition process of component i; the corresponding state transition diagram is shown in FIG. 2, and the transition rates among the states of component i are listed in Table 1;

TABLE 1 State transition Rate of component i

Wherein, "calculating the transition rate matrix of the component" in step 4 means determining, from the continuous-time Markov chain {X_i(t) = (O_i(t), G_i(t), BO_i(t)); t ≥ 0} established in step 3, the transition rate matrix of component i as follows:
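As a sketch of how such a matrix could be assembled numerically, the code below fills the off-diagonal entries from the transition rates described in step 3 and then sets each diagonal entry to the negative of the rest of its row. NumPy, the function name `rate_matrix`, and the numeric parameters are assumptions made for illustration; the patent itself does not prescribe an implementation.

```python
import numpy as np


def rate_matrix(n_servers, initial_stock, installed_qty, fail_rate, repair_rate):
    """Build the E_i x E_i transition rate matrix of one component type."""
    size = initial_stock + n_servers + 1            # E_i = S_i + N + 1
    a = np.zeros((size, size))
    for j in range(size - 1):
        servers_up = n_servers - max(0, j - initial_stock)
        a[j, j + 1] = servers_up * installed_qty * fail_rate   # failure: j -> j+1
        a[j + 1, j] = (j + 1) * repair_rate                    # repair: j+1 -> j
    # Diagonal entry = negative of the sum of the other entries in the row.
    a -= np.diag(a.sum(axis=1))
    return a


# Hypothetical parameters: N = 10, S_i = 1, Z_i = 1, lambda_i = 0.001, mu_i = 0.5.
A = rate_matrix(10, 1, 1, fail_rate=0.001, repair_rate=0.5)
print(A.shape)
print(A[:3, :3])
```

The matrix is tridiagonal because each state only moves to its immediate neighbours, which is what keeps the per-component state count at S_i + N + 1.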

"calculating the steady-state probability of each state of the component" in step 5 proceeds as follows: the steady-state probability vector of the states of component i is denoted π_i,

$$\pi_i = \left[\pi_{i,0},\ \pi_{i,1},\ \ldots,\ \pi_{i,S_i+N}\right]$$

The steady-state probabilities π_i can be solved from the following non-homogeneous system of linear equations;

$$\begin{cases} \pi_i A_i = 0 \\ \sum_{j=0}^{S_i+N} \pi_{i,j} = 1 \end{cases}$$

Here, A_i is the transition rate matrix of component i; 0 denotes the 1 × E_i zero matrix; π_{i,j} is the steady-state probability that component i is in state j, with 0 ≤ j ≤ S_i+N;
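One common way to solve a system of this kind numerically is to transpose the balance equations and replace one of them with the normalization condition, since the balance equations alone are linearly dependent. The sketch below assumes NumPy and uses a hypothetical two-state chain (failure rate 1, repair rate 3) purely as a check; `steady_state` is an illustrative name.

```python
import numpy as np


def steady_state(a):
    """Solve pi @ a = 0 with sum(pi) = 1 for a transition rate matrix a."""
    size = a.shape[0]
    lhs = a.T.copy()          # pi A = 0  is equivalent to  A^T pi^T = 0
    lhs[-1, :] = 1.0          # replace one balance equation by sum(pi) = 1
    rhs = np.zeros(size)
    rhs[-1] = 1.0
    return np.linalg.solve(lhs, rhs)


# Hypothetical two-state chain: up -> down at rate 1, down -> up at rate 3.
a = np.array([[-1.0, 1.0],
              [3.0, -3.0]])
pi = steady_state(a)
print(pi)
```

For this small chain the result can be checked by hand: the up state has probability 3 / (1 + 3) and the down state 1 / (1 + 3).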

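After solving the non-homogeneous system of linear equations, the steady-state probability of state p (1 ≤ p ≤ S_i+1) is obtained as:

$$\pi_{i,p} = \frac{1}{p!}\left(\frac{N Z_i \lambda_i}{\mu_i}\right)^{p} \pi_{i,0}$$

the steady-state probability of state S_i+h (2 ≤ h ≤ N) is:

$$\pi_{i,S_i+h} = \frac{1}{(S_i+1)!}\left(\frac{N Z_i \lambda_i}{\mu_i}\right)^{S_i+1} \prod_{k=1}^{h-1} \frac{(N-k)\, Z_i \lambda_i}{(S_i+k+1)\, \mu_i}\; \pi_{i,0}$$

since

$$\sum_{j=0}^{S_i+N} \pi_{i,j} = 1$$

the steady-state probability that component i is in state 0 is calculated as:

$$\pi_{i,0} = \left[\,1 + \sum_{p=1}^{S_i+1} \frac{1}{p!}\left(\frac{N Z_i \lambda_i}{\mu_i}\right)^{p} + \sum_{w=2}^{N} \frac{1}{(S_i+1)!}\left(\frac{N Z_i \lambda_i}{\mu_i}\right)^{S_i+1} \prod_{k=1}^{w-1} \frac{(N-k)\, Z_i \lambda_i}{(S_i+k+1)\, \mu_i}\right]^{-1}$$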
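Because every state of component i connects only to its neighbours, the chain is a birth-death process, and its steady-state probabilities can also be obtained without a matrix solve: each probability is the previous one multiplied by the ratio of the failure rate into the state to the repair rate out of it, followed by normalization. The sketch below applies this standard recursion to the rates of step 3; the function name and the numeric parameters are hypothetical.

```python
def steady_state_birth_death(n_servers, stock, installed_qty, fail_rate, repair_rate):
    """Steady-state probabilities of one component's chain via the birth-death recursion."""
    size = stock + n_servers + 1
    weights = [1.0]                                    # unnormalised, relative to state 0
    for j in range(size - 1):
        servers_up = n_servers - max(0, j - stock)
        failure = servers_up * installed_qty * fail_rate   # rate j -> j+1
        repair = (j + 1) * repair_rate                      # rate j+1 -> j
        weights.append(weights[-1] * failure / repair)
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical parameters: N = 10, S_i = 1, Z_i = 1, lambda_i = 1e-4, mu_i = 0.5.
pi = steady_state_birth_death(10, 1, 1, fail_rate=1e-4, repair_rate=0.5)
print(pi[:4])
```

For states up to S_i+1 the ratio to state 0 reduces to (N·Z_i·λ_i/μ_i)^p / p!, which gives a quick sanity check on the recursion.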

wherein, "calculating the expected spare part shortage number of the component" in step 6 proceeds as follows: once the steady-state probabilities of the states of component i have been solved, the expected spare part shortage number of component i is calculated by the following formula:

$$EBO(S_i) = \pi_i b_i = \sum_{m=0}^{S_i+N} \pi_{i,m}\, b_{i,m}$$

here, EBO(S_i) is the expected spare part shortage number of component i in steady state; b_i is the spare part shortage number vector of component i; b_{i,m} is an element of b_i, namely the spare part shortage number when component i is in state m. The specific form of b_i is as follows:

$$b_i = [\,\underbrace{0,\ \ldots,\ 0}_{S_i+1},\ 1,\ 2,\ \ldots,\ N\,]^{T}$$
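The expected shortage is a dot product of the steady-state probabilities with the shortage count of each state. A minimal sketch, assuming NumPy and using the steady-state vector reported for component 1 in the detailed description (N = 10, S₁ = 1); `expected_shortage` is an illustrative name.

```python
import numpy as np


def expected_shortage(pi, n_servers, initial_stock):
    """EBO(S_i): steady-state probabilities weighted by the shortage count per state."""
    # b_i has S_i + 1 zeros (no shortage while stock lasts), then 1, 2, ..., N.
    b = np.concatenate([np.zeros(initial_stock + 1), np.arange(1, n_servers + 1)])
    return float(np.dot(pi, b))


# Steady-state vector of component 1 as printed in the detailed description.
pi_1 = np.array([0.99844565, 0.00155314, 0.00000121] + [0.0] * 9)
ebo_1 = expected_shortage(pi_1, n_servers=10, initial_stock=1)
print(ebo_1)
```

Only state 2 carries a non-zero probability among the shortage states at the printed precision, so the result equals that single probability.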

wherein, executing steps 2 to 6 for each type of component in step 7 and combining the CTMCs of all component types to construct the CTMC family model of the server cluster means building the continuous-time Markov chains (CTMCs) of the components one by one, {X_i(t) = (O_i(t), G_i(t), BO_i(t)); t ≥ 0}, i = 1, 2, …, L; because the failures and repairs of the various components are independent of one another, a combined server cluster CTMC family model {X(t) = (X₁(t), …, X_i(t), …, X_L(t)); t ≥ 0} can then be established.

Wherein, "calculating the expected spare part shortage number of the server cluster" in step 7 proceeds as follows: in steady state, the expected spare part shortage number EBO_q of the server cluster is equivalent to the sum of the expected numbers of servers made unavailable by shortages of the various spare parts; thus EBO_q is calculated as follows:

$$EBO_q = \sum_{i=1}^{L} EBO(S_i)$$

in step 8, "calculating the steady-state availability of the server cluster" proceeds as follows: in steady state, subtracting EBO_q from the total number of servers N gives the expected number of available servers; the steady-state availability A_q of the server cluster is equivalent to the expected number of available servers as a percentage of the total number of servers N; hence A_q is calculated as follows:

$$A_q = \frac{N - EBO_q}{N} \times 100\%$$

through the steps, under the condition of considering spare parts, firstly establishing CTMC models of various components, and then establishing CTMC family models of a server cluster by combining the CTMCs of various components according to the mutually independent relationship of the faults and maintenance among the components; and then calculating the expected spare part shortage number of the server cluster according to the CTMC family model, and calculating the steady-state availability of the server cluster by using the relation between the expected spare part shortage number of the server cluster and the steady-state availability. The modeling method can effectively reduce the number of states in the Markov process, thereby reducing the difficulty of calculating the steady-state availability of the server cluster and providing valuable reference information for the design and improvement of the server cluster.
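The whole procedure can be sketched end to end in a few lines: compute the expected shortage of each component type from its own chain, sum the shortages, and convert the sum into an availability. The code below is a sketch under stated assumptions, not the inventors' implementation; it uses NumPy, the birth-death recursion for the per-component steady state, and entirely hypothetical component parameters.

```python
import numpy as np


def component_ebo(n_servers, stock, installed_qty, fail_rate, repair_rate):
    """Expected spare part shortage number of one component type (steps 2 to 6)."""
    size = stock + n_servers + 1
    weights = np.ones(size)
    for j in range(size - 1):
        servers_up = n_servers - max(0, j - stock)
        up_rate = servers_up * installed_qty * fail_rate    # failure: j -> j+1
        down_rate = (j + 1) * repair_rate                   # repair: j+1 -> j
        weights[j + 1] = weights[j] * up_rate / down_rate
    pi = weights / weights.sum()
    shortages = np.maximum(np.arange(size) - stock, 0)      # BO_i in each state
    return float(pi @ shortages)


def cluster_availability(n_servers, components):
    """Steps 7 and 8: sum the per-component shortages, then A_q = (N - EBO_q) / N."""
    ebo_q = sum(component_ebo(n_servers, **c) for c in components)
    return (n_servers - ebo_q) / n_servers


# Hypothetical component data, for illustration only.
components = [
    dict(stock=1, installed_qty=1, fail_rate=1e-4, repair_rate=0.5),
    dict(stock=2, installed_qty=2, fail_rate=5e-5, repair_rate=0.25),
]
availability = cluster_availability(10, components)
print(availability)
```

Each component contributes a chain of only S_i + N + 1 states, so the work grows with the sum of the per-component state counts rather than their product, which is the state reduction the method relies on.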

(3) Advantages and effects

The method provides a Markov modeling method for the steady-state availability of the server cluster by considering the influence of spare parts, and has the advantages that:

① By considering the influence of the stock and shortage of different spare parts on the availability of the server cluster, the invention can guide spare part planning for servers and server clusters.

② Based on the structural characteristics of the server cluster, the invention constructs component CTMCs that satisfy the inventory balance relation, and establishes the CTMC family model of the server cluster according to the fault logic relationship among the components.

③ The invention provides a CTMC-family-based method for modeling the steady-state availability of a server cluster, and offers a new technical approach for calculating availability indices of server clusters with the CTMC method.

Drawings

FIG. 1 is a typical server cluster architecture.

FIG. 2 is a Markov state transition diagram for component i.

FIG. 3 is a flow chart of the modeling method of the present invention.

FIG. 4 is a Markov state transition diagram for component 1.

The symbols in the figures are as follows:

N refers to the number of servers contained in the cluster system;

L refers to the number of component types;

Z_i refers to the number of units of component i installed in a single server;

λ_i refers to the failure rate of component i;

μ_i refers to the repair rate of component i;

the entry in row m and column n of A_i (m ≠ n) represents the transition rate from state m to state n;

π_i refers to the steady-state probability vector of the states of component i;

π_{i,j} refers to the steady-state probability that component i is in state j;

b_i refers to the spare part shortage number vector of component i;

b_{i,m} refers to the spare part shortage number when component i is in state m;

E_i refers to the number of states of component i;

A_i refers to the state transition rate matrix of component i;

EBO(S_i) refers to the expected spare part shortage number of component i;

EBO_q refers to the expected spare part shortage number of the server cluster;

A_q refers to the steady-state availability of the server cluster;

CTMC refers to a continuous-time Markov chain;

X_i(t) refers to the continuous-time Markov chain of component i;

X(t) refers to the CTMC family of the server cluster;

O_i refers to the number of available servers affected by component i;

G_i refers to the inventory of component i;

BO_i refers to the spare part shortage number of component i;

S_i refers to the initial inventory of component i;

O_i(t) refers to the number of available servers in the continuous-time Markov chain of component i;

G_i(t) refers to the number of spare parts available in the continuous-time Markov chain of component i;

BO_i(t) refers to the spare part shortage number in the continuous-time Markov chain of component i;

Detailed Description

Embodiments of the present invention are described in more detail below with reference to an example. In general, a complex server can be converted into a system with a series structure by means such as equivalence and combination. The server cluster in the following example contains 10 identical servers in multi-server hot standby mode. Each server consists of 8 types of key components in series. The failure times and repair times of all key components are assumed to be exponentially distributed.

The invention discloses a Markov modeling method for steady-state availability of a server cluster, a specific implementation flow is shown in figure 3, and the actual implementation steps are as follows:

Step 1 collects the relevant information of the server cluster. From the architectural characteristics of the cluster, N = 10 and L = 8; the specific component information is shown in Table 2 below.

TABLE 2 part-related parameters

Step 2 selects three parameters for component 1: the number of available servers O₁, the initial inventory S₁, and the spare part shortage number BO₁, to describe the state transitions of component 1. In the initial state, O₁ = 10, S₁ = 1, and BO₁ = 0. Component 1 then has 12 states, namely: ① state 0: (10,1,0); ② state 1: (10,0,0); ③ state 2: (9,0,1); ④ state 3: (8,0,2); ⑤ state 4: (7,0,3); ⑥ state 5: (6,0,4); ⑦ state 6: (5,0,5); ⑧ state 7: (4,0,6); ⑨ state 8: (3,0,7); ⑩ state 9: (2,0,8); ⑪ state 10: (1,0,9); ⑫ state 11: (0,0,10).

Step 3 determines the transfer rate of component 1 and constructs a continuous-time markov chain for component 1. In the present example, the number of initial devices of the part 1 is 1, and the transition rate between the states of the part 1 is calculated as shown in the following table:

TABLE 2 State transition Rate of component 1

The transition rate matrix is denoted A₁ and is a 12 × 12 square matrix. For m ≠ n, the entry in row m and column n is the rate of transition from state m to state n; for m = n, the entry is the negative of the sum of the other entries in row m, where 0 ≤ m, n ≤ 11. This forms the continuous-time Markov chain {X₁(t) = (O₁(t), G₁(t), BO₁(t)); t ≥ 0} of component 1, which describes the state transition process of component 1; the corresponding state transition diagram is shown in FIG. 4.

Step 4 calculates the transfer rate matrix of the component 1. On the basis of step 3, the state transition rate matrix of component 1 can be obtained by combining the data in table 1 as follows:

Step 5 calculates the steady-state probability of each state of component 1. Using the state transition rate matrix obtained in step 4, the following system is solved:

$$\begin{cases} \pi_1 A_1 = 0 \\ \sum_{j=0}^{11} \pi_{1,j} = 1 \end{cases}$$

the steady-state probabilities of the states of component 1 are calculated as:

π₁＝[0.99844565,0.00155314,0.00000121,0,0,0,0,0,0,0,0,0]

Step 6 calculates the expected spare part shortage number of component 1. Substituting π₁ into the following equation gives the expected spare part shortage number:

$$EBO(S_1) = \pi_1 b_1$$

where b₁ = [0,0,1,2,3,4,5,6,7,8,9,10]^T.

The expected spare part shortage number of component 1 is calculated as: EBO(S₁) = 0.00000121.
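The figures for component 1 can be cross-checked numerically. The parameter values of Table 2 are not reproduced in this text, so the sketch below makes two labeled assumptions: Z₁ = 1, and the ratio N·Z₁·λ₁/μ₁ is back-calculated from the first two published entries of π₁ rather than taken from the table. With those assumptions it builds the 12 × 12 transition rate matrix, solves the steady-state system with NumPy, and evaluates the expected shortage.

```python
import numpy as np

N, S1 = 10, 1                      # from the worked example
Z1 = 1                             # assumption: Table 2 is not available in this text
# Assumption: N*Z1*lambda1/mu1 inferred from the published pi_1[1] / pi_1[0].
rho = 0.00155314 / 0.99844565
mu1 = 1.0                          # arbitrary time unit; only the ratio matters
lam1 = rho * mu1 / (N * Z1)

size = S1 + N + 1
A = np.zeros((size, size))
for j in range(size - 1):
    servers_up = N - max(0, j - S1)
    A[j, j + 1] = servers_up * Z1 * lam1      # failure: j -> j+1
    A[j + 1, j] = (j + 1) * mu1               # repair: j+1 -> j
A -= np.diag(A.sum(axis=1))

# Solve pi A = 0 with sum(pi) = 1 by replacing one balance equation.
lhs = A.T.copy()
lhs[-1, :] = 1.0
rhs = np.zeros(size)
rhs[-1] = 1.0
pi = np.linalg.solve(lhs, rhs)

b = np.maximum(np.arange(size) - S1, 0)       # [0, 0, 1, 2, ..., 10]
ebo = float(pi @ b)
print(np.round(pi[:3], 8), round(ebo, 8))
```

Since the ratio is fitted to two of the published numbers, agreement on those two entries is by construction; the informative part of the check is that the third entry and the expected shortage also land near the published values.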

Step 7 repeats steps 2 to 6 for the remaining components 2 to 8 and establishes the CTMC family model of the server cluster as follows: {X(t) = (X₁(t), …, X_i(t), …, X₈(t)); t ≥ 0}. The expected spare part shortage numbers of the remaining components are then calculated as follows:

EBO(S₂)＝0.00000167

EBO(S₃)＝0.00000425

EBO(S₄)＝0.00000555

EBO(S₅)＝0.00000354

EBO(S₆)＝0.00001625

EBO(S₇)＝0.00003183

EBO(S₈)＝0.00000019

Step 8 calculates the steady-state availability of the server cluster. Summing the expected spare part shortage numbers of the various components gives the expected spare part shortage number EBO_q of the server cluster:

$$EBO_q = \sum_{i=1}^{8} EBO(S_i) = 0.00006449$$

According to the functional relationship between the expected spare part shortage number of the server cluster and the steady-state availability, the steady-state availability A_q of the server cluster can be calculated from the following formula:

$$A_q = \frac{N - EBO_q}{N} = \frac{10 - 0.00006449}{10} \approx 0.99999355$$
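The final two quantities follow directly from the eight per-component values listed in steps 6 and 7. The short sketch below sums them and applies the relation "expected available servers divided by N"; the variable names are illustrative.

```python
# Expected spare part shortage numbers EBO(S_1)..EBO(S_8) as listed in the example.
ebo_values = [
    0.00000121, 0.00000167, 0.00000425, 0.00000555,
    0.00000354, 0.00001625, 0.00003183, 0.00000019,
]
n_servers = 10

ebo_q = sum(ebo_values)                        # cluster-level expected shortage
availability = (n_servers - ebo_q) / n_servers # expected available servers / N
print(f"EBO_q = {ebo_q:.8f}")
print(f"A_q   = {availability:.9f}")
```

Components 6 and 7 account for roughly three quarters of the total shortage, which suggests where additional spares would have the most effect in this example.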

Claims

1. A Markov modeling method for steady-state availability of a server cluster, characterized in that the method comprises the following steps:

step 4, calculating a transfer rate matrix of the component;

step 6, calculating the expected spare part shortage number of the component;

step 8, calculating the steady-state availability of the server cluster;

through the steps, under the condition of considering spare parts, firstly establishing CTMC models of various components, and then establishing CTMC family models of a server cluster by combining the CTMCs of various components according to the mutually independent relationship of the faults and maintenance among the components; and then calculating the expected spare part shortage number of the server cluster according to the CTMC family model, and calculating the steady-state availability of the server cluster by using the relation between the expected spare part shortage number of the server cluster and the steady-state availability.

2. The Markov modeling method for steady-state availability of a server cluster according to claim 1, characterized in that: determining the relevant parameters of the server cluster in step 1 means determining the number N of servers in the cluster, the number L of key component types in a server, the failure rate and repair rate of each component, and the number Z_i of units of component i installed in a single server.

3. The Markov modeling method for steady-state availability of a server cluster according to claim 2, characterized in that: the component referred to in steps 2 to 7 is a specific key component of type i, where i may be any of 1, 2, …, L, i.e. i ∈ {1, 2, …, L}.

4. The Markov modeling method for steady-state availability of a server cluster according to claim 3, characterized in that: selecting the parameters describing the state of the component in step 2 means using three parameters: the number of available servers O_i affected by component i, the inventory G_i of component i, and the spare part shortage number BO_i, thereby determining the state space of the component; at the initial moment, the inventory quantity of component i is S_i;

"defining the states of the component" in step 2 means that the states of component i are defined as follows:

state 0: (N, S_i, 0), representing N available servers, a stock of S_i, and 0 spare part shortages;

state 1: (N, S_i-1, 0), representing N available servers, a stock of S_i-1, and 0 spare part shortages;

state S_i: (N, 0, 0), representing N available servers, a stock of 0, and 0 spare part shortages;

state S_i+1: (N-1, 0, 1), representing N-1 available servers, a stock of 0, and 1 spare part shortage;

state S_i+k: (N-k, 0, k), representing N-k available servers, a stock of 0, and k spare part shortages;

5. The Markov modeling method for steady-state availability of a server cluster according to claim 4, characterized in that: determining the transition rates among the states of the component in step 3 means determining the Markov process transition rates of component i according to the relationships among its states; λ_i and μ_i denote the failure rate and the repair rate of component i, respectively;

when the spare parts of component i are sufficient: if one component i fails, the failed unit is replaced with a spare, the number of spares decreases by 1, and the number of available servers remains N; component i transitions from state (N, S_i-q, 0) to the adjacent state (N, S_i-q-1, 0) at rate N·Z_i·λ_i; when one component i is repaired and returned to the spare part store, the number of spares of component i increases by 1, and component i transitions from state (N, S_i-q-1, 0) to the adjacent state (N, S_i-q, 0) at rate (q+1)·μ_i; here 0 ≤ q ≤ S_i-1;

when component i is short of spare parts: if one component i fails, one server suspends operation, i.e. the number of available servers O_i decreases by 1 and the spare part shortage number BO_i increases by 1; component i transitions from state (N-k, 0, k) to state (N-k-1, 0, k+1) at rate (N-k)·Z_i·λ_i; when a failed component i is repaired, one server returns to normal operation, i.e. the number of available servers O_i increases by 1 and the spare part shortage number BO_i decreases by 1; component i transitions from state (N-k-1, 0, k+1) to state (N-k, 0, k) at rate (S_i+k+1)·μ_i; here 1 ≤ k ≤ N-1;

the state (N, 0, 0) and the state (N-1, 0, 1) are the two states that connect the no-shortage and shortage regimes of spare part i; analysis shows that when component i is in state (N, 0, 0) and a failure occurs, the number of available servers O_i decreases by 1 and the spare part shortage number BO_i changes from 0 to 1, so the transition rate from state (N, 0, 0) to state (N-1, 0, 1) is N·Z_i·λ_i; when component i is in state (N-1, 0, 1) and a failed component i is repaired, the number of available servers O_i increases by 1 and the spare part shortage number BO_i changes from 1 to 0, so the transition rate from state (N-1, 0, 1) to state (N, 0, 0) is (S_i+1)·μ_i;

A_i is an E_i × E_i matrix, where E_i, the number of states of component i, is the dimension of the matrix A_i; for m ≠ n, the entry in row m and column n is the rate of transition from state m to state n; for m = n, the entry is the negative of the sum of the other entries in row m; m and n are simply the state numbers of the component, with 0 ≤ m, n ≤ S_i+N; this forms the continuous-time Markov chain {X_i(t) = (O_i(t), G_i(t), BO_i(t)); t ≥ 0} of component i, which describes the state transition process of component i, and the transition rates among the states of component i are listed in Table 1;

TABLE 1 State transition Rate of component i


6. The Markov modeling method for steady-state availability of a server cluster according to claim 5, characterized in that: calculating the transition rate matrix of the component in step 4 means determining, from the continuous-time Markov chain {X_i(t) = (O_i(t), G_i(t), BO_i(t)); t ≥ 0} established in step 3, the transition rate matrix of component i as follows:

7. The Markov modeling method for steady-state availability of a server cluster according to claim 6, characterized in that: the steady-state probability of each state of the component in step 5 is calculated as follows: the steady-state probability vector of the states of component i is denoted π_i,

$$\pi_i = \left[\pi_{i,0},\ \pi_{i,1},\ \ldots,\ \pi_{i,S_i+N}\right]$$

the steady-state probabilities π_i are solved from the following non-homogeneous system of linear equations;

$$\begin{cases} \pi_i A_i = 0 \\ \sum_{j=0}^{S_i+N} \pi_{i,j} = 1 \end{cases}$$

Here, A_i is the transition rate matrix of component i; 0 denotes the 1 × E_i zero matrix; π_{i,j} is the steady-state probability that component i is in state j, with 0 ≤ j ≤ S_i+N;

After solving the non-homogeneous system of linear equations, the steady-state probability of state p is obtained as:

$$\pi_{i,p} = \frac{1}{p!}\left(\frac{N Z_i \lambda_i}{\mu_i}\right)^{p} \pi_{i,0}$$

wherein 1 ≤ p ≤ S_i+1;

the steady-state probability of state S_i+h is:

$$\pi_{i,S_i+h} = \frac{1}{(S_i+1)!}\left(\frac{N Z_i \lambda_i}{\mu_i}\right)^{S_i+1} \prod_{k=1}^{h-1} \frac{(N-k)\, Z_i \lambda_i}{(S_i+k+1)\, \mu_i}\; \pi_{i,0}$$

wherein 2 ≤ h ≤ N;

since

$$\sum_{j=0}^{S_i+N} \pi_{i,j} = 1$$

the steady-state probability that component i is in state 0 is calculated as:

$$\pi_{i,0} = \left[\,1 + \sum_{p=1}^{S_i+1} \frac{1}{p!}\left(\frac{N Z_i \lambda_i}{\mu_i}\right)^{p} + \sum_{w=2}^{N} \frac{1}{(S_i+1)!}\left(\frac{N Z_i \lambda_i}{\mu_i}\right)^{S_i+1} \prod_{k=1}^{w-1} \frac{(N-k)\, Z_i \lambda_i}{(S_i+k+1)\, \mu_i}\right]^{-1}$$

wherein 2 ≤ w ≤ N.

8. The Markov modeling method for steady-state availability of a server cluster according to claim 7, characterized in that: the expected spare part shortage number of the component in step 6 is calculated as follows: once the steady-state probabilities of the states of component i have been solved, the expected spare part shortage number of component i is calculated by the following formula:

$$EBO(S_i) = \pi_i b_i = \sum_{m=0}^{S_i+N} \pi_{i,m}\, b_{i,m}$$

here, EBO(S_i) is the expected spare part shortage number of component i in steady state; b_i is the spare part shortage number vector of component i; b_{i,m} is an element of b_i, namely the spare part shortage number when component i is in state m; the specific form of b_i is as follows:

$$b_i = [\,\underbrace{0,\ \ldots,\ 0}_{S_i+1},\ 1,\ 2,\ \ldots,\ N\,]^{T}$$

9. The Markov modeling method for steady-state availability of a server cluster according to claim 8, characterized in that: executing steps 2 to 6 for each type of component in step 7 and combining the CTMCs of all component types to construct the CTMC family model of the server cluster means building the continuous-time Markov chains (CTMCs) of the components one by one, {X_i(t) = (O_i(t), G_i(t), BO_i(t)); t ≥ 0}, i = 1, 2, …, L; because the failures and repairs of the various components are independent of one another, a combined server cluster CTMC family model {X(t) = (X₁(t), …, X_i(t), …, X_L(t)); t ≥ 0} is established; the expected spare part shortage number of the server cluster in step 7 is calculated as follows: in steady state, the expected spare part shortage number EBO_q of the server cluster is equivalent to the sum of the expected numbers of servers made unavailable by shortages of the various spare parts, and thus EBO_q is calculated as follows:

$$EBO_q = \sum_{i=1}^{L} EBO(S_i)$$

10. The Markov modeling method for steady-state availability of a server cluster according to claim 9, characterized in that: the steady-state availability of the server cluster in step 8 is calculated as follows: in steady state, subtracting EBO_q from the total number of servers N gives the expected number of available servers; the steady-state availability A_q of the server cluster is equivalent to the expected number of available servers as a percentage of the total number of servers N; hence A_q is calculated as follows:

$$A_q = \frac{N - EBO_q}{N} \times 100\%$$