CN107783851B - Markov modeling method for steady-state availability of server cluster - Google Patents


Info

Publication number
CN107783851B
Authority
CN
China
Prior art keywords
state
component
server cluster
steady
spare part
Legal status
Active
Application number
CN201710867338.0A
Other languages
Chinese (zh)
Other versions
CN107783851A (en)
Inventor
郭霖瀚
冯晓
孔丹丹
杨懿
Current Assignee
Beihang University
Original Assignee
Beihang University
Application filed by Beihang University
Priority to CN201710867338.0A
Publication of CN107783851A
Application granted
Publication of CN107783851B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/008 Reliability or availability analysis
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3447 Performance evaluation by modeling
    • G06F11/3452 Performance evaluation by statistical analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Hardware Design (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Hardware Redundancy (AREA)

Abstract

The invention provides a Markov modeling method for the steady-state availability of a server cluster, comprising the following steps: 1. determine the relevant parameters of the server cluster; 2. select parameters describing the state of a component and define the states of the component; 3. determine the transition rates between the states of the component and construct a continuous-time Markov chain (CTMC) of the component; 4. calculate the transition rate matrix of the component; 5. calculate the steady-state probability of each state of the component; 6. calculate the expected spare part shortage number of the component; 7. perform steps 2 to 6 for each component type and construct a CTMC family model of the server cluster; 8. calculate the steady-state availability of the server cluster. Through these steps, a CTMC model is established for each component, the component models are combined into a CTMC family model of the server cluster, and the availability of the system is calculated. The modeling method reduces the number of states in the Markov process, lowers the difficulty of calculating the steady-state availability of the server cluster, and provides useful reference information for the design and improvement of server clusters.

Description

Markov modeling method for steady-state availability of server cluster
Technical Field
The invention provides a Markov modeling method for steady-state availability of a server cluster, and belongs to the technical field of maintenance support.
Background
With the continuous development of internet technology, users increasingly expect fast and accurate services, and information providers are required to deliver service continuously, 24 hours a day, all year round. Statistical reports show that computer system failures cost U.S. enterprises nearly 40 billion US dollars a year, and the resulting loss of reputation is even harder to measure. To avoid unplanned service interruptions and provide a better service experience, service providers often build server clusters (multi-server hot standby) in important data centers to achieve high reliability and uninterrupted service.
The availability of a server cluster is affected mainly by software and hardware. Data show that hardware faults account for more than 20% of all server cluster faults, so the performance and maintenance of the hardware have an important impact on cluster availability. From the hardware perspective, the methods used in China and abroad to determine server cluster availability mainly include reliability block diagrams, fault trees, Petri nets, and Markov processes. However, these methods often ignore the influence of spare parts on cluster availability when a failed server is repaired. Spare part supply is a key support activity: it affects the service condition of a server cluster mainly through the repair turnaround time of failed parts. Spare part supply therefore needs to be taken into account when calculating the steady-state availability of a server cluster, and an accurate calculation of steady-state availability has clear theoretical significance and practical value for the design, operation, and management of server clusters.
To solve this problem, the invention considers the influence of spare parts on the number of available servers in the cluster system, defines the state space using three typical parameters, combines the continuous-time Markov chains (CTMCs) of all key components into a CTMC family model that represents the state transitions of the server cluster system, and from this model calculates the steady-state availability of the server cluster system.
Disclosure of Invention
(1) Objects of the invention
To provide efficient and reliable information services, the reliability and availability of server clusters need to be increased. Server providers usually add redundant servers on the one hand and stock the corresponding spare parts on the other, so server clusters tend to grow larger and larger, and calculating their availability accurately becomes a difficult problem. In view of this, the present invention provides a Markov modeling method for the steady-state availability of a server cluster, which uses a Markov process to calculate that availability. The method is broadly applicable and suits server clusters that operate continuously over long periods.
(2) Technical scheme
The server cluster addressed by the invention consists of several identical servers operating in multi-machine hot standby. When the cluster executes tasks, the servers work independently. The following assumptions are made: the key components of a server form a series structure, so a server stops running as soon as any one key component fails; the failure time and repair time of each key component are mutually independent and exponentially distributed; the replacement time of a failed part is neglected; the spare part inventory follows a single-base model; the repair echelon has unlimited repair capacity and every repair succeeds; and all repair resources other than spare parts are adequately supplied.
When the times to failure and times to repair of the servers are exponentially distributed, the system can be described by a Markov process, provided the states are properly defined. According to the classical inventory balance formula, the spare part shortage number can be calculated from the spare part inventory. In the invention, a server is in a failed state whenever any one of its key components is short of spare parts. Therefore, for a server cluster, the granularity of Markov modeling can be refined to the key component level: changes in the number of available servers are reflected indirectly by changes in the states of the key components, and the sum of the expected shortages of each spare part is then used to calculate the steady-state availability of the server cluster.
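The classical inventory balance referred to above is commonly written S = G + D - BO: stock on hand plus parts in repair minus unfilled demands equals the stock level. The minimal Python sketch below checks that relation for one component type, assuming (as the state definitions of this method imply) that the state number m equals the number D of parts in repair; the function names are hypothetical.

```python
def inventory_position(S, m):
    """Return (G, BO) for stock level S when m parts of this type are in repair.

    Assumption: G = max(S - m, 0) spares remain on the shelf and
    BO = max(m - S, 0) failures could not be covered by a spare.
    """
    G = max(S - m, 0)
    BO = max(m - S, 0)
    return G, BO


def balance_holds(S, m):
    """Check the inventory balance S = G + D - BO with D = m parts in repair."""
    G, BO = inventory_position(S, m)
    return G + m - BO == S


if __name__ == "__main__":
    S = 1  # hypothetical initial stock
    for m in range(4):
        G, BO = inventory_position(S, m)
        print(f"in repair={m}: G={G}, BO={BO}, balance ok={balance_holds(S, m)}")
```

Because G and BO are both determined by m, the shortage number follows directly from the inventory, which is what lets a single integer index each component state.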
Under the above conditions, the Markov modeling method for server cluster steady-state availability provided by the invention comprises the following steps:
step 1, determining relevant parameters of a server cluster according to the architectural characteristics of the server cluster;
step 2, selecting parameters for describing the state of the component, and defining various states of the component;
step 3, determining the transfer rate among the states of the component, and constructing a Continuous Time Markov Chain (CTMC) of the component;
step 4, calculating a transfer rate matrix of the component;
step 5, calculating the steady-state probability of each state of the component by solving a specific non-homogeneous linear equation set;
step 6, calculating the expected spare part shortage number of the component;
step 7, respectively executing the step 2 to the step 6 on each type of component, combining CTMCs of each type of component, constructing a CTMC family model of the server cluster, and calculating the expected spare part shortage number of the server cluster;
step 8, calculating the steady-state availability of the server cluster.
Determining the relevant parameters of the server cluster in step 1 means determining the number $N$ of servers in the cluster, the number $L$ of key component types in a server, the failure rate and repair rate of each component, and the number $Z_i$ of components of type $i$ installed in a single server; a typical server cluster architecture is shown in FIG. 1.
The "component" referred to in steps 2 to 7 is a specific key component of type $i$, where $i$ may be any of $1, 2, \ldots, L$, i.e. $i \in \{1, 2, \ldots, L\}$.
"Selecting parameters describing the state of the component" in step 2 means using three parameters, namely the number of available servers $O_i$ affected by component $i$, the inventory $G_i$ of component $i$, and the spare part shortage number $BO_i$ of component $i$, to determine the state space of the component. At the initial moment, the inventory of component $i$ is $S_i$.
"Defining the states of the component" in step 2 means that the states of component $i$ are defined as follows:
state 0: $(N, S_i, 0)$, i.e. $N$ available servers, $S_i$ spares in stock, and 0 spare part shortages;
state 1: $(N, S_i-1, 0)$, i.e. $N$ available servers, $S_i-1$ spares in stock, and 0 spare part shortages;
…
state $S_i$: $(N, 0, 0)$, i.e. $N$ available servers, 0 spares in stock, and 0 spare part shortages;
state $S_i+1$: $(N-1, 0, 1)$, i.e. $N-1$ available servers, 0 spares in stock, and 1 spare part shortage;
…
state $S_i+k$: $(N-k, 0, k)$, i.e. $N-k$ available servers, 0 spares in stock, and $k$ spare part shortages;
…
state $S_i+N$: $(0, 0, N)$, i.e. 0 available servers, 0 spares in stock, and $N$ spare part shortages.
Component $i$ therefore has $E_i = S_i + N + 1$ states, numbered $0, 1, 2, \ldots, S_i+N$.
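The state list above can be generated mechanically. In this hypothetical Python sketch, state m of component i is mapped to the tuple $(O_i, G_i, BO_i)$ using $G_i = \max(S_i - m, 0)$, $BO_i = \max(m - S_i, 0)$ and $O_i = N - BO_i$, which reproduces the definitions term by term.

```python
def component_states(N, S):
    """Enumerate states 0..S+N of component i as (O_i, G_i, BO_i) tuples.

    N: number of servers in the cluster; S: initial spare stock S_i.
    """
    states = []
    for m in range(S + N + 1):
        stock = max(S - m, 0)       # G_i: spares on hand
        shortage = max(m - S, 0)    # BO_i: unfilled demands
        states.append((N - shortage, stock, shortage))  # O_i = N - BO_i
    return states


states = component_states(N=10, S=1)
print(len(states))   # 12, i.e. E_i = S_i + N + 1
print(states[:4])    # [(10, 1, 0), (10, 0, 0), (9, 0, 1), (8, 0, 2)]
```

With N = 10 and S_i = 1 this yields the same 12 states listed for component 1 in the embodiment.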
"Determining the transition rates between the states of the component" in step 3 means determining the Markov process transition rates of component $i$ from the relationships between its states; $\lambda_i$ and $\mu_i$ denote the failure rate and the repair rate of component $i$, respectively.
When spare parts of component $i$ are sufficient: if a component $i$ fails, the failed component is replaced with a spare, the number of spares decreases by 1, and the number of available servers remains $N$; component $i$ moves from state $(N, S_i-q, 0)$ to the adjacent state $(N, S_i-q-1, 0)$ with transition rate $N Z_i \lambda_i$. When a repaired component $i$ is returned to the spare part store, the number of spares increases by 1, and component $i$ moves from state $(N, S_i-q-1, 0)$ to the adjacent state $(N, S_i-q, 0)$ with transition rate $(q+1)\mu_i$; here $0 \le q \le S_i-1$.
When spare parts of component $i$ are short: if a component $i$ fails, one server stops operating, i.e. the number of available servers $O_i$ decreases by 1 and the spare part shortage number $BO_i$ increases by 1, and component $i$ moves from state $(N-k, 0, k)$ to state $(N-k-1, 0, k+1)$ with transition rate $(N-k) Z_i \lambda_i$. When a failed component $i$ is repaired, one server returns to normal operation, i.e. $O_i$ increases by 1 and $BO_i$ decreases by 1, and component $i$ moves from state $(N-k-1, 0, k+1)$ to state $(N-k, 0, k)$ with transition rate $(S_i+k+1)\mu_i$; here $1 \le k \le N-1$.
The states $(N, 0, 0)$ and $(N-1, 0, 1)$ connect the no-shortage and shortage regimes of component $i$. When component $i$ is in state $(N, 0, 0)$ and a failure occurs, $O_i$ decreases by 1 and $BO_i$ changes from 0 to 1, so the transition rate from $(N, 0, 0)$ to $(N-1, 0, 1)$ is $N Z_i \lambda_i$. When component $i$ is in state $(N-1, 0, 1)$ and a failed component $i$ is repaired, $O_i$ increases by 1 and $BO_i$ changes from 1 to 0, so the transition rate from $(N-1, 0, 1)$ to $(N, 0, 0)$ is $(S_i+1)\mu_i$.
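The three cases above collapse into two rules when states are indexed by m: a failure moves m to m + 1 at rate $(N - \max(m - S_i, 0)) Z_i \lambda_i$, and a repair moves m to m - 1 at rate $m \mu_i$ (unlimited repair capacity, m parts in repair). The Python sketch below encodes this; the function name and the numeric parameters are hypothetical, not values from the patent.

```python
def transition_rates(N, S, Z, lam, mu):
    """Return {(from_state, to_state): rate} for the CTMC of component i.

    States are numbered 0..S+N as defined above; N, S, Z, lam, mu stand for
    N, S_i, Z_i, lambda_i and mu_i.
    """
    rates = {}
    for m in range(S + N):
        k = max(m - S, 0)                      # shortage BO_i in state m
        rates[(m, m + 1)] = (N - k) * Z * lam  # a part fails on one of the N - k running servers
    for m in range(1, S + N + 1):
        rates[(m, m - 1)] = m * mu             # S_i - G_i + BO_i = m parts are in repair
    return rates


# Hypothetical parameters: N = 10, S_i = 1, Z_i = 1, lambda_i = 0.001, mu_i = 0.5
r = transition_rates(10, 1, 1, 0.001, 0.5)
print(r[(1, 2)], r[(2, 1)])   # (N, 0, 0) -> (N-1, 0, 1) and back
```

Each state has at most one outgoing failure and one outgoing repair transition, so the chain of each component is a birth-death process.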
The initial state of component $i$ is $(N, S_i, 0)$, and its transition rate matrix is denoted
$$A_i = \left[a^{i}_{mn}\right]_{E_i \times E_i},$$
an $E_i \times E_i$ matrix. When $m \neq n$, $a^{i}_{mn}$ is the rate of transition from state $m$ to state $n$; when $m = n$, $a^{i}_{mm}$ is the negative of the sum of the other elements in row $m$. Here $m$ and $n$ are state numbers of the component, $0 \le m, n \le S_i+N$. This forms the continuous-time Markov chain of component $i$,
$\{X_i(t) = (O_i(t), G_i(t), BO_i(t));\ t \ge 0\}$, which describes the state transition process of component $i$; the corresponding state transition diagram is shown in FIG. 2, and the transition rates between the states of component $i$ are listed in Table 1.
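TABLE 1 State transition rates of component i
From state | To state | Transition rate | Range
$(N, S_i-q, 0)$ | $(N, S_i-q-1, 0)$ | $N Z_i \lambda_i$ | $0 \le q \le S_i-1$
$(N, S_i-q-1, 0)$ | $(N, S_i-q, 0)$ | $(q+1)\mu_i$ | $0 \le q \le S_i-1$
$(N, 0, 0)$ | $(N-1, 0, 1)$ | $N Z_i \lambda_i$ | connecting states
$(N-1, 0, 1)$ | $(N, 0, 0)$ | $(S_i+1)\mu_i$ | connecting states
$(N-k, 0, k)$ | $(N-k-1, 0, k+1)$ | $(N-k) Z_i \lambda_i$ | $1 \le k \le N-1$
$(N-k-1, 0, k+1)$ | $(N-k, 0, k)$ | $(S_i+k+1)\mu_i$ | $1 \le k \le N-1$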
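"Calculating the transition rate matrix of the component" in step 4 means determining the transition rate matrix of component $i$ from the continuous-time Markov chain $\{X_i(t) = (O_i(t), G_i(t), BO_i(t));\ t \ge 0\}$ established in step 3. Its elements are
$$a^{i}_{mn} = \begin{cases} \left(N - \max(0, m - S_i)\right) Z_i \lambda_i, & n = m+1,\ 0 \le m \le S_i+N-1 \\ m\,\mu_i, & n = m-1,\ 1 \le m \le S_i+N \\ -\sum_{n' \neq m} a^{i}_{mn'}, & n = m \\ 0, & \text{otherwise} \end{cases}$$
so $A_i$ is tridiagonal.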
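As an illustration of step 4, the following Python sketch fills the $E_i \times E_i$ matrix $A_i$ from the element-wise rule, using nested lists so that no third-party library is assumed. The function name and the demo parameters are hypothetical, not data from the patent.

```python
def generator_matrix(N, S, Z, lam, mu):
    """Build the E_i x E_i transition rate matrix A_i (nested lists) of component i.

    N: servers in the cluster; S: initial stock S_i; Z: installations per server Z_i;
    lam, mu: failure rate lambda_i and repair rate mu_i.
    """
    E = S + N + 1                       # number of states E_i
    A = [[0.0] * E for _ in range(E)]
    for m in range(E):
        if m + 1 < E:                   # failure: one more part in repair
            A[m][m + 1] = (N - max(m - S, 0)) * Z * lam
        if m > 0:                       # repair: m parts in repair, each at rate mu
            A[m][m - 1] = m * mu
        # Diagonal element: negative of the sum of the other elements in row m
        A[m][m] = -sum(A[m][n] for n in range(E) if n != m)
    return A


# Hypothetical parameters (not taken from Table 2): N = 3, S_i = 1, Z_i = 2
A = generator_matrix(N=3, S=1, Z=2, lam=0.01, mu=0.5)
for row in A:
    print([round(x, 3) for x in row])
```

Every row of a valid generator sums to zero, which is a convenient sanity check before solving for the steady state.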
"Calculating the steady-state probabilities of the states of the component" in step 5 is done as follows. The steady-state probability vector of component $i$ is denoted $\pi_i$:
$$\pi_i = \left[\pi_{i,0}, \pi_{i,1}, \ldots, \pi_{i,S_i+N}\right].$$
$\pi_i$ is obtained by solving the non-homogeneous linear equation system
$$\begin{cases} \pi_i A_i = \mathbf{0} \\ \sum_{j=0}^{S_i+N} \pi_{i,j} = 1 \end{cases}$$
where $A_i$ is the transition rate matrix of component $i$, $\mathbf{0}$ is the $1 \times E_i$ zero vector, and $\pi_{i,j}$ is the steady-state probability that component $i$ is in state $j$, $0 \le j \le S_i+N$.
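The system above has one redundant balance equation, so a common way to solve it is to replace one equation of $A_i^{T} \pi_i^{T} = 0$ with the normalisation condition. The sketch below does this in pure Python with Gaussian elimination and partial pivoting; it is a generic numerical approach assumed here for illustration, not a procedure prescribed by the patent, and the demo matrix is hypothetical.

```python
def steady_state(A):
    """Solve pi A = 0, sum(pi) = 1 for a generator matrix A (list of lists).

    The last balance equation is replaced by the normalisation condition and
    the resulting linear system is solved by Gaussian elimination.
    """
    E = len(A)
    # pi A = 0  <=>  A^T pi^T = 0, so work with the transpose
    M = [[A[n][m] for n in range(E)] for m in range(E)]
    rhs = [0.0] * E
    M[-1] = [1.0] * E          # replace one redundant equation with sum(pi) = 1
    rhs[-1] = 1.0
    # Forward elimination with partial pivoting
    for col in range(E):
        piv = max(range(col, E), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, E):
            f = M[r][col] / M[col][col]
            for c in range(col, E):
                M[r][c] -= f * M[col][c]
            rhs[r] -= f * rhs[col]
    # Back substitution
    pi = [0.0] * E
    for r in range(E - 1, -1, -1):
        s = sum(M[r][c] * pi[c] for c in range(r + 1, E))
        pi[r] = (rhs[r] - s) / M[r][r]
    return pi


# Hypothetical 3-state birth-death generator
A_demo = [[-0.06, 0.06, 0.0],
          [0.5, -0.56, 0.06],
          [0.0, 1.0, -1.0]]
pi = steady_state(A_demo)
print([round(p, 6) for p in pi])
```

For an irreducible chain this replacement yields a nonsingular system with a unique solution; with NumPy available, `numpy.linalg.solve` could replace the hand-written elimination.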
After solving this system, the steady-state probability of state $p$ ($1 \le p \le S_i+1$) is
$$\pi_{i,p} = \frac{1}{p!}\left(\frac{N Z_i \lambda_i}{\mu_i}\right)^{p} \pi_{i,0},$$
and the steady-state probability of state $S_i+h$ ($2 \le h \le N$) is
$$\pi_{i,S_i+h} = \pi_{i,S_i+1} \prod_{k=1}^{h-1} \frac{(N-k) Z_i \lambda_i}{(S_i+k+1)\mu_i}.$$
Since
$$\sum_{j=0}^{S_i+N} \pi_{i,j} = 1,$$
the steady-state probability that component $i$ is in state 0 is
$$\pi_{i,0} = \left[\sum_{p=0}^{S_i+1} \frac{1}{p!}\left(\frac{N Z_i \lambda_i}{\mu_i}\right)^{p} + \frac{1}{(S_i+1)!}\left(\frac{N Z_i \lambda_i}{\mu_i}\right)^{S_i+1} \sum_{h=2}^{N} \prod_{k=1}^{h-1} \frac{(N-k) Z_i \lambda_i}{(S_i+k+1)\mu_i}\right]^{-1}.$$
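The closed-form probabilities can be evaluated directly, which avoids building and solving $A_i$. The Python sketch below computes the unnormalised weights $\pi_{i,j}/\pi_{i,0}$ with the two product formulas and then normalises them, which is equivalent to the expression for $\pi_{i,0}$. The function name and demo parameters are hypothetical.

```python
from math import factorial


def closed_form_pi(N, S, Z, lam, mu):
    """Closed-form steady-state vector pi_i of the birth-death CTMC of component i."""
    r = N * Z * lam / mu
    # Unnormalised weights w_j = pi_{i,j} / pi_{i,0} for states 0..S_i+1
    w = [r ** p / factorial(p) for p in range(S + 2)]
    # States S_i+h, 2 <= h <= N: step from S_i+k to S_i+k+1 with k = h-1
    for h in range(2, N + 1):
        k = h - 1
        w.append(w[-1] * (N - k) * Z * lam / ((S + k + 1) * mu))
    total = sum(w)                 # equals 1 / pi_{i,0}
    return [x / total for x in w]


# Hypothetical parameters: N = 3, S_i = 1, Z_i = 2, lambda_i = 0.01, mu_i = 0.5
pi = closed_form_pi(3, 1, 2, 0.01, 0.5)
print([round(p, 6) for p in pi])
```

The result satisfies detailed balance between neighbouring states, so it agrees with a numerical solution of $\pi_i A_i = \mathbf{0}$ up to rounding.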
"Calculating the expected spare part shortage number of the component" in step 6 is done as follows: once the steady-state probabilities of all states of component $i$ are known, the expected spare part shortage number of component $i$ is
$$EBO(S_i) = \pi_i b_i = \sum_{m=0}^{S_i+N} \pi_{i,m} b_{i,m},$$
where $EBO(S_i)$ is the expected spare part shortage number of component $i$ in steady state, $b_i$ is the spare part shortage number vector of component $i$, and its element $b_{i,m}$ is the spare part shortage number when component $i$ is in state $m$. The specific form of $b_i$ is
$$b_i = [\underbrace{0, \ldots, 0}_{S_i+1}, 1, 2, \ldots, N]^{T}.$$
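Step 6 is a dot product between the probability vector and the shortage vector. A minimal Python sketch, with a hypothetical function name and toy probabilities:

```python
def expected_backorders(pi, S):
    """EBO(S_i) = sum_m pi_{i,m} * b_{i,m}, where b_{i,m} = max(m - S_i, 0)."""
    b = [max(m - S, 0) for m in range(len(pi))]   # S_i + 1 zeros, then 1, 2, ..., N
    return sum(p * bm for p, bm in zip(pi, b))


# Toy usage with hypothetical probabilities (N = 2, S_i = 1, states 0..3)
print(expected_backorders([0.5, 0.3, 0.15, 0.05], S=1))
```

Here only states 2 and 3 carry a shortage, so the result is 0.15 * 1 + 0.05 * 2, up to floating-point rounding.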
"Performing steps 2 to 6 for each component type and combining the CTMCs of each component type to construct the CTMC family model of the server cluster" in step 7 means establishing the continuous-time Markov chains $\{X_i(t) = (O_i(t), G_i(t), BO_i(t));\ t \ge 0\}$, $i = 1, 2, \ldots, L$, of the components one by one; because the failures and repairs of the different component types are mutually independent, a comprehensive server cluster CTMC family model $\{X(t) = \{X_1(t), \ldots, X_i(t), \ldots, X_L(t)\},\ t \ge 0\}$ can then be established.
"Calculating the expected spare part shortage number of the server cluster" in step 7 is done as follows: in steady state, the $EBO_q$ of the server cluster equals the sum of the expected numbers of servers that are unavailable because of shortages of each type of spare part, so
$$EBO_q = \sum_{i=1}^{L} EBO(S_i).$$
"Calculating the steady-state availability of the server cluster" in step 8 is done as follows: subtracting $EBO_q$ from the total number of servers $N$ gives the expected number of available servers in steady state, and the steady-state availability $A_q$ of the server cluster equals this expected value as a percentage of $N$:
$$A_q = \frac{N - EBO_q}{N} \times 100\%.$$
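To tie steps 5 to 8 together, the following hypothetical Python sketch computes $EBO(S_i)$ for each component type from the closed-form probabilities, sums them into $EBO_q$, and returns $A_q$ as a fraction (multiply by 100 for the percentage form). The component tuples are illustrative assumptions, not Table 2 data.

```python
from math import factorial


def component_ebo(N, S, Z, lam, mu):
    """Steps 5-6 for one component type: closed-form pi_i, then EBO(S_i) = pi_i . b_i."""
    r = N * Z * lam / mu
    w = [r ** p / factorial(p) for p in range(S + 2)]       # states 0..S_i+1
    for k in range(1, N):                                    # states S_i+2..S_i+N
        w.append(w[-1] * (N - k) * Z * lam / ((S + k + 1) * mu))
    total = sum(w)
    return sum(max(m - S, 0) * x / total for m, x in enumerate(w))


def cluster_availability(N, components):
    """Steps 7-8: EBO_q = sum_i EBO(S_i); A_q = (N - EBO_q) / N.

    components: iterable of (S_i, Z_i, lambda_i, mu_i) tuples (hypothetical values).
    """
    ebo_q = sum(component_ebo(N, S, Z, lam, mu) for S, Z, lam, mu in components)
    return ebo_q, (N - ebo_q) / N


ebo_q, a_q = cluster_availability(10, [(1, 1, 1e-4, 0.1), (2, 2, 5e-5, 0.05)])
print(f"EBO_q = {ebo_q:.3e}, A_q = {a_q:.8f}")
```

Each component type needs only its own $S_i + N + 1$ states, which is the state-count saving the method relies on.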
Through these steps, with spare parts taken into account, a CTMC model is first established for each component type; because the failures and repairs of different components are mutually independent, these CTMCs are then combined into a CTMC family model of the server cluster. The expected spare part shortage number of the server cluster is calculated from the CTMC family model, and the steady-state availability of the server cluster is calculated from its relationship with that expected shortage number. The modeling method effectively reduces the number of states in the Markov process, thereby reducing the difficulty of calculating the steady-state availability of the server cluster and providing useful reference information for the design and improvement of server clusters.
(3) Advantages and effects
The invention provides a Markov modeling method for server cluster steady-state availability that takes the influence of spare parts into account. Its advantages are as follows:
① By considering how the stock and shortage of different spare parts affect server cluster availability, the invention can guide spare part planning for servers and server clusters.
② Based on the structural characteristics of the server cluster, the invention constructs component CTMCs that satisfy the inventory balance relation, and establishes the CTMC family model of the server cluster from the fault logic relationships between components.
③ The invention provides a CTMC-family-based modeling method for server cluster steady-state availability, offering a new technical approach for calculating the availability index of a server cluster with CTMCs.
Drawings
FIG. 1 shows a typical server cluster architecture.
FIG. 2 is a Markov state transition diagram of component i.
FIG. 3 is a flow chart of the modeling method of the present invention.
FIG. 4 is a Markov state transition diagram of component 1.
The symbols in the figures are as follows:
$N$ refers to the number of servers in the cluster system;
$L$ refers to the number of component types;
$Z_i$ refers to the number of components $i$ installed in a single server;
$\lambda_i$ refers to the failure rate of component $i$;
$\mu_i$ refers to the repair rate of component $i$;
$a^{i}_{mn}$ refers to the transition rate of component $i$ from state $m$ to state $n$;
$\pi_i$ refers to the steady-state probability vector of the states of component $i$;
$\pi_{i,j}$ refers to the steady-state probability that component $i$ is in state $j$;
$b_i$ refers to the spare part shortage number vector of component $i$;
$b_{i,m}$ refers to the spare part shortage number when component $i$ is in state $m$;
$E_i$ refers to the number of states of component $i$;
$A_i$ refers to the state transition rate matrix of component $i$;
$EBO(S_i)$ refers to the expected spare part shortage number of component $i$;
$EBO_q$ refers to the expected spare part shortage number of the server cluster;
$A_q$ refers to the steady-state availability of the server cluster;
CTMC refers to a continuous-time Markov chain;
$X_i(t)$ refers to the continuous-time Markov chain of component $i$;
$X(t)$ refers to the CTMC family of the server cluster;
$O_i$ refers to the number of available servers affected by component $i$;
$G_i$ refers to the inventory of component $i$;
$BO_i$ refers to the spare part shortage number of component $i$;
$S_i$ refers to the initial inventory of component $i$;
$O_i(t)$ refers to the number of available servers in the continuous-time Markov chain of component $i$;
$G_i(t)$ refers to the number of spare parts in stock in the continuous-time Markov chain of component $i$;
$BO_i(t)$ refers to the spare part shortage number in the continuous-time Markov chain of component $i$.
Detailed Description
The embodiments of the present invention are described in more detail below with reference to an example. In general, a complex server can be converted into a system with a series structure through equivalence, combination, and similar operations. The server cluster in the following example contains 10 homogeneous servers operating in multi-server hot standby. Each server consists of 8 types of key components in series. The failure times and repair times of all key components are assumed to be exponentially distributed.
The implementation flow of the Markov modeling method for server cluster steady-state availability is shown in FIG. 3; the steps are as follows:
Step 1: collect the relevant information of the server cluster. From its architectural characteristics, $N = 10$ and $L = 8$; the component parameters are listed in Table 2 below.
TABLE 2 Component-related parameters
[Table 2 image not available]
Step 2: three parameters are selected for component 1, namely the number of available servers $O_1$, the inventory $G_1$, and the spare part shortage number $BO_1$, to represent the state transitions of component 1. In the initial state, $O_1 = 10$, $G_1 = S_1 = 1$ and $BO_1 = 0$. Component 1 then has 12 states: ① state 0: (10,1,0), ② state 1: (10,0,0), ③ state 2: (9,0,1), ④ state 3: (8,0,2), ⑤ state 4: (7,0,3), ⑥ state 5: (6,0,4), ⑦ state 6: (5,0,5), ⑧ state 7: (4,0,6), ⑨ state 8: (3,0,7), ⑩ state 9: (2,0,8), ⑪ state 10: (1,0,9), ⑫ state 11: (0,0,10).
Step 3: determine the transition rates of component 1 and construct its continuous-time Markov chain. In this example, the initial inventory $S_1$ of component 1 is 1, and the transition rates between the states of component 1 are calculated as shown in the following table:
TABLE 2 State transition rates of component 1
[Table image not available]
The transition rate matrix is denoted $A_1 = \left[a^{1}_{mn}\right]_{12 \times 12}$, a 12-dimensional square matrix. When $m \neq n$, $a^{1}_{mn}$ is the rate of transition from state $m$ to state $n$; when $m = n$, $a^{1}_{mm}$ is the negative of the sum of the other elements in row $m$; $0 \le m, n \le 11$. This forms the continuous-time Markov chain $\{X_1(t) = (O_1(t), G_1(t), BO_1(t));\ t \ge 0\}$ of component 1, which describes its state transition process; the corresponding state transition diagram is shown in FIG. 4.
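Step 4: calculate the transition rate matrix of component 1. On the basis of step 3 and the data in Table 1, the state transition rate matrix of component 1 is tridiagonal, with nonzero off-diagonal elements
$$a^{1}_{m,m+1} = \left(10 - \max(0, m-1)\right) Z_1 \lambda_1 \ (0 \le m \le 10), \qquad a^{1}_{m,m-1} = m\,\mu_1 \ (1 \le m \le 11),$$
and diagonal elements equal to the negative of the sum of the off-diagonal elements in the same row.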
Step 5: calculate the steady-state probability of each state of component 1. Using the state transition rate matrix obtained in step 4, solve
$$\begin{cases} \pi_1 A_1 = \mathbf{0} \\ \sum_{j=0}^{11} \pi_{1,j} = 1 \end{cases}$$
The steady-state probabilities of the states of component 1 are:
π1=[0.99844565,0.00155314,0.00000121,0,0,0,0,0,0,0,0,0]
Step 6: calculate the expected spare part shortage number of component 1. Substituting $\pi_1$ into the following equation gives the expected spare part shortage number:
$$EBO(S_1) = \pi_1 b_1 = \sum_{m=0}^{11} \pi_{1,m} b_{1,m},$$
where $b_1 = [0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]^{T}$.
The expected spare part shortage number of component 1 is $EBO(S_1) = 0.00000121$.
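The numbers of steps 5 and 6 can be cross-checked with a few lines of Python. The sketch recomputes $EBO(S_1)$ from the printed $\pi_1$ and $b_1$, and checks $\pi_{1,2} \approx \pi_{1,0} r^2/2$ from the closed form, where $r = N Z_1 \lambda_1 / \mu_1$. Since Table 2 is not reproduced here, $r$ is an assumption inferred from the ratio $\pi_{1,1}/\pi_{1,0}$ rather than computed from the component parameters.

```python
# Steady-state vector of component 1 as printed in the example (N = 10, S_1 = 1)
pi_1 = [0.99844565, 0.00155314, 0.00000121] + [0.0] * 9
S_1 = 1
b_1 = [max(m - S_1, 0) for m in range(len(pi_1))]   # [0, 0, 1, 2, ..., 10]

# Step 6: EBO(S_1) = pi_1 . b_1
ebo_1 = sum(p * b for p, b in zip(pi_1, b_1))

# Closed-form check: pi_{1,p} = pi_{1,0} r^p / p!; r is inferred, not taken from Table 2
r = pi_1[1] / pi_1[0]
predicted_pi_12 = pi_1[0] * r ** 2 / 2

print(f"EBO(S_1) = {ebo_1:.8f}")                    # EBO(S_1) = 0.00000121
print(f"r ~ {r:.7f}, predicted pi_1,2 ~ {predicted_pi_12:.3e}")
```

The predicted $\pi_{1,2}$ matches the printed 0.00000121 to within the rounding of the published values (about 0.2%).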
Step 7: repeat steps 2 to 6 for the remaining components 2 to 8 and establish the CTMC family model of the server cluster, $\{X(t) = \{X_1(t), \ldots, X_i(t), \ldots, X_8(t)\},\ t \ge 0\}$. The expected spare part shortage numbers of the remaining components are:
EBO(S2)=0.00000167
EBO(S3)=0.00000425
EBO(S4)=0.00000555
EBO(S5)=0.00000354
EBO(S6)=0.00001625
EBO(S7)=0.00003183
EBO(S8)=0.00000019
Step 8: calculate the steady-state availability of the server cluster. Summing the expected spare part shortage numbers of all components gives the expected spare part shortage number of the server cluster:
$$EBO_q = \sum_{i=1}^{8} EBO(S_i) = 0.00006449.$$
From the relationship between the expected spare part shortage number of the server cluster and its steady-state availability, the steady-state availability $A_q$ of the server cluster is
$$A_q = \frac{N - EBO_q}{N} \times 100\% = \frac{10 - 0.00006449}{10} \times 100\% \approx 99.99936\%.$$
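The step 8 arithmetic can be reproduced from the eight $EBO(S_i)$ values reported in the example. This short Python sketch simply sums them and applies $A_q = (N - EBO_q)/N$, reporting $A_q$ as a fraction.

```python
# EBO(S_i) values for components 1-8, copied from the example above
ebo = [0.00000121, 0.00000167, 0.00000425, 0.00000555,
       0.00000354, 0.00001625, 0.00003183, 0.00000019]
N = 10
ebo_q = sum(ebo)          # step 8: EBO_q = sum of EBO(S_i)
A_q = (N - ebo_q) / N     # steady-state availability as a fraction
print(f"EBO_q = {ebo_q:.8f}")   # EBO_q = 0.00006449
print(f"A_q   = {A_q:.8f}")     # A_q   = 0.99999355
```

Components 6 and 7 contribute about three quarters of $EBO_q$, which suggests where extra spares would raise availability most.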

Claims (10)

1. A Markov modeling method for steady-state availability of a server cluster, characterized by comprising the following steps:
step 1, determining relevant parameters of a server cluster according to the architectural characteristics of the server cluster;
step 2, selecting parameters for describing the state of the component, and defining various states of the component;
step 3, determining the transfer rate among the states of the component, and constructing a Continuous Time Markov Chain (CTMC) of the component;
step 4, calculating a transfer rate matrix of the component;
step 5, calculating the steady-state probability of each state of the component by solving a specific non-homogeneous linear equation set;
step 6, calculating the expected spare part shortage number of the component;
step 7, respectively executing the step 2 to the step 6 on each type of component, combining CTMCs of each type of component, constructing a CTMC family model of the server cluster, and calculating the expected spare part shortage number of the server cluster;
step 8, calculating the steady-state availability of the server cluster;
through the steps, under the condition of considering spare parts, firstly establishing CTMC models of various components, and then establishing CTMC family models of a server cluster by combining the CTMCs of various components according to the mutually independent relationship of the faults and maintenance among the components; and then calculating the expected spare part shortage number of the server cluster according to the CTMC family model, and calculating the steady-state availability of the server cluster by using the relation between the expected spare part shortage number of the server cluster and the steady-state availability.
2. The Markov modeling method for steady-state availability of a server cluster according to claim 1, characterized in that: determining the relevant parameters of the server cluster in step 1 means determining the number $N$ of servers in the server cluster, the number $L$ of key component types of the servers, the failure rate and repair rate of each component, and the number $Z_i$ of components of type $i$ installed in a single server.
3. The Markov modeling method for steady-state availability of a server cluster according to claim 2, characterized in that: the component in steps 2 to 7 is a specific key component of type $i$, where $i$ may be any of $1, 2, \ldots, L$, i.e. $i \in \{1, 2, \ldots, L\}$.
4. The Markov modeling method for steady-state availability of the server cluster as recited in claim 3, wherein: selecting the parameters describing the state of the component described in step 2 means using three parameters: number of available servers O affected by component iiInventory G of Components iiNumber of shortage of spare parts BOiThereby determining a state space of the component; at the initial moment, the inventory quantity S of the component ii
The "defining various states of the component" described in step 2 means that the specific defining method is as follows for various states of the component i:
state 0: (N, S_i, 0), i.e., N available servers, S_i items in stock, 0 spare part shortages;
state 1: (N, S_i - 1, 0), i.e., N available servers, S_i - 1 items in stock, 0 spare part shortages;
⋮
state S_i: (N, 0, 0), i.e., N available servers, 0 items in stock, 0 spare part shortages;
state S_i + 1: (N - 1, 0, 1), i.e., N - 1 available servers, 0 items in stock, 1 spare part shortage;
⋮
state S_i + k: (N - k, 0, k), i.e., N - k available servers, 0 items in stock, k spare part shortages;
⋮
state S_i + N: (0, 0, N), i.e., 0 available servers, 0 items in stock, N spare part shortages;
so that component i has E_i = S_i + N + 1 states, numbered 0, 1, 2, …, S_i + N.
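A minimal Python sketch of this state space, assuming state number m maps to (O_i, G_i, BO_i) exactly as listed above. The function name `component_states` is hypothetical and not part of the patent.

```python
def component_states(N: int, S: int) -> list[tuple[int, int, int]]:
    """Return the states (O_i, G_i, BO_i) of component i, indexed by state number.

    N -- number of servers in the cluster
    S -- initial spare stock S_i of component i
    """
    states = []
    for m in range(S + N + 1):          # E_i = S_i + N + 1 states
        if m <= S:
            # Spares still cover every failure: all N servers up, S - m in stock
            states.append((N, S - m, 0))
        else:
            # k = m - S backorders, each keeping one server down
            k = m - S
            states.append((N - k, 0, k))
    return states


print(component_states(2, 1))
# [(2, 1, 0), (2, 0, 0), (1, 0, 1), (0, 0, 2)]
```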
5. The Markov modeling method for steady-state availability of a server cluster according to claim 4, wherein determining the transition rates between the states of the component in step 3 means determining the Markov-process transition rates of component i from the relationships between its states; λ_i and μ_i denote the failure rate and the repair rate of component i, respectively;
when spare parts of component i are sufficient: if a component i fails, the failed component is replaced with a spare, the number of spares decreases by 1, the number of available servers remains N, and component i transitions from state (N, S_i - q, 0) to the adjacent state (N, S_i - q - 1, 0) at transition rate N Z_i λ_i; when a component i is repaired and returned to the spare part store, the number of spares of component i increases by 1, and component i transitions from state (N, S_i - q - 1, 0) to the adjacent state (N, S_i - q, 0) at transition rate (q + 1) μ_i; where 0 ≤ q ≤ S_i - 1;
when spare parts of component i are in shortage: if a component i fails, one server is suspended, i.e., the number of available servers O_i decreases by 1 and the spare part shortage number BO_i increases by 1, and component i transitions from state (N - k, 0, k) to state (N - k - 1, 0, k + 1) at transition rate (N - k) Z_i λ_i; when a failed component i is repaired, one server resumes normal operation, i.e., O_i increases by 1 and BO_i decreases by 1, and component i transitions from state (N - k - 1, 0, k + 1) to state (N - k, 0, k) at transition rate (S_i + k + 1) μ_i; where 1 ≤ k ≤ N - 1;
the states (N, 0, 0) and (N - 1, 0, 1) are the two states that link the no-shortage and shortage regimes of component i: if a failure occurs while component i is in state (N, 0, 0), the number of available servers O_i decreases by 1 and the spare part shortage number BO_i goes from 0 to 1, so the transition rate from state (N, 0, 0) to state (N - 1, 0, 1) is N Z_i λ_i; if a failed component i is repaired while component i is in state (N - 1, 0, 1), O_i increases by 1 and BO_i goes from 1 to 0, so the transition rate from state (N - 1, 0, 1) to state (N, 0, 0) is (S_i + 1) μ_i;
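The three cases above reduce to a single birth-death rule. The sketch below is a hedged illustration under that reading: the failure rate out of state m is (N - k) Z_i λ_i with k = max(0, m - S_i) servers down, and the repair rate out of state m is m μ_i, which reproduces both (q + 1) μ_i and (S_i + k + 1) μ_i. The helper names `up_rate` and `down_rate` are hypothetical.

```python
def up_rate(m: int, N: int, S: int, Z: int, lam: float) -> float:
    """Failure-driven rate from state m to state m + 1 of component i."""
    if m >= S + N:
        return 0.0                 # last state: every server is already down
    k = max(0, m - S)              # servers currently down (backorders)
    return (N - k) * Z * lam       # N Z_i lam_i while spares last, (N - k) Z_i lam_i after


def down_rate(m: int, mu: float) -> float:
    """Repair-driven rate from state m to state m - 1 (m failed parts in repair)."""
    return m * mu


# Example: N = 3 servers, S_i = 2 spares, Z_i = 2 parts per server
for m in range(6):
    print(m, up_rate(m, 3, 2, 2, 0.1), down_rate(m, 0.5))
```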
The initial state of component i is (N, S_i, 0), and its transfer rate matrix is written as
$$A_i = \left[a^{(i)}_{mn}\right]_{E_i \times E_i}$$
A_i is an E_i × E_i matrix, where E_i, the number of states of component i, is the dimension of A_i. When m ≠ n, a^{(i)}_{mn} is the transition rate from state m to state n; when m = n,
$$a^{(i)}_{mm} = -\sum_{n \neq m} a^{(i)}_{mn},$$
i.e., each diagonal element equals the negative of the sum of the other elements in row m. Here m and n are state numbers of component i, with 0 ≤ m, n ≤ S_i + N. This defines a continuous-time Markov chain {X_i(t) = (O_i(t), G_i(t), BO_i(t)); t ≥ 0} for component i, which describes its state transition process; the transition rates between the states of component i are listed in Table 1;
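TABLE 1 State transition rates of component i
From state | To state | Transition rate | Range
(N, S_i - q, 0) | (N, S_i - q - 1, 0) | N Z_i λ_i | 0 ≤ q ≤ S_i - 1
(N, S_i - q - 1, 0) | (N, S_i - q, 0) | (q + 1) μ_i | 0 ≤ q ≤ S_i - 1
(N, 0, 0) | (N - 1, 0, 1) | N Z_i λ_i | -
(N - 1, 0, 1) | (N, 0, 0) | (S_i + 1) μ_i | -
(N - k, 0, k) | (N - k - 1, 0, k + 1) | (N - k) Z_i λ_i | 1 ≤ k ≤ N - 1
(N - k - 1, 0, k + 1) | (N - k, 0, k) | (S_i + k + 1) μ_i | 1 ≤ k ≤ N - 1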
6. The Markov modeling method for steady-state availability of a server cluster according to claim 5, wherein computing the transfer rate matrix of the component in step 4 means determining, from the continuous-time Markov chain {X_i(t) = (O_i(t), G_i(t), BO_i(t)); t ≥ 0} established in step 3, the transfer rate matrix of component i as follows:
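$$
A_i=\begin{pmatrix}
-NZ_i\lambda_i & NZ_i\lambda_i & & & & & \\
\mu_i & -(NZ_i\lambda_i+\mu_i) & NZ_i\lambda_i & & & & \\
 & \ddots & \ddots & \ddots & & & \\
 & & S_i\mu_i & -(NZ_i\lambda_i+S_i\mu_i) & NZ_i\lambda_i & & \\
 & & & (S_i+1)\mu_i & -\big((N-1)Z_i\lambda_i+(S_i+1)\mu_i\big) & (N-1)Z_i\lambda_i & \\
 & & & & \ddots & \ddots & \ddots \\
 & & & & & (S_i+N)\mu_i & -(S_i+N)\mu_i
\end{pmatrix}
$$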
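A sketch that assembles the tridiagonal matrix A_i from the rates in the transition rules of step 3, assuming exact rational inputs (`fractions.Fraction`) so that each row can be checked to sum to exactly zero. The name `generator_matrix` is hypothetical.

```python
from fractions import Fraction


def generator_matrix(N, S, Z, lam, mu):
    """Build the (S+N+1) x (S+N+1) transfer rate matrix A_i of component i."""
    E = S + N + 1
    A = [[0] * E for _ in range(E)]
    for m in range(E):
        if m + 1 < E:
            k = max(0, m - S)                  # servers down in state m
            A[m][m + 1] = (N - k) * Z * lam    # failure: m -> m + 1
        if m > 0:
            A[m][m - 1] = m * mu               # repair: m -> m - 1
        # Diagonal: negative sum of the other entries in row m
        A[m][m] = -sum(A[m][n] for n in range(E) if n != m)
    return A


A = generator_matrix(2, 1, 1, Fraction(1, 10), Fraction(1, 2))
assert all(sum(row) == 0 for row in A)   # every row of a generator sums to zero
```

Fractions are used only so the zero row sums hold exactly; plain floats work the same way up to rounding.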
7. The Markov modeling method for steady-state availability of a server cluster according to claim 6, wherein the steady-state probabilities of the states of the component in step 5 are calculated as follows: the steady-state probability vector of the states of component i is denoted π_i,
$$\pi_i = \left(\pi_{i,0}, \pi_{i,1}, \ldots, \pi_{i,S_i+N}\right)$$
The steady-state probabilities π_i are solved from the following non-homogeneous system of linear equations:
$$\begin{cases} \pi_i A_i = \mathbf{0} \\ \sum_{j=0}^{S_i+N} \pi_{i,j} = 1 \end{cases}$$
Here, A_i is the transfer rate matrix of component i; 0 is the 1 × E_i zero matrix; π_{i,j} is the steady-state probability that component i is in state j, 0 ≤ j ≤ S_i + N;
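The system can also be solved numerically for any irreducible generator. The sketch below is a hedged illustration, not the patent's procedure: it replaces one redundant balance equation of π_i A_i = 0 with the normalization condition and applies Gauss-Jordan elimination over exact fractions. The name `stationary_distribution` is hypothetical.

```python
from fractions import Fraction


def stationary_distribution(A):
    """Solve pi A = 0 with sum(pi) = 1 for an irreducible generator A."""
    E = len(A)
    # Transpose so the unknowns pi_j form a column: A^T pi^T = 0
    M = [[Fraction(A[n][m]) for n in range(E)] for m in range(E)]
    rhs = [Fraction(0)] * E
    # One balance equation is redundant; replace it with normalization
    M[-1] = [Fraction(1)] * E
    rhs[-1] = Fraction(1)
    # Gauss-Jordan elimination
    for col in range(E):
        pivot = next(r for r in range(col, E) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(E):
            if r != col and M[r][col] != 0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
                rhs[r] -= f * rhs[col]
    return [rhs[j] / M[j][j] for j in range(E)]


# Two-state check: rate 1 from state 0 to 1, rate 2 back
print(stationary_distribution([[-1, 1], [2, -2]]))
# [Fraction(2, 3), Fraction(1, 3)]
```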
Solving this non-homogeneous system yields the steady-state probability of state p:
$$\pi_{i,p} = \frac{(N Z_i \lambda_i)^p}{p!\,\mu_i^p}\,\pi_{i,0}$$
where 1 ≤ p ≤ S_i + 1;
the steady-state probability of state S_i + h is:
$$\pi_{i,S_i+h} = \frac{N^{S_i}\,N!}{(N-h)!}\cdot\frac{(Z_i\lambda_i)^{S_i+h}}{(S_i+h)!\,\mu_i^{S_i+h}}\,\pi_{i,0}$$
where 2 ≤ h ≤ N;
since
$$\sum_{j=0}^{S_i+N} \pi_{i,j} = 1,$$
the steady-state probability that component i is in state 0 is:
$$\pi_{i,0} = \left[1 + \sum_{p=1}^{S_i+1} \frac{(N Z_i\lambda_i)^p}{p!\,\mu_i^p} + \sum_{w=2}^{N} \frac{N^{S_i}\,N!}{(N-w)!}\cdot\frac{(Z_i\lambda_i)^{S_i+w}}{(S_i+w)!\,\mu_i^{S_i+w}}\right]^{-1}$$
where 2 ≤ w ≤ N.
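As a sanity check of the closed-form expressions, the sketch below evaluates them with exact fractions; the resulting probabilities should sum to 1 and satisfy detailed balance between neighbouring states. The function name `closed_form_probabilities` is hypothetical.

```python
from fractions import Fraction
from math import factorial


def closed_form_probabilities(N, S, Z, lam, mu):
    """Steady-state probabilities pi_{i,0..S+N} from the closed-form expressions."""
    r = Fraction(Z) * Fraction(lam) / Fraction(mu)   # Z_i lambda_i / mu_i
    w = [Fraction(1)]                                # weights relative to pi_{i,0}
    for p in range(1, S + 2):                        # states 1 .. S_i + 1
        w.append((N * r) ** p / factorial(p))
    for h in range(2, N + 1):                        # states S_i + 2 .. S_i + N
        coeff = Fraction(N ** S * factorial(N), factorial(N - h))
        w.append(coeff * r ** (S + h) / factorial(S + h))
    total = sum(w)                                   # equals 1 / pi_{i,0}
    return [x / total for x in w]


pi = closed_form_probabilities(3, 2, 2, Fraction(1, 10), Fraction(1, 2))
print(sum(pi))   # 1
```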
8. The Markov modeling method for steady-state availability of a server cluster according to claim 7, wherein the expected spare part shortage number of the component in step 6 is calculated as follows: once the steady-state probabilities of the states of component i have been solved, the expected spare part shortage number of component i is
$$EBO(S_i) = \pi_i\, b_i = \sum_{m=0}^{S_i+N} \pi_{i,m}\, b_{i,m}$$
Here, EBO(S_i) is the expected spare part shortage number of component i in steady state; b_i is the spare part shortage number vector of component i, and its element b_{i,m} is the spare part shortage number when component i is in state m. The specific form of b_i is:
$$b_i = (\underbrace{0, 0, \ldots, 0}_{S_i+1}, 1, 2, \ldots, N)^{\mathrm{T}}$$
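A minimal sketch of EBO(S_i), assuming π_i is supplied as a list indexed by state number; b_{i,m} = max(0, m - S_i) reproduces the vector b_i above. The function name `expected_backorders` is hypothetical.

```python
def expected_backorders(pi, S):
    """EBO(S_i) = sum over m of pi_{i,m} * b_{i,m}, with b_{i,m} = max(0, m - S_i)."""
    b = [max(0, m - S) for m in range(len(pi))]   # (0, ..., 0, 1, 2, ..., N)
    return sum(p * x for p, x in zip(pi, b))


# Hypothetical input (not a real steady state): uniform over the 4 states of N = 2, S_i = 1
print(expected_backorders([0.25] * 4, 1))   # 0.75
```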
9. The Markov modeling method for steady-state availability of a server cluster according to claim 8, wherein performing steps 2 to 6 for each type of component in step 7, combining the CTMCs of the component types, and constructing the CTMC family model of the server cluster means that the continuous-time Markov chains (CTMCs) of the components, {X_i(t) = (O_i(t), G_i(t), BO_i(t)); t ≥ 0}, i = 1, 2, …, L, are built one by one, and, because the failures and repairs of the different component types are independent, an integrated CTMC family model of the server cluster {X(t) = {X_1(t), …, X_i(t), …, X_L(t)}; t ≥ 0} is established; the expected spare part shortage number of the server cluster in step 7 is calculated as follows: the steady-state expected spare part shortage number EBO_q of the server cluster equals the sum, over all component types, of the expected number of servers made unavailable by spare part shortages, so EBO_q is calculated as:
$$EBO_q = \sum_{i=1}^{L} EBO(S_i)$$
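An end-to-end sketch from the cluster parameters to EBO_q, under the assumptions stated in claims 2 to 9. Instead of the closed forms it uses the equivalent birth-death product recursion π_{i,m+1} = π_{i,m} × (failure rate out of m) / (repair rate out of m + 1). The tuple layout (Z_i, S_i, λ_i, μ_i) and the function names are hypothetical.

```python
from fractions import Fraction


def component_ebo(N, S, Z, lam, mu):
    """EBO(S_i) of one component type via the birth-death recursion of its CTMC."""
    weights = [Fraction(1)]                       # unnormalized pi_{i,0..S+N}
    for m in range(S + N):
        k = max(0, m - S)                         # servers down in state m
        up = (N - k) * Z * Fraction(lam)          # rate m -> m + 1
        down = (m + 1) * Fraction(mu)             # rate m + 1 -> m
        weights.append(weights[-1] * up / down)
    total = sum(weights)
    # b_{i,m} = max(0, m - S_i) spare part shortages in state m
    return sum(w / total * max(0, m - S) for m, w in enumerate(weights))


def cluster_ebo(N, components):
    """EBO_q = sum over component types of EBO(S_i); components = [(Z_i, S_i, lam_i, mu_i), ...]."""
    return sum(component_ebo(N, S, Z, lam, mu) for Z, S, lam, mu in components)


# Hypothetical cluster: 2 servers, two component types with identical parameters
print(cluster_ebo(2, [(1, 1, Fraction(1, 10), Fraction(1, 2))] * 2))
```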
10. The Markov modeling method for steady-state availability of a server cluster according to claim 9, wherein the steady-state availability of the server cluster in step 8 is calculated as follows: subtracting the steady-state EBO_q of the server cluster from the total number of servers N gives the expected number of available servers, and the steady-state availability A_q of the server cluster equals this expected number of available servers as a percentage of the total number of servers N; hence A_q is calculated as:
$$A_q = \frac{N - EBO_q}{N} \times 100\%$$
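The final step is plain arithmetic. A hypothetical helper with illustrative numbers (not data from the patent):

```python
def steady_state_availability(N, ebo_q):
    """A_q = (N - EBO_q) / N, the expected fraction of servers that are operational."""
    if not 0 <= ebo_q <= N:
        raise ValueError("EBO_q must lie between 0 and N")
    return (N - ebo_q) / N


# Hypothetical numbers: 8 servers, EBO_q = 0.5 expected unavailable servers
print(f"{steady_state_availability(8, 0.5):.2%}")   # 93.75%
```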
CN201710867338.0A 2017-09-22 2017-09-22 Markov modeling method for steady-state availability of server cluster Active CN107783851B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710867338.0A CN107783851B (en) 2017-09-22 2017-09-22 Markov modeling method for steady-state availability of server cluster

Publications (2)

Publication Number Publication Date
CN107783851A CN107783851A (en) 2018-03-09
CN107783851B true CN107783851B (en) 2020-06-02

Family

ID=61433563

Country Status (1)

Country Link
CN (1) CN107783851B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant