CN107783851B - Markov modeling method for steady-state availability of server cluster - Google Patents
- Publication number
- CN107783851B, CN201710867338.0A
- Authority
- CN
- China
- Prior art keywords
- state
- component
- server cluster
- steady
- spare part
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/008—Reliability or availability analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3447—Performance evaluation by modeling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3452—Performance evaluation by statistical analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- General Physics & Mathematics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computer Hardware Design (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Hardware Redundancy (AREA)
Abstract
The invention provides a Markov modeling method for the steady-state availability of a server cluster, comprising the following steps: 1. determining the relevant parameters of the server cluster; 2. selecting parameters that describe the state of a component and defining the states of the component; 3. determining the transition rates between the states of the component and constructing a continuous-time Markov chain (CTMC) of the component; 4. calculating the transition rate matrix of the component; 5. calculating the steady-state probability of each state of the component; 6. calculating the expected spare part shortage number of the component; 7. performing steps 2 to 6 for each type of component and constructing a CTMC family model of the server cluster; 8. calculating the steady-state availability of the server cluster. Through these steps, a CTMC model is established for each component, the component models are combined into a CTMC family model of the server cluster, and the availability of the system is calculated. The modeling method reduces the number of states in the Markov process, lowers the difficulty of calculating the steady-state availability of the server cluster, and provides useful reference information for the design and improvement of server clusters.
Description
Technical Field
The invention provides a Markov modeling method for steady-state availability of a server cluster, and belongs to the technical field of maintenance support.
Background
With the continuous development of internet technology, users increasingly expect fast and accurate service and require information providers to deliver services continuously, 24 hours a day, all year round. Statistical reports show that U.S. enterprises lose nearly $40 billion annually because of computer system failures, and the resulting damage to reputation is even harder to measure. To avoid unplanned service interruptions and provide a better service experience, service providers often build server clusters (multi-server hot standby) in important data centers to achieve high reliability and uninterrupted service.
The availability of a server cluster is affected mainly by software and hardware. Data show that hardware faults account for more than 20% of all server cluster faults, so hardware performance and maintenance have a very important impact on server cluster availability. From the hardware perspective, the methods used in China and abroad to determine server cluster availability mainly include reliability block diagrams, fault trees, Petri nets, and Markov processes. However, these methods often ignore the influence of spare parts on cluster availability when a failed server is repaired. Spare part supply is a key element of maintenance support: it affects the operation of a server cluster mainly through its effect on the repair turnaround time of failed parts. Spare part supply should therefore be taken into account when calculating the steady-state availability of a server cluster, and accurate calculation of steady-state availability has clear theoretical significance and practical value for the design, operation, and management of server clusters.
To solve this problem, the invention considers the influence of spare parts on the number of available servers in the cluster, determines the state space of the cluster using three typical parameters, and combines the continuous-time Markov chains (CTMC) of all key components into a CTMC family model that represents the state transitions of the server cluster, from which the steady-state availability of the server cluster is calculated.
Disclosure of Invention
(1) Objects of the invention
To provide an efficient and reliable level of information service, the reliability and availability of server clusters must be increased. Server providers usually increase the number of redundant servers on the one hand and stock the corresponding spare parts on the other, so server clusters tend to grow ever larger, and calculating their availability accurately becomes a difficult problem. In view of this, the present invention provides a Markov modeling method for the steady-state availability of a server cluster, that is, a method for calculating steady-state availability using a Markov process. The method is broadly applicable and is suitable for server clusters that operate continuously over long periods.
(2) Technical scheme
The server cluster addressed by the invention consists of multiple identical servers operating in multi-machine hot standby mode. While the cluster executes tasks, the servers work independently. The following assumptions are made: the key components of a server form a series structure, so a server stops running as soon as any one key component fails; the failure times and repair times of all key components are mutually independent and exponentially distributed; the time to replace a failed part is negligible; the spare part inventory follows a single-base model; and the repair facility has unlimited capacity, every repair succeeds, and all repair resources other than spare parts are adequately supplied.
When the times to failure and the times to repair are exponentially distributed, such a system can be described by a Markov process, provided the states are properly defined. By the classical inventory balance equation, the spare part shortage number can be derived from the spare part inventory. In the invention, a server is in a failed state whenever any one of its key components has a spare part shortage. The Markov model of a server cluster can therefore be built at the level of individual key components: changes in the number of available servers are reflected indirectly by changes in the states of the key components, and the sum of the expected shortages of all spare part types is then used to calculate the steady-state availability of the server cluster.
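The classical inventory balance mentioned above is usually written as stock level = on hand + due in − backorders. A minimal sketch of how it splits the units of one component type that are in repair into on-hand stock and shortage, assuming a single-base stock level `S` and that every failed unit enters the repair pipeline (the helper name `split_pipeline` is hypothetical):

```python
def split_pipeline(x, S):
    """Hypothetical helper: split x units of component i that are in the repair
    pipeline (failed and not yet repaired) into on-hand stock G and shortage BO
    under a single-base stock level S."""
    G = max(S - x, 0)   # spares still on the shelf
    BO = max(x - S, 0)  # failures that found no spare (shortages)
    # Classical inventory balance: stock level = on hand + due in - backorders
    assert S == G + x - BO
    return G, BO


for x in range(5):
    print(x, split_pipeline(x, S=2))
```

Each shortage leaves one server without a working unit of that component, which is why the shortage count maps directly onto unavailable servers.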
Based on the above conditions, the method for establishing the steady-state availability of the server cluster by using the Markov model provided by the invention comprises the following steps:
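step 1, determining the relevant parameters of the server cluster according to the architectural characteristics of the server cluster;
step 2, selecting parameters describing the state of the component, and defining the states of the component;
step 3, determining the transition rates between the states of the component, and constructing a continuous-time Markov chain (CTMC) of the component;
step 4, calculating the transition rate matrix of the component;
step 5, calculating the steady-state probability of each state of the component by solving a specific non-homogeneous system of linear equations;
step 6, calculating the expected spare part shortage number of the component;
step 7, performing steps 2 to 6 for each type of component, combining the CTMCs of all component types into a CTMC family model of the server cluster, and calculating the expected spare part shortage number of the server cluster;
step 8, calculating the steady-state availability of the server cluster.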
Step 1, determining the relevant parameters of the server cluster, means determining the number N of servers in the cluster, the number L of key component types in each server, the failure rate and repair rate of each component type, and the number $Z_i$ of units of component i installed in a single server. A typical server cluster architecture is shown in Fig. 1.
The "component" in steps 2 to 7 refers to a specific key component of type i, where i can be any of 1, 2, …, L, i.e. $i \in \{1, 2, \dots, L\}$.
Selecting the parameters describing the state of the component, in step 2, means using three parameters to determine the state space of the component: the number $O_i$ of available servers as affected by component i, the inventory $G_i$ of component i, and the spare part shortage number $BO_i$ of component i. At the initial moment, the inventory of component i is $S_i$.
Defining the states of the component, in step 2, means defining the states of component i as follows:
state 0: $(N, S_i, 0)$, i.e. N available servers, inventory $S_i$, spare part shortage number 0;
state 1: $(N, S_i-1, 0)$, i.e. N available servers, inventory $S_i-1$, spare part shortage number 0;
…
state $S_i$: $(N, 0, 0)$, i.e. N available servers, inventory 0, spare part shortage number 0;
state $S_i+1$: $(N-1, 0, 1)$, i.e. $N-1$ available servers, inventory 0, spare part shortage number 1;
…
state $S_i+k$: $(N-k, 0, k)$, i.e. $N-k$ available servers, inventory 0, spare part shortage number k;
…
state $S_i+N$: $(0, 0, N)$, i.e. 0 available servers, inventory 0, spare part shortage number N;
Component i therefore has $E_i = S_i + N + 1$ states, numbered $0, 1, 2, \dots, S_i+N$.
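The state numbering above can be sketched in a few lines (the helper `component_states` is hypothetical). State j can be read as "j units of component i are in the repair pipeline", which yields all three parameters at once: the stock is what remains of $S_i$, the shortage is the excess over $S_i$, and each shortage takes one server out of service.

```python
def component_states(S, N):
    """Hypothetical helper: list the states (O_i, G_i, BO_i) of component i,
    indexed j = 0..S+N, where j is the number of units in the repair pipeline."""
    states = []
    for j in range(S + N + 1):
        G = max(S - j, 0)    # inventory left on the shelf
        BO = max(j - S, 0)   # spare part shortage number
        O = N - BO           # each shortage stops one server
        states.append((O, G, BO))
    return states


states = component_states(S=1, N=10)
print(len(states))             # E_i = S_i + N + 1
print(states[0], states[-1])
```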
Determining the transition rates between the states of the component, in step 3, means determining the Markov transition rates of component i from the relationships between its states; $\lambda_i$ and $\mu_i$ denote the failure rate and the repair rate of component i, respectively.
When spare parts of component i are available: if a unit of component i fails, it is replaced by a spare, the inventory decreases by 1, the number of available servers remains N, and component i moves from state $(N, S_i-q, 0)$ to the adjacent state $(N, S_i-q-1, 0)$ at rate $N Z_i \lambda_i$. When a unit of component i is repaired and returned to the spare part store, the inventory increases by 1, and component i moves from state $(N, S_i-q-1, 0)$ to the adjacent state $(N, S_i-q, 0)$ at rate $(q+1)\mu_i$. Here $0 \le q \le S_i-1$.
When component i is short of spare parts: if a unit of component i fails, one server stops operating, i.e. the number of available servers $O_i$ decreases by 1 and the spare part shortage number $BO_i$ increases by 1; component i moves from state $(N-k, 0, k)$ to state $(N-k-1, 0, k+1)$ at rate $(N-k) Z_i \lambda_i$. When a failed unit of component i is repaired, one server returns to normal operation, i.e. $O_i$ increases by 1 and $BO_i$ decreases by 1; component i moves from state $(N-k-1, 0, k+1)$ to state $(N-k, 0, k)$ at rate $(S_i+k+1)\mu_i$. Here $1 \le k \le N-1$.
States $(N, 0, 0)$ and $(N-1, 0, 1)$ are the two states that connect the no-shortage and shortage regimes of component i. When component i is in state $(N, 0, 0)$ and a failure occurs, $O_i$ decreases by 1 and $BO_i$ changes from 0 to 1; the transition rate from $(N, 0, 0)$ to $(N-1, 0, 1)$ is $N Z_i \lambda_i$. When component i is in state $(N-1, 0, 1)$ and a failed unit is repaired, $O_i$ increases by 1 and $BO_i$ changes from 1 to 0; the transition rate from $(N-1, 0, 1)$ to $(N, 0, 0)$ is $(S_i+1)\mu_i$.
The initial state of component i is $(N, S_i, 0)$. The transition rate matrix is denoted $A_i = [a^{(i)}_{mn}]$, an $E_i \times E_i$ matrix. For $m \ne n$, $a^{(i)}_{mn}$ is the transition rate from state m to state n; for $m = n$, $a^{(i)}_{mm}$ is the negative of the sum of the other elements of row m. Here m and n are state numbers of component i, $0 \le m, n \le S_i+N$. This defines a continuous-time Markov chain of component i, $\{X_i(t) = (O_i(t), G_i(t), BO_i(t));\ t \ge 0\}$, which describes the state transition process of component i. The corresponding state transition diagram is shown in Fig. 2, and the transition rates between the states of component i are listed in Table 1.
TABLE 1 State transition Rate of component i
Calculating the transition rate matrix of the component, in step 4, means determining the transition rate matrix of component i from the continuous-time Markov chain $\{X_i(t) = (O_i(t), G_i(t), BO_i(t));\ t \ge 0\}$ established in step 3, as follows:
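The matrix follows directly from the step 3 rates. Writing $a^{(i)}_{mn}$ for the entry of $A_i$ in row m and column n, it is tridiagonal, with $a^{(i)}_{j,j+1} = (N - \max(j - S_i, 0))\,Z_i\lambda_i$ for failures and $a^{(i)}_{j,j-1} = j\mu_i$ for repairs (j units in the pipeline, each repaired independently under the unlimited-repair assumption). A minimal Python sketch that assembles it as nested lists; the function and parameter names are illustrative, not the patent's:

```python
def generator_matrix(S, N, Z, lam, mu):
    """Transition rate matrix A_i of component i as nested lists.

    S, N, Z, lam, mu stand for S_i, N, Z_i, lambda_i, mu_i; states are
    numbered j = 0..S+N as in step 2.
    """
    E = S + N + 1
    A = [[0.0] * E for _ in range(E)]
    for j in range(E):
        if j < E - 1:
            # failure: all N servers run while j <= S; in state S+k only N-k run
            running = N - max(j - S, 0)
            A[j][j + 1] = running * Z * lam
        if j > 0:
            # repair: j units are in the pipeline, each repaired at rate mu
            A[j][j - 1] = j * mu
        # diagonal: negative of the sum of the other elements of row j
        A[j][j] = -sum(A[j][n] for n in range(E) if n != j)
    return A


# Hypothetical rates, for illustration only.
A = generator_matrix(S=1, N=10, Z=2, lam=1e-4, mu=0.05)
print(len(A), len(A[0]))  # 12 12
```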
Calculating the steady-state probabilities of the states of the component, in step 5, is performed as follows. The steady-state probability vector of component i is denoted $\pi_i = (\pi_{i,0}, \pi_{i,1}, \dots, \pi_{i,S_i+N})$ and is obtained by solving the following non-homogeneous system of linear equations:
$$\pi_i A_i = \mathbf{0}, \qquad \sum_{j=0}^{S_i+N} \pi_{i,j} = 1$$
Here $A_i$ is the transition rate matrix of component i, $\mathbf{0}$ is a $1 \times E_i$ zero matrix, and $\pi_{i,j}$ is the steady-state probability that component i is in state j, $0 \le j \le S_i+N$.
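The balance equations $\pi_i A_i = \mathbf{0}$ are linearly dependent, so one of them can be replaced by the normalization condition to obtain a square, non-singular system. A minimal pure-Python sketch using Gauss-Jordan elimination, so that it runs without NumPy (the function name is hypothetical; for large chains a library linear solver would be the usual choice):

```python
def stationary_distribution(A):
    """Solve pi A = 0, sum(pi) = 1 for an irreducible generator A (nested lists)."""
    E = len(A)
    # Row m of the augmented system is the balance equation sum_n pi_n A[n][m] = 0.
    M = [[A[n][m] for n in range(E)] + [0.0] for m in range(E)]
    # The balance equations are dependent: replace the last one by sum(pi) = 1.
    M[-1] = [1.0] * E + [1.0]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(E):
        pivot = max(range(col, E), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(E):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[r][E] / M[r][r] for r in range(E)]


# Two-state example: failure rate 0.2, repair rate 3.0.
print(stationary_distribution([[-0.2, 0.2], [3.0, -3.0]]))
```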
After solving this system, the steady-state probability of state p ($1 \le p \le S_i+1$) is obtained as:
The steady-state probability of state $S_i+h$ ($2 \le h \le N$) is:
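Because $A_i$ only links neighbouring states (a birth-death chain), detailed balance $\pi_{i,j}\,a^{(i)}_{j,j+1} = \pi_{i,j+1}\,a^{(i)}_{j+1,j}$, where $a^{(i)}_{mn}$ is the entry of $A_i$ in row m and column n, gives a product-form solution. The expressions below are derived from the step 3 rates and use the shorthand $\rho_i = N Z_i \lambda_i / \mu_i$, which is introduced here and is not a symbol of the original; for $p \le S_i+1$ the weights are those of a truncated Poisson distribution with mean $\rho_i$.

$$
\begin{aligned}
\pi_{i,p} &= \frac{\rho_i^{\,p}}{p!}\,\pi_{i,0}, && 1 \le p \le S_i + 1,\\
\pi_{i,S_i+h} &= \pi_{i,S_i+1}\prod_{w=2}^{h}\frac{(N-w+1)\,Z_i\lambda_i}{(S_i+w)\,\mu_i}, && 2 \le h \le N,\\
\pi_{i,0} &= \left[\sum_{p=0}^{S_i+1}\frac{\rho_i^{\,p}}{p!}
  + \frac{\rho_i^{\,S_i+1}}{(S_i+1)!}\sum_{h=2}^{N}\prod_{w=2}^{h}\frac{(N-w+1)\,Z_i\lambda_i}{(S_i+w)\,\mu_i}\right]^{-1}.
\end{aligned}
$$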
Calculating the expected spare part shortage number of the component, in step 6, is done as follows: after the steady-state probabilities of component i are solved, its expected spare part shortage number is calculated by
$$EBO(S_i) = \pi_i b_i = \sum_{m=0}^{S_i+N} b_{i,m}\,\pi_{i,m}$$
Here $EBO(S_i)$ is the expected spare part shortage number of component i in steady state; $b_i$ is the spare part shortage number vector of component i, and its element $b_{i,m}$ is the spare part shortage number when component i is in state m. The specific form of $b_i$ is
$$b_i = [\underbrace{0, \dots, 0}_{S_i+1}, 1, 2, \dots, N]^T$$
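Combining the product-form probabilities of a birth-death chain with $b_{i,m} = \max(m - S_i, 0)$ gives a short sketch of steps 5 and 6 for one component. The function names are hypothetical, and the parameters are those of step 1:

```python
from math import factorial


def steady_state_closed_form(S, N, Z, lam, mu):
    """Product-form steady-state probabilities pi_{i,0..S+N} of component i."""
    rho = N * Z * lam / mu
    # States 0..S+1: unnormalised weights rho^p / p!
    weights = [rho ** p / factorial(p) for p in range(S + 2)]
    # States S+2..S+N: multiply by (N-w+1) Z lam / ((S+w) mu) for w = 2..N
    for w in range(2, N + 1):
        weights.append(weights[-1] * (N - w + 1) * Z * lam / ((S + w) * mu))
    total = sum(weights)
    return [x / total for x in weights]


def expected_backorders(pi, S):
    """EBO(S_i) = sum_m b_{i,m} pi_{i,m}, where b_{i,m} = max(m - S, 0)."""
    return sum(max(m - S, 0) * p for m, p in enumerate(pi))


# Hypothetical usage: N = 10, S = 1, and N*Z*lam/mu set to the ratio
# pi_{1,1} / pi_{1,0} of the embodiment (an inferred, not a published, value).
rho = 0.00155314 / 0.99844565
pi = steady_state_closed_form(S=1, N=10, Z=1, lam=rho / 10, mu=1.0)
print([round(p, 8) for p in pi[:3]], round(expected_backorders(pi, S=1), 8))
```

The parameters of component 1 in the embodiment below are not reproduced in this text, so the usage line back-computes $N Z_1 \lambda_1 / \mu_1 \approx 0.0015556$ from the first two listed probabilities; with N = 10 and $S_1 = 1$ the sketch then reproduces $\pi_{1,2} \approx 0.00000121$ and $EBO(S_1) \approx 0.00000121$.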
Step 7, performing steps 2 to 6 for each type of component and combining the CTMCs of all component types into the CTMC family model of the server cluster, means building the continuous-time Markov chains $\{X_i(t) = (O_i(t), G_i(t), BO_i(t));\ t \ge 0\}$, $i = 1, 2, \dots, L$, one component at a time. Because the failures and repairs of different component types are mutually independent, these chains can be combined into a comprehensive server cluster CTMC family model $\{X(t) = \{X_1(t), \dots, X_i(t), \dots, X_L(t)\};\ t \ge 0\}$.
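One way to see why the family model keeps the state space small: the L component chains are solved one at a time, so the work grows with $\sum_i E_i$, whereas a single chain over the Cartesian product of all component states would have $\prod_i E_i$ states. A minimal sketch; the function name is hypothetical, and the inventory levels in the usage line are hypothetical except $S_1 = 1$ from the embodiment below:

```python
from math import prod


def state_counts(S_list, N):
    """Return (states handled by the CTMC family, states of a product-space chain)."""
    E = [S + N + 1 for S in S_list]   # E_i = S_i + N + 1 for each component type
    return sum(E), prod(E)


# Hypothetical inventories for L = 8 component types and N = 10 servers.
print(state_counts([1, 1, 2, 1, 2, 1, 1, 3], N=10))
```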
The expected spare part shortage number of the server cluster, also computed in step 7, is calculated as follows: in steady state, the server cluster's $EBO_q$ equals the sum of the expected numbers of servers made unavailable by shortages of each type of spare part, so
$$EBO_q = \sum_{i=1}^{L} EBO(S_i)$$
Calculating the steady-state availability of the server cluster, in step 8, is done as follows: subtracting the steady-state $EBO_q$ from the total number of servers N gives the expected number of available servers, and the steady-state availability $A_q$ of the server cluster equals the expected number of available servers as a percentage of N:
$$A_q = \frac{N - EBO_q}{N} \times 100\%$$
Through these steps, with spare parts taken into account, a CTMC model is first established for each component type, and the CTMCs of all component types are then combined into a CTMC family model of the server cluster, based on the mutual independence of failures and repairs across components. The expected spare part shortage number of the server cluster is calculated from the CTMC family model, and the steady-state availability of the server cluster is obtained from its relationship with the expected spare part shortage number. The modeling method effectively reduces the number of states in the Markov process, thereby lowering the difficulty of calculating the steady-state availability of the server cluster and providing useful reference information for the design and improvement of server clusters.
(3) Advantages and effects
The invention provides a Markov modeling method for server cluster steady-state availability that accounts for the influence of spare parts. Its advantages are:
① By considering how the stock and shortage of different spare parts affect server cluster availability, the invention can guide spare part planning for servers and server clusters.
② Based on the structural characteristics of the server cluster, the invention constructs component CTMCs that satisfy the inventory balance relation, and establishes the CTMC family model of the server cluster from the fault logic relationships between components.
③ The invention provides a server cluster steady-state availability modeling method based on a CTMC family, offering a new technical approach to calculating server cluster availability indices with CTMC methods.
Drawings
FIG. 1 is a typical server cluster architecture.
FIG. 2 is a Markov state transition diagram of component i.
FIG. 3 is a flow chart of the modeling method of the present invention.
FIG. 4 is a Markov state transition diagram of component 1.
The symbols in the figures are as follows:
N refers to the number of servers in the cluster system;
L refers to the number of key component types;
$Z_i$ refers to the number of units of component i installed in a single server;
$\lambda_i$ refers to the failure rate of component i;
$\mu_i$ refers to the repair rate of component i;
$\pi_i$ refers to the steady-state probability vector of component i;
$\pi_{i,j}$ refers to the steady-state probability that component i is in state j;
$b_i$ refers to the spare part shortage number vector of component i;
$b_{i,m}$ refers to the spare part shortage number when component i is in state m;
$E_i$ refers to the number of states of component i;
$A_i$ refers to the state transition rate matrix of component i;
$EBO(S_i)$ refers to the expected spare part shortage number of component i;
$EBO_q$ refers to the expected spare part shortage number of the server cluster;
$A_q$ refers to the steady-state availability of the server cluster;
CTMC refers to a continuous-time Markov chain;
$X_i(t)$ refers to the continuous-time Markov chain of component i;
$X(t)$ refers to the CTMC family of the server cluster;
$O_i$ refers to the number of available servers as affected by component i;
$G_i$ refers to the inventory of component i;
$BO_i$ refers to the spare part shortage number of component i;
$S_i$ refers to the initial inventory of component i;
$O_i(t)$ refers to the number of available servers in the continuous-time Markov chain of component i;
$G_i(t)$ refers to the number of available spare parts in the continuous-time Markov chain of component i;
$BO_i(t)$ refers to the spare part shortage number in the continuous-time Markov chain of component i.
Detailed Description
The embodiments of the present invention are described below in more detail with reference to an example. In general, a complex server can be converted into an equivalent series-structured system by means of equivalence, combination, and similar techniques. The server cluster in the following example contains 10 identical servers operating in multi-server hot standby mode. Each server consists of 8 types of key components in series. The failure times and repair times of all key component types are assumed to be exponentially distributed.
The specific implementation flow of the Markov modeling method for server cluster steady-state availability disclosed by the invention is shown in Fig. 3, and the implementation steps are as follows:
TABLE 2 part-related parameters
Step 3: determine the transition rates of component 1 and construct the continuous-time Markov chain of component 1. In this example, the initial inventory $S_1$ of component 1 is 1, and the transition rates between the states of component 1 are as shown in the following table:
TABLE 3 State transition rates of component 1
The transition rate matrix is denoted $A_1 = [a^{(1)}_{mn}]$, a $12 \times 12$ square matrix. For $m \ne n$, $a^{(1)}_{mn}$ is the transition rate from state m to state n; for $m = n$, $a^{(1)}_{mm}$ is the negative of the sum of the other elements of row m; $0 \le m, n \le 11$. This defines the continuous-time Markov chain of component 1, $\{X_1(t) = (O_1(t), G_1(t), BO_1(t));\ t \ge 0\}$, which describes the state transition process of component 1; the corresponding state transition diagram is shown in Fig. 4.
Step 4: calculate the transition rate matrix of component 1. Based on step 3 and the data in Table 1, the state transition rate matrix of component 1 is:
Step 5: calculate the steady-state probabilities of the states of component 1. Using the transition rate matrix obtained in step 4, solve
$$\pi_1 A_1 = \mathbf{0}, \qquad \sum_{j=0}^{11} \pi_{1,j} = 1$$
The steady-state probabilities of the states of component 1 are:
π1=[0.99844565,0.00155314,0.00000121,0,0,0,0,0,0,0,0,0]
Step 6: calculate the expected spare part shortage number of component 1 by substituting $\pi_1$ into
$$EBO(S_1) = \pi_1 b_1 = \sum_{m=0}^{11} b_{1,m}\,\pi_{1,m}$$
where $b_1 = [0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]^T$.
The expected spare part shortage number of component 1 is $EBO(S_1) = 0.00000121$.
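A quick check of step 6 with the listed values of $\pi_1$ and $b_1$, in plain Python:

```python
# Steady-state probabilities of component 1 as listed in step 5 (12 states).
pi_1 = [0.99844565, 0.00155314, 0.00000121] + [0.0] * 9
# b_{1,m} = max(m - S_1, 0) with S_1 = 1, i.e. [0, 0, 1, ..., 10].
b_1 = [0, 0] + list(range(1, 11))
ebo_1 = sum(b * p for b, p in zip(b_1, pi_1))
print(f"{ebo_1:.8f}")   # 0.00000121
```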
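Step 7: performing steps 2 to 6 for components 2 to 8 in the same way gives their expected spare part shortage numbers:
$EBO(S_2) = 0.00000167$
$EBO(S_3) = 0.00000425$
$EBO(S_4) = 0.00000555$
$EBO(S_5) = 0.00000354$
$EBO(S_6) = 0.00001625$
$EBO(S_7) = 0.00003183$
$EBO(S_8) = 0.00000019$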
Step 8: calculate the steady-state availability of the server cluster. Summing the expected spare part shortage numbers of all component types gives the expected spare part shortage number of the server cluster:
$$EBO_q = \sum_{i=1}^{8} EBO(S_i)$$
From the relationship between the expected spare part shortage number and the steady-state availability of the server cluster, the steady-state availability $A_q$ is calculated as
$$A_q = \frac{N - EBO_q}{N} \times 100\%$$
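A minimal sketch that evaluates the two step 8 formulas on the eight component values listed above (plain Python, no external data):

```python
N = 10  # servers in the cluster
ebo = {  # EBO(S_i) for components 1..8, as listed in steps 6 and 7
    1: 0.00000121, 2: 0.00000167, 3: 0.00000425, 4: 0.00000555,
    5: 0.00000354, 6: 0.00001625, 7: 0.00003183, 8: 0.00000019,
}
ebo_q = sum(ebo.values())        # expected spare part shortage of the cluster
a_q = (N - ebo_q) / N            # steady-state availability as a fraction
print(f"EBO_q = {ebo_q:.8f}")    # EBO_q = 0.00006449
print(f"A_q = {a_q:.9f}")        # A_q = 0.999993551
```

Because each shortage removes at most one of the ten servers, the cluster-level unavailability is simply $EBO_q / N$.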
Claims (10)
1. A Markov modeling method for steady-state availability of a server cluster, characterized by comprising the following steps:
step 1, determining relevant parameters of a server cluster according to the architectural characteristics of the server cluster;
step 2, selecting parameters for describing the state of the component, and defining various states of the component;
step 3, determining the transfer rate among the states of the component, and constructing a Continuous Time Markov Chain (CTMC) of the component;
step 4, calculating a transfer rate matrix of the component;
step 5, calculating the steady-state probability of each state of the component by solving a specific non-homogeneous linear equation set;
step 6, calculating the expected spare part shortage number of the component;
step 7, respectively executing the step 2 to the step 6 on each type of component, combining CTMCs of each type of component, constructing a CTMC family model of the server cluster, and calculating the expected spare part shortage number of the server cluster;
step 8, calculating the steady-state availability of the server cluster;
wherein, through the above steps and with spare parts taken into account, CTMC models of the component types are established first, and the CTMCs of all component types are combined into a CTMC family model of the server cluster based on the mutual independence of failures and repairs across components; the expected spare part shortage number of the server cluster is then calculated from the CTMC family model, and the steady-state availability of the server cluster is calculated from its relationship with the expected spare part shortage number.
2. The Markov modeling method for steady-state availability of a server cluster according to claim 1, wherein determining the relevant parameters of the server cluster in step 1 means determining the number N of servers in the server cluster, the number L of key component types of the servers, the failure rate and repair rate of each component type, and the number $Z_i$ of units of component i installed in a single server.
3. The Markov modeling method for steady-state availability of a server cluster according to claim 2, wherein the component in steps 2 to 7 refers to a specific key component of type i, where i can be any of 1, 2, …, L, i.e. $i \in \{1, 2, \dots, L\}$.
4. The Markov modeling method for steady-state availability of a server cluster according to claim 3, wherein selecting the parameters describing the state of the component in step 2 means using three parameters to determine the state space of the component: the number $O_i$ of available servers as affected by component i, the inventory $G_i$ of component i, and the spare part shortage number $BO_i$ of component i; at the initial moment, the inventory of component i is $S_i$;
defining the states of the component in step 2 means defining the states of component i as follows:
state 0: $(N, S_i, 0)$, i.e. N available servers, inventory $S_i$, spare part shortage number 0;
state 1: $(N, S_i-1, 0)$, i.e. N available servers, inventory $S_i-1$, spare part shortage number 0;
…
state $S_i$: $(N, 0, 0)$, i.e. N available servers, inventory 0, spare part shortage number 0;
state $S_i+1$: $(N-1, 0, 1)$, i.e. $N-1$ available servers, inventory 0, spare part shortage number 1;
…
state $S_i+k$: $(N-k, 0, k)$, i.e. $N-k$ available servers, inventory 0, spare part shortage number k;
…
state $S_i+N$: $(0, 0, N)$, i.e. 0 available servers, inventory 0, spare part shortage number N;
component i therefore has $E_i = S_i + N + 1$ states, numbered $0, 1, 2, \dots, S_i+N$.
5. The Markov modeling method for steady-state availability of a server cluster according to claim 4, wherein determining the transition rates between the states of the component in step 3 means determining the Markov transition rates of component i from the relationships between its states; $\lambda_i$ and $\mu_i$ denote the failure rate and the repair rate of component i, respectively;
when spare parts of component i are available: if a unit of component i fails, it is replaced by a spare, the inventory decreases by 1, the number of available servers remains N, and component i moves from state $(N, S_i-q, 0)$ to the adjacent state $(N, S_i-q-1, 0)$ at rate $N Z_i \lambda_i$; when a unit of component i is repaired and returned to the spare part store, the inventory increases by 1, and component i moves from state $(N, S_i-q-1, 0)$ to the adjacent state $(N, S_i-q, 0)$ at rate $(q+1)\mu_i$; where $0 \le q \le S_i-1$;
when component i is short of spare parts: if a unit of component i fails, one server stops operating, i.e. the number of available servers $O_i$ decreases by 1 and the spare part shortage number $BO_i$ increases by 1, and component i moves from state $(N-k, 0, k)$ to state $(N-k-1, 0, k+1)$ at rate $(N-k) Z_i \lambda_i$; when a failed unit of component i is repaired, one server returns to normal operation, i.e. $O_i$ increases by 1 and $BO_i$ decreases by 1, and component i moves from state $(N-k-1, 0, k+1)$ to state $(N-k, 0, k)$ at rate $(S_i+k+1)\mu_i$; where $1 \le k \le N-1$;
states $(N, 0, 0)$ and $(N-1, 0, 1)$ are the two states that connect the no-shortage and shortage regimes of component i: if a failure occurs while component i is in state $(N, 0, 0)$, $O_i$ decreases by 1 and $BO_i$ changes from 0 to 1, and the transition rate from $(N, 0, 0)$ to $(N-1, 0, 1)$ is $N Z_i \lambda_i$; if a failed unit is repaired while component i is in state $(N-1, 0, 1)$, $O_i$ increases by 1 and $BO_i$ changes from 1 to 0, and the transition rate from $(N-1, 0, 1)$ to $(N, 0, 0)$ is $(S_i+1)\mu_i$;
the initial state of component i is $(N, S_i, 0)$; the transition rate matrix is denoted $A_i = [a^{(i)}_{mn}]$, an $E_i \times E_i$ matrix, where $E_i$ is the number of states of component i and gives the dimension of $A_i$; for $m \ne n$, $a^{(i)}_{mn}$ is the transition rate from state m to state n; for $m = n$, $a^{(i)}_{mm}$ is the negative of the sum of the other elements of row m; m and n are state numbers of component i, $0 \le m, n \le S_i+N$; this defines a continuous-time Markov chain of component i, $\{X_i(t) = (O_i(t), G_i(t), BO_i(t));\ t \ge 0\}$, which describes the state transition process of component i, and the transition rates between the states of component i are listed in Table 1;
TABLE 1 State transition Rate of component i
6. The Markov modeling method for steady-state availability of a server cluster according to claim 5, wherein calculating the transition rate matrix of the component in step 4 means determining the transition rate matrix of component i from the continuous-time Markov chain $\{X_i(t) = (O_i(t), G_i(t), BO_i(t));\ t \ge 0\}$ established in step 3, as follows:
7. The Markov modeling method for steady-state availability of the server cluster as recited in claim 6, wherein: the "calculating the steady-state probabilities of the states of the component" described in step 5 is performed as follows: the steady-state probability vector of component i is denoted $\pi_i = [\pi_{i,0}, \pi_{i,1}, \ldots, \pi_{i,S_i+N}]$, and $\pi_i$ is solved from the following non-homogeneous system of linear equations:

$$\pi_i A_i = \mathbf{0}, \qquad \sum_{j=0}^{S_i+N} \pi_{i,j} = 1$$
Here, $A_i$ is the transition rate matrix of component i; $\mathbf{0}$ is a $1 \times E_i$ zero matrix; $\pi_{i,j}$ is the steady-state probability that component i is in state j, $0 \le j \le S_i + N$;
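The non-homogeneous system above can be solved numerically as sketched below. The balance equations $\pi_i A_i = \mathbf{0}$ are linearly dependent for a generator matrix, so one of them is replaced by the normalisation row; this is a standard technique, not necessarily the patent's own solution procedure. Plain Gauss-Jordan elimination keeps the sketch free of third-party libraries (`numpy.linalg.solve` would serve equally well), and the name `steady_state` is hypothetical.

```python
def steady_state(A):
    """Solve pi A = 0 together with sum(pi) = 1 (Gauss-Jordan, partial pivoting).

    pi A = 0 is rewritten as A^T pi^T = 0. Its last equation is replaced by the
    normalisation row sum_j pi_j = 1, which makes the system non-homogeneous and,
    for an irreducible chain, uniquely solvable.
    """
    E = len(A)
    # Augmented matrix [A^T | 0]; row m holds the coefficients of balance equation m.
    M = [[A[n][m] for n in range(E)] + [0.0] for m in range(E)]
    M[-1] = [1.0] * E + [1.0]
    for col in range(E):
        pivot = max(range(col, E), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(E):
            if r != col and M[r][col] != 0.0:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[r][E] / M[r][r] for r in range(E)]


# Two-state check: failure rate 0.2, repair rate 0.5.
A = [[-0.2, 0.2],
     [0.5, -0.5]]
print(steady_state(A))  # roughly [0.714, 0.286], i.e. [0.5/0.7, 0.2/0.7]
```

Passing the $A_i$ of component i to this function yields the vector $\pi_i$ used in the following steps.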
After solving this non-homogeneous system of linear equations, the steady-state probability of state p is obtained as:
where $1 \le p \le S_i + 1$;
The steady-state probability of state $S_i + h$ is:
where $2 \le h \le N$;
where $2 \le w \le N$.
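Under the state-indexing and rate assumptions used for the matrix sketch (index j maps to $(N, S_i - j, 0)$ for $j \le S_i$ and to $(N-k, 0, k)$ for $j = S_i + k$; failure rate equals the number of running servers $\times Z_i\lambda_i$; repair rate equals the number of units in repair $\times \mu_i$), component i's chain is a birth-death process, so detailed balance gives a closed form for $\pi_{i,p}$ and $\pi_{i,S_i+h}$. The sketch below derives it independently and is not necessarily the patent's exact expression. `product_form` is a hypothetical name, and the float factorials limit it to moderate N (roughly N ≤ 170).

```python
from math import factorial


def product_form(N, S, Z, lam, mu):
    """Closed-form steady state of the assumed birth-death chain of component i.

    With r = Z*lam/mu, detailed balance gives unnormalised weights
      state p,   0 <= p <= S+1 : (N*r)**p / p!
      state S+h, 2 <= h <= N   : N**S * r**(S+h) * N! / ((N-h)! * (S+h)!)
    and pi_0 is the reciprocal of their sum.
    """
    r = Z * lam / mu
    weights = [(N * r) ** p / factorial(p) for p in range(S + 2)]
    weights += [
        N ** S * r ** (S + h) * factorial(N) / (factorial(N - h) * factorial(S + h))
        for h in range(2, N + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]


pi = product_form(N=4, S=2, Z=2, lam=0.05, mu=0.4)
print([round(x, 4) for x in pi])
```

The two ranges mirror the two cases in the text: up to state $S_i + 1$ every step multiplies by $NZ_i\lambda_i / (p\mu_i)$, and beyond it the factor becomes $(N-h)Z_i\lambda_i / ((S_i+h+1)\mu_i)$.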
8. The Markov modeling method for steady-state availability of the server cluster as recited in claim 7, wherein: the "calculating the expected spare-part shortage number of the component" described in step 6 is performed as follows: after the steady-state probabilities of the states of component i are solved, the expected spare-part shortage number of component i is calculated by the following formula:

$$EBO(S_i) = \sum_{m=0}^{S_i+N} b_{i,m}\,\pi_{i,m}$$

Here, $EBO(S_i)$ is the expected spare-part shortage number of component i in steady state; $b_i$ is the spare-part shortage number vector of component i; $b_{i,m}$, an element of $b_i$, is the spare-part shortage number when component i is in state m. The specific form of $b_i$ is as follows:
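A minimal sketch of the expected-shortage calculation. It assumes the same state indexing as the other sketches (no shortage while $j \le S_i$, shortage $j - S_i$ afterwards), so $b_{i,m} = \max(0, m - S_i)$; the patent's own form of $b_i$ may be written differently. The function names and the input vector are hypothetical, made-up illustration values.

```python
def shortage_vector(N, S):
    """b_i under the assumed state indexing: BO_i = max(0, m - S_i) in state m."""
    return [max(0, m - S) for m in range(S + N + 1)]


def expected_backorders(pi, N, S):
    """EBO(S_i) = sum_m b_{i,m} * pi_{i,m} for a steady-state vector pi."""
    b = shortage_vector(N, S)
    if len(pi) != len(b):
        raise ValueError("pi must have S + N + 1 entries")
    return sum(bm * pm for bm, pm in zip(b, pi))


# Illustrative (made-up) steady-state vector for N = 2 servers, S = 2 spares.
pi = [0.5, 0.3, 0.1, 0.06, 0.04]
print(shortage_vector(2, 2))          # [0, 0, 0, 1, 2]
print(expected_backorders(pi, 2, 2))  # about 0.14
```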
9. The Markov modeling method for steady-state availability of the server cluster as recited in claim 8, wherein: "performing steps 2 to 6 for each type of component, combining the CTMCs of all component types, and constructing the CTMC family model of the server cluster" in step 7 means that the continuous-time Markov chains (CTMCs) $\{X_i(t) = (O_i(t), G_i(t), BO_i(t)),\ t \ge 0\}$, $i = 1, 2, \ldots, L$, are built for the components one by one; because the failures and repairs of the different component types are independent, an integrated server-cluster CTMC family model $\{X(t) = \{X_1(t), \ldots, X_i(t), \ldots, X_L(t)\},\ t \ge 0\}$ can then be established. "Calculating the expected spare-part shortage number of the server cluster" in step 7 is performed as follows: the steady-state $EBO_q$ of the server cluster equals the sum, over all component types, of the expected numbers of servers made unavailable by spare-part shortages, so $EBO_q$ is calculated as:

$$EBO_q = \sum_{i=1}^{L} EBO(S_i)$$
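A minimal end-to-end sketch of step 7: one independent chain per component type, each reduced to its $EBO(S_i)$, then summed into $EBO_q$. It reuses the birth-death assumptions from the earlier sketches (state indexing, failure rate proportional to running servers, parallel repair); the tuple layout `(S_i, Z_i, lambda_i, mu_i)`, the function names and the parameter values are all hypothetical.

```python
from math import factorial


def component_pi(N, S, Z, lam, mu):
    """Assumed birth-death steady state of one component type (r = Z*lam/mu)."""
    r = Z * lam / mu
    w = [(N * r) ** p / factorial(p) for p in range(S + 2)]
    w += [N ** S * r ** (S + h) * factorial(N) / (factorial(N - h) * factorial(S + h))
          for h in range(2, N + 1)]
    total = sum(w)
    return [x / total for x in w]


def component_ebo(N, S, Z, lam, mu):
    """EBO(S_i): expected number of servers idled by shortages of this component."""
    pi = component_pi(N, S, Z, lam, mu)
    return sum(max(0, m - S) * p for m, p in enumerate(pi))


def cluster_ebo(N, components):
    """EBO_q = sum of EBO(S_i) over the L component types.

    components: iterable of (S_i, Z_i, lambda_i, mu_i) tuples (hypothetical layout);
    each component type gets its own independent chain X_i(t).
    """
    return sum(component_ebo(N, S, Z, lam, mu) for S, Z, lam, mu in components)


# Hypothetical cluster: 10 servers, three component types.
parts = [(2, 2, 0.01, 0.2), (1, 1, 0.005, 0.1), (3, 4, 0.02, 0.25)]
print(round(cluster_ebo(10, parts), 6))
```

Because the chains are independent, the cluster-level quantity is assembled from per-component results without ever building the joint state space of $X(t)$.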
10. The Markov modeling method for steady-state availability of the server cluster as recited in claim 9, wherein: "calculating the steady-state availability of the server cluster" in step 8 is performed as follows: subtracting the steady-state $EBO_q$ from the total number of servers N gives the expected number of available servers, and the steady-state availability $A_q$ of the server cluster equals this expected number of available servers as a percentage of the total number of servers N; hence $A_q$ is calculated as:

$$A_q = \frac{N - EBO_q}{N} \times 100\%$$
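A short sketch of the final availability step. The function name and the per-component EBO figures in the usage line are hypothetical; the range check simply reflects that $EBO_q$ cannot exceed the number of servers.

```python
def steady_state_availability(N, ebo_q):
    """A_q = (N - EBO_q) / N, expressed as a percentage."""
    if N <= 0:
        raise ValueError("N must be positive")
    if not 0.0 <= ebo_q <= N:
        raise ValueError("EBO_q must lie between 0 and N")
    return 100.0 * (N - ebo_q) / N


# Hypothetical figures: 20 servers, per-component EBOs 0.3, 0.15 and 0.05.
ebo_q = sum([0.3, 0.15, 0.05])
print(steady_state_availability(20, ebo_q))  # about 97.5
```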
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710867338.0A (CN107783851B) | 2017-09-22 | 2017-09-22 | Markov modeling method for steady-state availability of server cluster |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107783851A | 2018-03-09 |
CN107783851B | 2020-06-02 |
Family ID: 61433563
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710867338.0A (CN107783851B, active) | Markov modeling method for steady-state availability of server cluster | 2017-09-22 | 2017-09-22 |
Country Status (1)
Country | Link |
---|---|
CN | CN107783851B |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110147306B | 2019-04-26 | 2020-12-15 | Beihang University | Fault-tolerant software reliability and performance evaluation method considering correlation failure |
CN111460363A | 2020-04-01 | 2020-07-28 | Fengche (Shanghai) Information Technology Co., Ltd. | Second-hand vehicle supply chain site selection and inventory level management algorithm |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103729693A | 2013-12-23 | 2014-04-16 | Tsinghua University | Maintenance and spare part supply combined optimization method based on deterministic inventory degradation model |
CN105825045A | 2016-03-11 | 2016-08-03 | Northwestern Polytechnical University | Repairable spare part demand prediction method for phased-mission system |
CN106886822A | 2017-01-18 | 2017-06-23 | Northwestern Polytechnical University | Weak-element identification method for mission-oriented multi-state series repairable equipment |
CN106919984A | 2017-02-22 | 2017-07-04 | Northwestern Polytechnical University | Cost-based repair decision method for repairable units of parallel systems |
WO2017093560A9 | 2015-12-03 | 2017-09-21 | Electricite De France | Estimating the reliability of an industrial system |
Non-Patent Citations (1)
Title |
---|
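Geng Yan et al., "Markov modeling method for availability of a two-component system considering dormancy" (考虑休眠的两部件系统可用度马氏建模方法), Chinese Journal of Scientific Instrument (仪器仪表学报), vol. 37, no. 9, pp. 1996-2003, 2016-09-30 |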
Also Published As
Publication number | Publication date |
---|---|
CN107783851A | 2018-03-09 |
Similar Documents
Publication | Title |
---|---|
JP5294875B2 | Automatic state estimation system for cluster devices and method of operating same |
Lanus et al. | Hierarchical composition and aggregation of state-based availability and performability models |
CN109656911A | Distributed variable-frequencypump Database Systems and its data processing method |
CN103970587B | A kind of method, apparatus and system of scheduling of resource |
CN107783851B | Markov modeling method for steady-state availability of server cluster |
CN110645153A | Wind generating set fault diagnosis method and device and electronic equipment |
CN111209301A | Method and system for improving operation performance based on dependency tree splitting |
CN107818418B | Modeling method for time-varying inventory utilization rate and satisfaction rate of electronic equipment |
Requeijo et al. | Six sigma business scorecard approach to support maintenance projects in a collaborative context |
CN114510317A | Virtual machine management method, device, equipment and storage medium |
CN110261159B | Fault diagnosis method for flexible manufacturing cutter subsystem |
CN103593249B | A kind of HA method for early warning and virtual resource manager |
CN109359800B | Evaluation method and system for running state of power distribution automation master station system |
CN105786482A | Artificial intelligence system |
CN116070906A | Risk identification and assessment method based on complex product supplier supply chain |
CN111784229B | Inventory configuration method of weapon system |
CN114116122A | High-availability load platform for application container |
Hac | Using a software reliability model to design a telecommunications software architecture |
CN105896534A | Fault state set screening method for power transmission system considering importance degree and association degree of line |
Li et al. | Research on Availability Evaluation of the Communication Uninterrupted Power Supply System |
CN112131723A | Markov theory-based energy management system reliability analysis method |
Yuichi et al. | Orchestrator for Automating Failure Response in Telecom Carriers |
Jena et al. | Fuzzy reliability analysis in interconnection networks |
Sidorov et al. | Meta-monitoring system for ensuring a fault tolerance of the intelligent high-performance computing environment. |
Ma et al. | Research on cutting quality prediction technology of aviation structural parts based on JAYA-GABP algorithm |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |