CN107844406A - Anomaly detection method and system for a distributed system, service terminal, and memory

Publication number
CN107844406A
Authority
CN
China
Prior art keywords
model
state
probability
observation sequence
time observation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711017741.0A
Other languages
Chinese (zh)
Inventor
万景琨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qianxun Position Network Co Ltd
Original Assignee
Qianxun Position Network Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qianxun Position Network Co Ltd
Priority to CN201711017741.0A
Publication of CN107844406A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3452 Performance evaluation by statistical analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3055 Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3447 Performance evaluation by modeling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466 Performance evaluation by tracing or monitoring
    • G06F11/3476 Data logging
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466 Performance evaluation by tracing or monitoring
    • G06F11/3495 Performance evaluation by tracing or monitoring for systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Complex Calculations (AREA)

Abstract

The invention applies to the technical field of distributed system detection and provides an anomaly detection method and system for a distributed system, a service terminal, and a memory. The anomaly detection method comprises: collecting status data of the distributed system; creating a statistical model; introducing a time observation sequence and establishing a support probability model of the statistical model for the time observation sequence; and obtaining the corresponding anomaly detection result based on the status data and the support probability model. In the invention, a time observation sequence is introduced into the statistical model, and the support probability model is used to detect whether the current state of the distributed system is abnormal, which improves anomaly detection accuracy.

Description

Anomaly detection method and system for a distributed system, service terminal, and memory
Technical field
The invention belongs to the technical field of distributed system detection, and in particular relates to an anomaly detection method and system for a distributed system, a service terminal, and a memory.
Background
In a distributed system, the computers are independent of one another and may be physically adjacent or geographically dispersed; they are connected by a network or by other means to form a whole.
To make full use of the data processing power of distributed computing, monitoring the distributed computing environment becomes particularly important. The system must coordinate the execution of tasks and allocate resources sensibly so that resources are fully used and the performance of the whole system improves.
Normally, the system manages these tasks with a scheduler. The scheduler collects information about the various resources in the system to determine whether they are available; the scheduling algorithm then uses resource availability, task run time, and similar factors to set task priorities and assign available resources to the tasks. However, as tasks run, the state of resources such as CPU load, free memory, and remaining disk space can change at any time. If, before scheduling, it could be predicted whether a resource will still be available at some future time, and use of the resource during abnormal periods could be avoided, the scheduling result would be better. Monitoring the resources in the system and predicting anomalies before they occur is therefore of great significance.
In the prior art, anomaly prediction models are generally based on regression techniques, which have their own limitations, or on conventional machine learning prediction models. In real-time processing and analysis, however, high algorithmic complexity and system instability make results slow to generate and predictions unreliable.
Summary of the invention
The embodiments of the invention provide an anomaly detection method and system for a distributed system, a service terminal, and a memory, aiming to solve the problem of low anomaly detection accuracy in the prior art.
An embodiment of the invention is realized as an anomaly detection method for a distributed system, comprising:
collecting status data of the distributed system;
creating a statistical model;
introducing a time observation sequence and establishing a support probability model of the statistical model for the time observation sequence;
obtaining the corresponding anomaly detection result based on the status data and the support probability model.
Preferably, the statistical model is a Markov model, specifically λ = (S, P, Q), where S is the original state space, P is the state transition probability matrix, and Q is the probability distribution over all known states of the distributed system.
Preferably, introducing the time observation sequence and establishing the support probability model of the statistical model for the time observation sequence comprises:
introducing the time observation sequence;
iteratively optimizing the original state space according to the Markov model to obtain an optimized state space.
Preferably, optimizing the original state space based on the time observation sequence to obtain the optimized state space specifically comprises:
obtaining the number of times each state occurs;
calculating the probability that each state occurs;
sorting the calculated occurrence probabilities by magnitude;
obtaining the number of states of the state space based on the sorting result;
adding the time observation sequence to the original state space to obtain the optimized state space.
Preferably, obtaining the corresponding anomaly detection result based on the status data and the support probability model specifically comprises:
dividing the time observation sequence by window partitioning to obtain two or more state sequences;
obtaining the corresponding anomaly detection result based on the status data and the two or more state sequences.
Preferably, obtaining the corresponding anomaly detection result based on the status data and the two or more state sequences specifically comprises:
obtaining a mean value model of the support probability based on the two or more state sequences;
calculating the corresponding support probability based on the status data and the mean value model;
obtaining the corresponding anomaly detection result based on the calculated support probability.
Preferably, after obtaining the mean value model of the support probability based on the two or more state sequences and before calculating the corresponding support probability based on the status data and the mean value model, the method further comprises:
correcting the mean value model to obtain a corrected mean value model;
calculating the corresponding support probability based on the status data and the mean value model is then specifically:
calculating the corresponding support probability based on the status data and the corrected mean value model.
The invention also provides an anomaly detection system for a distributed system, comprising:
a collection module for collecting status data of the distributed system;
a creation module for creating a statistical model;
an establishing module for introducing a time observation sequence and establishing a support probability model of the statistical model for the time observation sequence;
an obtaining module for obtaining the corresponding anomaly detection result based on the status data and the support probability model.
The invention also provides a memory storing a computer program which, when executed by a processor, performs the following steps:
collecting status data of the distributed system;
creating a statistical model;
introducing a time observation sequence and establishing a support probability model of the statistical model for the time observation sequence;
obtaining the corresponding anomaly detection result based on the status data and the support probability model.
The invention also provides a service terminal comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
collecting status data of the distributed system;
creating a statistical model;
introducing a time observation sequence and establishing a support probability model of the statistical model for the time observation sequence;
obtaining the corresponding anomaly detection result based on the status data and the support probability model.
In the embodiments of the invention, a time observation sequence is introduced into the statistical model, and the support probability model is used to detect whether the current state of the distributed system is abnormal, which improves anomaly detection accuracy.
Brief description of the drawings
Fig. 1 is a flow chart of an anomaly detection method for a distributed system provided by the first embodiment of the invention;
Fig. 2 is a detailed flow chart of step S3 of the anomaly detection method for a distributed system provided by the first embodiment of the invention;
Fig. 3 is a detailed flow chart of step S32 of the anomaly detection method for a distributed system provided by the first embodiment of the invention;
Fig. 4 is a detailed flow chart of step S4 of the anomaly detection method for a distributed system provided by the first embodiment of the invention;
Fig. 5 is a detailed flow chart of step S42 of the anomaly detection method for a distributed system provided by the first embodiment of the invention;
Fig. 6 is a structure diagram of an anomaly detection system for a distributed system provided by the second embodiment of the invention;
Fig. 7 is a structure diagram of a service terminal provided by the third embodiment of the invention.
Detailed description
To make the purpose, technical solution, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and do not limit it.
In an embodiment of the invention, an anomaly detection method for a distributed system comprises: collecting status data of the distributed system; creating a statistical model; introducing a time observation sequence and establishing a support probability model of the statistical model for the time observation sequence; and obtaining the corresponding anomaly detection result based on the status data and the support probability model.
To illustrate the technical solutions of the invention, specific embodiments are described below.
Embodiment one:
Fig. 1 shows a flow chart of an anomaly detection method for a distributed system provided by the first embodiment of the invention, comprising:
Step S1, collecting status data of the distributed system;
Specifically, status data of the distributed system is collected. The status data may include the current time node and the current state, and may also include service data and the like; no limitation is imposed here.
Step S2, creating a statistical model;
Specifically, an assumption is made first: the probability distribution of the distributed system at time t+1 depends only on its state at time t and not on states before t, and the state transition from time t to time t+1 does not depend on the value of t. A statistical model is then created for anomaly detection of the distributed system. The statistical model is preferably a Markov model, and more preferably a static Markov model;
In this embodiment, the static Markov model is specifically λ = (S, P, Q), where S is the original state space, P is the state transition probability matrix, and Q is the probability distribution over all known states of the distributed system; the states include abnormal states and normal states.
In a preferred scheme of this embodiment, S is the original state space, the non-empty set of all possible states of the distributed system. It may be a finite set, a countable set, or any non-empty set, and S_i denotes state i of the state space. P is the M×M state transition probability matrix, where M, a positive integer, is the number of states of the distributed system. For any i ∈ S, P satisfies $\sum_{j \in S} P_{ij} = 1$, where P_ij is the probability that the distributed system, being in state i at time t, is in state j at the next time t+1.
Q represents the distribution over all known states (normal and abnormal) of the distributed system, with Q = [q_1 q_2 q_3 … q_M], where q_i is the probability that the distributed system is in state i at the initial time, satisfying $\sum_{i=1}^{M} q_i = 1$.
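As a minimal sketch of the model λ = (S, P, Q) described above, the following Python container stores the three components and checks the two normalization conditions (each row of P sums to 1, and Q sums to 1). The class name `MarkovModel`, the method `validate`, the state names, and all numeric values are illustrative assumptions and do not come from the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MarkovModel:
    """Static Markov model lambda = (S, P, Q); names are hypothetical."""
    states: List[str]      # S: state space
    P: List[List[float]]   # P[i][j]: probability of moving from state i to state j
    Q: List[float]         # Q[i]: probability of being in state i at the initial time

    def validate(self, tol: float = 1e-9) -> None:
        """Raise ValueError if P is not M x M row-stochastic or Q is not a distribution."""
        m = len(self.states)
        if len(self.P) != m or any(len(row) != m for row in self.P):
            raise ValueError("P must be an M x M matrix")
        if len(self.Q) != m:
            raise ValueError("Q must have M entries")
        for i, row in enumerate(self.P):
            if abs(sum(row) - 1.0) > tol:
                raise ValueError(f"row {i} of P does not sum to 1")
        if abs(sum(self.Q) - 1.0) > tol:
            raise ValueError("Q does not sum to 1")


# Illustrative values only.
model = MarkovModel(
    states=["normal", "degraded", "abnormal"],
    P=[[0.90, 0.08, 0.02],
       [0.50, 0.40, 0.10],
       [0.30, 0.20, 0.50]],
    Q=[0.85, 0.10, 0.05],
)
model.validate()
```

Validating the model once after estimation catches counting or normalization mistakes before any support probability is computed from it.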
Step S3, introducing a time observation sequence and establishing a support probability model of the statistical model for the time observation sequence;
Specifically, assume a time observation sequence S = (S_1, S_2, ..., S_t, ..., S_T), where S_t is the state of the distributed system at time t. Based on this time observation sequence, the inferred support probability model is $P(S_1, \dots, S_T \mid \lambda) = q_{S_1} \prod_{t=2}^{T} P_{S_{t-1} S_t}$, where T is the last timestamp in the time observation sequence, that is, its maximum timestamp.
Step S4, obtaining the corresponding anomaly detection result based on the status data and the support probability model;
Specifically, the support probability is calculated from the support probability model and the status data, and is used to infer whether the distributed system is currently abnormal, yielding the corresponding detection result.
The state transition probability matrix and the initial probability distribution are computed from a large amount of normal sequence data in which the state at every time point is known, and the support probability is then calculated; the larger the support probability, the more likely it is that the distributed system is in a normal state.
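The calculation in step S4 can be sketched as follows, under the assumption that the support probability is the standard first-order Markov chain likelihood: the initial probability of the first state multiplied by the transition probability of every consecutive pair. The function names, the dictionary layout of `q` and `p`, and the numbers are hypothetical. The product is accumulated in log space because it shrinks quickly as the sequence grows.

```python
import math
from typing import Dict, Sequence


def log_support_probability(seq: Sequence[str],
                            q: Dict[str, float],
                            p: Dict[str, Dict[str, float]]) -> float:
    """Log of q[s_1] * prod_t p[s_{t-1}][s_t]; -inf if any factor is zero or unknown."""
    if not seq:
        raise ValueError("the observation sequence must not be empty")
    prob = q.get(seq[0], 0.0)
    if prob <= 0.0:
        return float("-inf")
    total = math.log(prob)
    for prev, cur in zip(seq, seq[1:]):
        prob = p.get(prev, {}).get(cur, 0.0)
        if prob <= 0.0:
            return float("-inf")
        total += math.log(prob)
    return total


def support_probability(seq: Sequence[str],
                        q: Dict[str, float],
                        p: Dict[str, Dict[str, float]]) -> float:
    """Support probability of the sequence under the model (0.0 for impossible sequences)."""
    return math.exp(log_support_probability(seq, q, p))


# Illustrative model with two states.
q = {"ok": 0.9, "fail": 0.1}
p = {"ok": {"ok": 0.95, "fail": 0.05}, "fail": {"ok": 0.6, "fail": 0.4}}
normal = support_probability(["ok", "ok", "ok"], q, p)
suspect = support_probability(["ok", "fail", "fail"], q, p)
```

With these example numbers the all-normal sequence receives a much larger support probability than the sequence containing failures, which matches the rule that a larger support probability indicates a more likely normal state.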
In this embodiment, a time observation sequence is introduced into the statistical model, and the support probability model is used to detect whether the current state of the distributed system is abnormal, which improves anomaly detection accuracy.
In a preferred scheme of this embodiment, Fig. 2 shows a detailed flow chart of step S3 of the anomaly detection method for a distributed system provided by the first embodiment of the invention. Step S3 specifically comprises:
Step S31, introducing the time observation sequence;
Specifically, introducing the time observation sequence means estimating, from historical data, the full state space of the distributed system and the occurrence probability of each state.
Step S32, iteratively optimizing the original state space according to the static Markov model to obtain an optimized state space;
Specifically, the model is preferably a static Markov model, which is used to iteratively optimize the original state space and obtain the optimized state space.
In a preferred scheme of this embodiment, Fig. 3 shows a detailed flow chart of step S32 of the anomaly detection method for a distributed system provided by the first embodiment of the invention. Step S32 specifically comprises:
Step S321, obtaining the number of times each state occurs;
Specifically, the normal behavior sequence of the distributed system is scanned and the number of occurrences of each state is recorded; for example, state i occurs N_i times.
Step S322, calculating the probability that each state occurs;
Specifically, the probability P(S_i) that state S_i occurs is calculated by the formula P(S_i) = N_i / N, where N is the total number of occurrences of all system states and N_i is the total number of occurrences of state i.
Step S323, sorting the calculated occurrence probabilities by magnitude;
Specifically, the calculated occurrence probabilities are sorted, either in descending or in ascending order of value; no limitation is imposed here. Descending order is preferred. Example occurrence probabilities: P(S_1) = 17%, P(S_2) = 5%, P(S_3) = 10.4%, P(S_4) = 0.4%, P(S_5) = 0.5%;
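Steps S321 to S323 can be sketched in a few lines: count how often each state occurs, divide by the total to get P(S_i) = N_i / N, and sort in descending order. The function name `state_probabilities` and the sample sequence are assumptions made for illustration.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def state_probabilities(sequence: Sequence[Hashable]) -> List[Tuple[Hashable, float]]:
    """Return (state, P(S_i)) pairs sorted by descending occurrence probability."""
    if not sequence:
        raise ValueError("the behavior sequence must not be empty")
    counts = Counter(sequence)        # N_i for every observed state
    total = sum(counts.values())      # N: total number of observations
    probs = [(state, n / total) for state, n in counts.items()]
    return sorted(probs, key=lambda item: item[1], reverse=True)


# Illustrative normal behavior sequence.
ranked = state_probabilities(["a", "a", "b", "a", "c", "b"])
```

For the sample sequence, state "a" occurs three times out of six and is ranked first.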
Step S324, obtaining the number of states of the state space based on the sorting result;
Specifically, the number of states k of the state space is obtained from a formula [formula not reproduced in the source text] combined with the sorted occurrence probabilities.
Further, the states S_1, S_2, ..., S_{k-1} of the time observation sequence are added to the state space S, and all other states are merged and added to the state space as state k, so that the state space has k states.
In this embodiment, all other states are merged in this way into one state k (a special type relative to the preceding k-1 states), so the remaining states need not be considered individually. The focus is on the states corresponding to the time observation sequence, which simplifies the statistical model.
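The merging of rare states can be sketched as below. Because the formula that determines k is not available in this text, the sketch assumes one plausible rule: keep the most frequent states until their cumulative probability reaches a coverage threshold, then merge everything else into a single catch-all state. The `coverage` parameter, the `OTHER` label, and both function names are hypothetical and are not the patent's rule.

```python
from typing import List, Sequence, Tuple


def build_state_space(ranked: Sequence[Tuple[str, float]],
                      coverage: float = 0.95,
                      other: str = "OTHER") -> List[str]:
    """Keep the top states up to the assumed coverage threshold and append one merged state."""
    kept: List[str] = []
    cumulative = 0.0
    for state, prob in ranked:        # ranked must be in descending probability order
        if cumulative >= coverage:
            break
        kept.append(state)
        cumulative += prob
    return kept + [other]             # k = len(kept) + 1


def map_sequence(seq: Sequence[str], space: Sequence[str], other: str = "OTHER") -> List[str]:
    """Replace every state that is not among the first k-1 states with the merged state."""
    keep = set(space) - {other}
    return [s if s in keep else other for s in seq]


# Illustrative probabilities, already sorted in descending order.
ranked = [("a", 0.6), ("b", 0.3), ("c", 0.06), ("d", 0.04)]
space = build_state_space(ranked, coverage=0.85)
mapped = map_sequence(["a", "d", "b", "z"], space)
```

Mapping unseen states to the merged state also covers the case where a state in a later observation sequence was never part of the first k-1 states.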
Step S325, adding the time observation sequence to the original state space to obtain the optimized state space;
Based on the above, in the initial probability distribution model Q = [q_1 q_2 … q_k], q_i = P(S_i) when i < k, and the distribution satisfies $\sum_{i=1}^{k} q_i = 1$. The state transition probability p_ij (i, j ∈ S) is then calculated as p_ij = N_ij / N_i, where N_ij is the number of times state i is immediately followed by state j, and p_ij is the probability that the distributed system, being in state i at time t, is in state j at the next time t+1.
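The estimate p_ij = N_ij / N_i can be sketched by counting consecutive pairs in a normal behavior sequence. One assumption is made explicit here: N_i is counted over all positions except the last, so that every row of the estimated matrix sums to 1; this differs from the plain occurrence count by at most one, for the final state of the sequence. The function name and sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def estimate_transitions(seq: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """Estimate p[i][j] = N_ij / N_i from one observed state sequence."""
    if len(seq) < 2:
        raise ValueError("at least two observations are needed to count a transition")
    pair_counts = Counter(zip(seq, seq[1:]))   # N_ij: i immediately followed by j
    from_counts = Counter(seq[:-1])            # N_i: times i is the source of a transition
    p: Dict[str, Dict[str, float]] = {}
    for (i, j), n_ij in pair_counts.items():
        p.setdefault(i, {})[j] = n_ij / from_counts[i]
    return p


# Illustrative sequence with two states.
p = estimate_transitions(["a", "a", "b", "a", "b", "b"])
```

For the sample sequence, state "a" is followed once by "a" and twice by "b", giving p["a"]["b"] = 2/3.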
It should be noted that, during actual analysis, if a state in the time observation sequence is not one of the first k-1 states in S, it is added to the state space S as the special state S_k.
Specifically, the influence of the historical behavior of the distributed system on its behavior at the current time depends on the time interval: the older the information, the smaller its influence on the current behavior. Preferably, a forgetting factor is introduced to correct the initial probability distribution so that it approaches the true value. For example, after historical states l and m are observed, where state l occurs later than state m, q_l is increased and q_m is correspondingly reduced according to the order of occurrence (q_l and q_m being the probabilities of the two observed historical states), while ensuring $\sum_{i=1}^{k} q_i = 1$, where m ≠ l and 1 < m ≤ k. That is, besides counting the probabilities of historical states l and m, the decay of their influence on the state at the current time is also taken into account.
Further, the initial probability distribution is corrected with the following correction algorithm: [formula not reproduced in the source text]
where r is the forgetting factor (r > 0), t = 1, 2, ..., m, and 1 < m ≤ k; l denotes the state of an observed historical event and i denotes the state occurring at the current time point. The formula shows how the contribution decays over time both when the observed historical event l is the currently occurring state (l equals i) and when it is not (l does not equal i). The value of r can be set according to the actual situation; no limitation is imposed here. Finally, after iteration, the resulting q is the probability model of each state at the prediction time, and the abnormal states with high probability in it give the prediction of anomalies for the specific distributed system.
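Since the correction formula itself is not available in this text, the following is only a sketch of the general idea of a forgetting factor, not the patent's algorithm. It assumes an exponential decay: each past observation contributes a weight exp(-r · age), where age is the number of steps since the observation, and the weights are normalized so that the corrected distribution still sums to 1. The function name and the decay form are assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, Sequence


def decayed_initial_distribution(history: Sequence[str], r: float) -> Dict[str, float]:
    """Initial distribution in which recent observations weigh more (assumed exponential decay)."""
    if r <= 0:
        raise ValueError("the forgetting factor r must be positive")
    if not history:
        raise ValueError("history must not be empty")
    last = len(history) - 1
    weights: Dict[str, float] = defaultdict(float)
    for t, state in enumerate(history):
        weights[state] += math.exp(-r * (last - t))   # older observations decay more
    total = sum(weights.values())
    return {state: w / total for state, w in weights.items()}


# State "l" is observed after state "m", so it should receive the larger probability.
q = decayed_initial_distribution(["m", "l"], r=1.0)
```

A larger r forgets history faster; as r approaches 0 the result approaches the plain occurrence frequencies.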
In a preferred scheme of this embodiment, Fig. 4 shows a detailed flow chart of step S4 of the anomaly detection method for a distributed system provided by the first embodiment of the invention. Step S4 specifically comprises:
Step S41, dividing the time observation sequence by window partitioning to obtain two or more state sequences;
Specifically, for the sake of computational complexity and iteration performance, the time observation sequence can be divided by window partitioning to further optimize the result obtained in step S2, yielding two or more state sequences;
The time range over which the distributed system is studied and diagnosed may be very long. For a highly active distributed system, iterating the computation over the state at every timestamp to predict the state at one particular future timestamp is of little value and is very costly in time and space. Normally, diagnosis and prediction are performed by predicting whether an abnormal state will occur within some future period; correspondingly, the initial state probability distribution at a historical timestamp can be split into state probability distributions over fixed time windows.
Further, as the observation sequence grows longer over time, the calculated probability value becomes smaller and smaller, so it is difficult to judge from the size of this probability whether the observation sequence is abnormal; a comparison is only meaningful between time observation sequences of equal length. The time observation sequence is therefore divided by window partitioning: a window of size w divides the state sequence into a series of state sequences of length w. The value of w is set according to the actual situation; no limitation is imposed here.
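The window partitioning can be sketched as a sliding window that advances one step at a time, so that every resulting state sequence has the same length w. The step size of 1 and the function name are assumptions; the text only requires that the resulting sequences have length w.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sliding_windows(seq: Sequence[T], w: int) -> List[Sequence[T]]:
    """Split seq into all contiguous windows of length w (empty list if seq is shorter than w)."""
    if w <= 0:
        raise ValueError("window size w must be positive")
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]


# A sequence of length 5 with w = 3 yields 5 - 3 + 1 = 3 windows.
windows = sliding_windows(["a", "b", "c", "d", "e"], 3)
```

Because every window has the same length, the support probabilities computed on different windows are directly comparable.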
Step S42, obtaining the corresponding anomaly detection result based on the status data and the two or more state sequences;
Specifically, the support probability of each state sequence is calculated from the status data and the two or more state sequences, and whether the distributed system is abnormal is judged from the support probability, yielding the corresponding detection result.
In a preferred scheme of this embodiment, Fig. 5 shows a detailed flow chart of step S42 of the anomaly detection method for a distributed system provided by the first embodiment of the invention. Step S42 specifically comprises:
Step S421, obtaining a mean value model of the support probability based on the two or more state sequences;
Specifically, a sliding window is passed over an observation sequence of state length L (L > w), the support probability of each window is calculated, and the mean value model of all calculated support probabilities is then obtained: [formula not reproduced in the source text]
Step S422, calculating the corresponding support probability based on the status data and the mean value model;
Specifically, based on the status data, the mean support probability can be calculated with this model.
Step S423, obtaining the corresponding anomaly detection result based on the calculated support probability;
Specifically, once the mean support probability has been calculated, it can be used to infer whether the distributed system is currently abnormal, yielding the corresponding detection result.
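Steps S421 to S423 can be sketched as follows, assuming the mean value model is the arithmetic mean of the per-window support probabilities and that each window is scored with the initial probability of its first state times its transition probabilities. The decision threshold is a hypothetical parameter that would have to be tuned on normal data; the text does not specify one.

```python
from typing import Dict, Sequence


def window_support(window: Sequence[str],
                   q: Dict[str, float],
                   p: Dict[str, Dict[str, float]]) -> float:
    """Support probability of one window: q[first] times the product of its transitions."""
    prob = q.get(window[0], 0.0)
    for prev, cur in zip(window, window[1:]):
        prob *= p.get(prev, {}).get(cur, 0.0)
    return prob


def mean_support(seq: Sequence[str], w: int,
                 q: Dict[str, float],
                 p: Dict[str, Dict[str, float]]) -> float:
    """Average support probability over all sliding windows of length w."""
    windows = [seq[i:i + w] for i in range(len(seq) - w + 1)]
    if not windows:
        raise ValueError("the sequence must be at least as long as the window")
    return sum(window_support(win, q, p) for win in windows) / len(windows)


def is_abnormal(seq: Sequence[str], w: int,
                q: Dict[str, float],
                p: Dict[str, Dict[str, float]],
                threshold: float) -> bool:
    """Flag the sequence when its mean support falls below the assumed threshold."""
    return mean_support(seq, w, q, p) < threshold


# Illustrative model and sequences.
q = {"ok": 0.9, "fail": 0.1}
p = {"ok": {"ok": 0.95, "fail": 0.05}, "fail": {"ok": 0.6, "fail": 0.4}}
healthy = ["ok"] * 5
faulty = ["ok", "fail", "fail", "fail", "ok"]
```

With these example values, `is_abnormal(healthy, 3, q, p, threshold=0.1)` is false and `is_abnormal(faulty, 3, q, p, threshold=0.1)` is true.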
In a preferred scheme of this embodiment, the following step may also be included after step S421 and before step S422:
Step S424, correcting the mean value model to obtain a corrected mean value model;
Specifically, calculating the support probability of an observation sequence of length L (L being a positive integer) requires recording all states of the sequence, that is, remembering the past behavior states of the distributed system. To improve efficiency, the mean value model is corrected, giving the corrected mean value model: [formula not reproduced in the source text]
where the corresponding mean support probability in the initial state is: [formula not reproduced in the source text]
In this embodiment, step S422 is then specifically: calculating the corresponding support probability based on the status data and the corrected mean value model;
Specifically, the support probability is calculated from the collected status data and the corrected mean value model. The corrected mean support probability model can ignore status data from too long ago, and such data has little influence on detecting the current abnormal state, so the accuracy of anomaly detection is maintained while the amount of computation is reduced.
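The corrected mean value model is not available in this text, so the following only sketches one way to obtain the property described above, namely that old status data can be dropped. It assumes an exponentially weighted moving average of the per-window support probability, updated one observation at a time, so that only the last w states need to be stored. The class name, the smoothing weight `alpha`, and the update rule are assumptions, not the patent's formula.

```python
from collections import deque
from typing import Deque, Dict, Optional


class RunningSupport:
    """Incremental mean support probability (assumed exponentially weighted average)."""

    def __init__(self, w: int, q: Dict[str, float],
                 p: Dict[str, Dict[str, float]], alpha: float = 0.2) -> None:
        if w < 1:
            raise ValueError("window size w must be at least 1")
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.w, self.q, self.p, self.alpha = w, q, p, alpha
        self.window: Deque[str] = deque(maxlen=w)   # only the last w states are kept
        self.mean: Optional[float] = None

    def update(self, state: str) -> Optional[float]:
        """Add one observed state; return the current mean, or None before the first full window."""
        self.window.append(state)
        if len(self.window) < self.w:
            return self.mean
        states = list(self.window)
        prob = self.q.get(states[0], 0.0)
        for prev, cur in zip(states, states[1:]):
            prob *= self.p.get(prev, {}).get(cur, 0.0)
        if self.mean is None:
            self.mean = prob                         # initial mean: first full window
        else:
            self.mean = self.alpha * prob + (1.0 - self.alpha) * self.mean
        return self.mean


# Illustrative use with a two-state model.
q = {"ok": 0.9, "fail": 0.1}
p = {"ok": {"ok": 0.95, "fail": 0.05}, "fail": {"ok": 0.6, "fail": 0.4}}
tracker = RunningSupport(w=3, q=q, p=p, alpha=0.2)
values = [tracker.update(s) for s in ["ok", "ok", "ok", "fail"]]
```

Memory use stays constant in the length of the observation history, and the influence of each past window shrinks geometrically with every new observation.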
In this embodiment, a time observation sequence is introduced into the statistical model, and the support probability model is used to detect whether the current state of the distributed system is abnormal, which improves anomaly detection accuracy.
Secondly, a static Markov model is used for the calculation: when the time observation sequence is introduced into the Markov model, the other states (for example, those that occur rarely) are merged into one special state, which simplifies the structure of the statistical model and improves computational efficiency.
Furthermore, dividing the time observation sequence by windows allows support probabilities to be calculated and compared on state sequences of equal length, which improves the accuracy of anomaly detection.
Embodiment two:
Fig. 6 shows a kind of structure chart of the abnormality detection system for distributed system that second embodiment of the invention provides, The system includes:Acquisition module 1, the creation module 2 being connected with acquisition module 1, with the establishment of connection module 3 of creation module 2, with The acquisition module 4 of the connection of module 3 is established, wherein:
Acquisition module 1, for gathering the status data of distributed system;
Specifically, the status data of distributed system is gathered, the status data may include:It is current time node, residing State, this can also be restricted herein including service data etc..
Creation module 2, for creating statistical model;
Specifically, a state is assumed initially that, i.e. state of the probability distribution of t+1 moment distributed systems only with t has Close, it is unrelated with the state before t, shifted from t to the state at t+1 moment unrelated with t value.Create statistical model, For the abnormality detection of the distributed system, the statistical model is preferably Markov model, it is further preferred that using static Markov model;
In the present embodiment, the static Markov model is specially:λ=(S, P, Q), wherein, the S is that reset condition is empty Between, the P is state transition probability matrix, and the Q is the initial probability distribution model of distributed system, the institute it is stateful including Abnormality and normal condition.
In a preferred scheme of the present embodiment, S is reset condition space, and all possible shape of distributed system The state set for the non-NULL that state is formed, it can be set that is limited, can arranging or any nonempty set, can use SiRepresent (state space i.e. under state i);
P is the state transition probability matrix (M×M), where M is the number of states of the distributed system. For any i ∈ S it satisfies $\sum_{j \in S} P_{ij} = 1$, where P_ij is the probability that the distributed system, being in state i at time t, is in state j at the next time t+1.
Q represents the known probability distribution over all states (normal and abnormal) of the distributed system, Q = [q_1 q_2 q_3 … q_N], where q_i is the probability that the distributed system is in state i at the initial time, satisfying $\sum_{i=1}^{N} q_i = 1$;
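As a minimal sketch of the model λ = (S, P, Q), the following Python class stores the three components and checks the two normalization conditions (each row of P sums to 1, and Q sums to 1). The class name `MarkovModel`, its field names, and the two-state example values are illustrative assumptions and do not come from the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MarkovModel:
    states: List[str]              # S: state space (normal and abnormal states)
    transition: List[List[float]]  # P: M x M state transition probability matrix
    initial: List[float]           # Q: initial probability distribution

    def validate(self, tol: float = 1e-9) -> None:
        """Raise ValueError if P or Q is not a valid probability structure."""
        m = len(self.states)
        if len(self.transition) != m or any(len(row) != m for row in self.transition):
            raise ValueError("P must be an M x M matrix")
        if len(self.initial) != m:
            raise ValueError("Q must have one entry per state")
        for row in self.transition:
            if any(x < 0 for x in row) or abs(sum(row) - 1.0) > tol:
                raise ValueError("each row of P must sum to 1")
        if any(x < 0 for x in self.initial) or abs(sum(self.initial) - 1.0) > tol:
            raise ValueError("Q must sum to 1")


# Hypothetical two-state example.
model = MarkovModel(
    states=["normal", "abnormal"],
    transition=[[0.9, 0.1], [0.5, 0.5]],
    initial=[0.95, 0.05],
)
model.validate()  # passes silently for a well-formed model
```

Validating the model once after estimation is a cheap way to catch counting errors before any support probability is computed.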
Establishing module 3 is configured to introduce a time observation sequence and to establish a support probability model of the statistical model for the time observation sequence;
Specifically, a time observation sequence is assumed: S = (S_1, S_2, …, S_t, …, S_T), where S_t denotes the state of the distributed system at time t. Based on this time observation sequence, the inferred support probability model is the probability that the model assigns to the sequence: $P(S \mid \lambda) = q_{S_1} \prod_{t=2}^{T} P_{S_{t-1} S_t}$, where T is the last timestamp in the time observation sequence, i.e. the maximum timestamp in the sequence.
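The support probability of a sequence can be sketched in a few lines, assuming it is the usual first-order Markov chain likelihood (initial probability of the first state multiplied by the transition probabilities along the sequence). The function name `support_probability`, the dictionary layout, and the example numbers are assumptions made for illustration only.

```python
import math
from typing import Dict, Sequence


def support_probability(
    seq: Sequence[str],
    q: Dict[str, float],
    p: Dict[str, Dict[str, float]],
) -> float:
    """Return q[S_1] multiplied by p[S_(t-1)][S_t] for t = 2..T."""
    if not seq:
        raise ValueError("the observation sequence must not be empty")
    prob = q[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        prob *= p[prev][cur]
    return prob


# Hypothetical two-state model.
q = {"ok": 0.9, "err": 0.1}
p = {
    "ok": {"ok": 0.9, "err": 0.1},
    "err": {"ok": 0.5, "err": 0.5},
}

# 0.9 (start in ok) * 0.9 (ok -> ok) * 0.1 (ok -> err)
value = support_probability(["ok", "ok", "err"], q, p)
assert math.isclose(value, 0.081)
```

For long sequences the product underflows quickly, so a practical implementation would likely sum logarithms instead; the plain product is kept here to mirror the description.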
Acquisition module 4 is configured to obtain the corresponding anomaly detection result based on the status data and the support probability model;
Specifically, the support probability is calculated from the above support probability model and the status data, and whether the distributed system is currently abnormal is inferred from the support probability, yielding the corresponding detection result.
The state transition probability matrix and the initial probability distribution are computed from a large amount of normal sequence data in which the state at each point in time is known, and the support probability is then calculated. The larger the support probability, the more likely it is that the distributed system is in a normal state.
In this embodiment, a time observation sequence is introduced into the statistical model, and whether the state of the distributed system is abnormal is detected through the support probability model, which improves the accuracy of anomaly detection.
In a preferred scheme of this embodiment, the establishing module 3 specifically includes an introducing unit and an optimization unit connected to the introducing unit, wherein:
Introducing unit, configured to introduce the time observation sequence;
Specifically, introducing the time observation sequence means estimating, from historical data, the full state space of the distributed system and the occurrence probability of each state.
Optimization unit, configured to iteratively optimize the original state space according to the static Markov model to obtain an optimized state space;
Specifically, the model is preferably a static Markov model, which is used to iteratively optimize the original state space to obtain the optimized state space;
In a preferred scheme of this embodiment, the optimization unit is specifically configured to:
Obtain the number of times each state occurs;
Specifically, the normal behavior sequence of the distributed system is scanned and the number of occurrences of each state is recorded; for example, the number of occurrences of state i is N_i.
Calculate the probability of occurrence of each state;
Specifically, the probability P(S_i) that state S_i occurs is calculated according to the formula P(S_i) = N_i / N, where N is the total number of occurrences of all system states and N_i is the total number of occurrences of state i.
Sort the calculated occurrence probabilities of the states by size;
Specifically, the calculated occurrence probabilities are sorted, either in descending or in ascending order of value, which is not limited here. Preferably they are sorted in descending order. For example, given P(S_1) = 17%, P(S_2) = 5%, P(S_3) = 10.4%, P(S_4) = 0.4% and P(S_5) = 0.5%, the descending order is S_1, S_3, S_2, S_5, S_4;
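The counting, probability and sorting steps can be sketched as follows. The function name `rank_state_probabilities` and the sample sequence are hypothetical; the computation itself is P(S_i) = N_i / N followed by a descending sort.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def rank_state_probabilities(sequence: Sequence[str]) -> List[Tuple[str, float]]:
    """Return (state, P(state)) pairs sorted by probability, largest first."""
    counts = Counter(sequence)   # N_i for every state i
    total = sum(counts.values()) # N, the total number of observations
    probs = {state: n / total for state, n in counts.items()}
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)


# Hypothetical normal-behavior sequence: A occurs 3 times, B twice, C once.
ranked = rank_state_probabilities(["A", "A", "B", "A", "C", "B"])
for state, prob in ranked:
    print(f"P({state}) = {prob:.3f}")
```

States with equal probability keep their order of first appearance because Python's sort is stable.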
Obtain the number of states of the state space based on the sorting result;
Specifically, the number of states k of the state space is obtained from a formula (not reproduced in this text) applied to the sorted occurrence probabilities above.
Further, the states S_1, S_2, …, S_{k-1} of the time observation sequence are added to the state space S, and all other states are combined and added to the state space as state k, so that the state space has k states.
In this embodiment, all other states are merged in this way into one special state (k), so that they need not be considered individually; attention is focused on the states corresponding to the time observation sequence, which simplifies the statistical model.
Add the time observation sequence to the original state space to obtain the optimized state space;
Based on the above, in the initial probability distribution model Q = [q_1 q_2 … q_k], q_i = P(S_i) when i < k, and $\sum_{i=1}^{k} q_i = 1$, so that $q_k = 1 - \sum_{i=1}^{k-1} q_i$. The state transition probability p_ij (i, j ∈ S) is then calculated as p_ij = N_ij / N_i, where N_ij denotes the number of transitions from state i to state j observed in the sequence, and p_ij is the probability that the distributed system, being in state i at time t, is in state j at the next time t+1.
It should be noted that, in actual analysis, if a state in the time observation sequence does not belong to the first k−1 states in S, it is treated as the special state S_k of the state space S.
Specifically, the degree to which the historical behavior of the distributed system influences its behavior at the current time depends on the time interval: the older the information, the smaller its influence on the current behavior of the distributed system. Preferably, a forgetting factor is introduced to correct the initial probability distribution so that it approaches the true values. For example, after historical states l and m have been observed, where historical state l occurred later than historical state m, q_l is increased and q_i is correspondingly reduced according to the order of occurrence (q_l and q_m denote the two observed historical states), while ensuring a constraint whose formula is not reproduced in this text. Here m is not equal to l, and 1 < m ≤ k. That is, in addition to counting the probabilities of historical states l and m, the decay of the influence of historical states l and m on the state at the current time is also taken into account.
Further, the initial probability distribution is corrected using a correction algorithm whose formula is not reproduced in this text. In that formula, r is the forgetting factor (r > 0), t = 1, 2, …, m, and l < m ≤ k; l denotes a historically observed state, and i denotes the state occurring at the current point in time. The formula distinguishes the case in which the observed historical event l is the currently occurring state event (l equal to i) from the case in which it is not (l not equal to i), and gives the decay of event l as a function of time. The specific value of r may be set according to actual conditions and is not limited here. Finally, through iteration, the resulting q is the probability model of each state at the predicted time, and extracting the abnormal states with high probability yields the prediction of anomalies in the specific distributed system.
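Since the correction formula itself is not available here, the following is only an illustration of the general idea of a forgetting factor, using plain exponential decay as a stand-in. It is not the patent's correction algorithm. The function name `decayed_initial_distribution`, the weighting scheme exp(−r · age), and the sample history are all assumptions.

```python
import math
from collections import defaultdict


def decayed_initial_distribution(history, r):
    """Estimate q from a history ordered oldest to newest, weighting each
    observation by exp(-r * age) so that older states count for less."""
    if r <= 0:
        raise ValueError("the forgetting factor r must be positive")
    if not history:
        raise ValueError("history must not be empty")
    newest = len(history) - 1
    weights = defaultdict(float)
    for t, state in enumerate(history):
        age = newest - t  # 0 for the most recent observation
        weights[state] += math.exp(-r * age)
    total = sum(weights.values())
    # Normalize so that the corrected probabilities still sum to 1.
    return {state: w / total for state, w in weights.items()}


# Hypothetical history: one old "err" followed by two recent "ok" states.
corrected = decayed_initial_distribution(["err", "ok", "ok"], r=math.log(2))
# Unweighted counting would give err = 1/3; with decay it drops to 1/7.
```

A larger r forgets faster; as r approaches 0 the result approaches the plain frequency estimate N_i / N.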
In a preferred scheme of this embodiment, the acquisition module 4 specifically includes a dividing unit and an acquiring unit connected to the dividing unit, wherein:
Dividing unit, configured to divide the time observation sequence using a window division method to obtain two or more state sequences;
Specifically, considering computational complexity and iteration performance, the window division method may be used to divide the time observation sequence, further optimizing the result produced by the creation module 2 and yielding two or more state sequences;
The time span over which a distributed system is studied and diagnosed may be very long. For a highly active distributed system, counting and iterating over the state at every timestamp in order to predict the state at one particular future timestamp is of little value and very costly in time and space. Normally, diagnosis and prediction are carried out by predicting whether an abnormal state will occur in some future time period; accordingly, the original state probability distribution over individual historical timestamps can be cut into state probability distributions over fixed time windows.
Further, as the length of the observation sequence grows over time, the calculated probability value becomes smaller and smaller, so it is difficult to judge from the magnitude of this probability whether the observation sequence is abnormal; comparison is meaningful only between time observation sequences of equal length. The time observation sequence is therefore divided using the window division method: a window of size w is used to divide the state sequence, yielding a series of state sequences of length w. The size of w is set according to actual conditions and is not limited here.
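The window division can be sketched as a sliding window over the observation sequence. The function name `sliding_windows` and the `step` parameter are assumptions; a step of 1 gives overlapping windows, while a step equal to w would give disjoint ones.

```python
from typing import List, Sequence


def sliding_windows(sequence: Sequence[str], w: int, step: int = 1) -> List[List[str]]:
    """Split a sequence into sub-sequences that all have length w."""
    if w <= 0 or step <= 0:
        raise ValueError("w and step must be positive")
    last_start = len(sequence) - w
    # A sequence shorter than w yields no window at all.
    return [list(sequence[i:i + w]) for i in range(0, last_start + 1, step)]


windows = sliding_windows(["a", "b", "c", "d", "e"], w=3)
# windows == [["a", "b", "c"], ["b", "c", "d"], ["c", "d", "e"]]
```

Because every window has the same length, the support probabilities computed on them are directly comparable, which is the point made in the paragraph above.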
Acquiring unit, configured to obtain the corresponding anomaly detection result based on the status data and the two or more state sequences;
Specifically, the support probability corresponding to the state sequences is calculated from the aforementioned status data and the two or more state sequences, and whether the distributed system is abnormal is judged from the support probability, yielding the corresponding detection result.
In a preferred scheme of this embodiment, the acquiring unit is specifically configured to:
Obtain a mean value model of the support probability based on the two or more state sequences;
Specifically, an observation sequence of state length L (L > w) is passed through a sliding window, the support probability corresponding to each divided window is calculated, and the mean value model of all the calculated support probabilities is then obtained (the formula is not reproduced in this text).
Calculate the corresponding support probability based on the status data and the mean value model;
Specifically, based on the status data, the mean support probability can be calculated with this model.
Obtain the corresponding anomaly detection result based on the calculated support probability;
Specifically, after the mean support probability has been calculated, whether the distributed system is currently abnormal can be inferred from it, yielding the corresponding detection result.
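The detection step can be sketched end to end, assuming that the mean value model is the plain arithmetic mean of the per-window support probabilities and that an anomaly is flagged when this mean falls below a threshold. The threshold value, the function names and the two-state model are all hypothetical.

```python
def window_support(window, q, p):
    """Support probability of one window under the Markov model (q, p)."""
    prob = q[window[0]]
    for prev, cur in zip(window, window[1:]):
        prob *= p[prev][cur]
    return prob


def mean_support(sequence, w, q, p):
    """Arithmetic mean of the support probabilities of all length-w windows."""
    windows = [sequence[i:i + w] for i in range(len(sequence) - w + 1)]
    if not windows:
        raise ValueError("the sequence must contain at least w observations")
    return sum(window_support(win, q, p) for win in windows) / len(windows)


def is_abnormal(sequence, w, q, p, threshold):
    """Flag the sequence when its mean support drops below the threshold."""
    return mean_support(sequence, w, q, p) < threshold


# Hypothetical model estimated from normal behavior.
q = {"ok": 0.9, "err": 0.1}
p = {"ok": {"ok": 0.9, "err": 0.1}, "err": {"ok": 0.5, "err": 0.5}}

normal = ["ok"] * 6
faulty = ["ok", "err", "err", "err", "err", "ok"]

print(is_abnormal(normal, 3, q, p, threshold=0.1))  # mean support is 0.729
print(is_abnormal(faulty, 3, q, p, threshold=0.1))  # mean support is 0.03
```

In practice the threshold would have to be calibrated on sequences known to be normal, for example as a low percentile of their mean support values.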
In a preferred scheme of this embodiment, the acquiring unit is further configured to correct the mean value model to obtain a corrected mean value model;
Specifically, calculating the support probability of an observation sequence of length L (L is a real number) requires recording all states of the sequence, i.e. remembering the past behavior states of the distributed system. To improve efficiency, the mean value model is corrected to obtain a corrected mean value model, together with the corresponding mean support probability in the initial state (neither formula is reproduced in this text).
Specifically, the support probability is calculated from the previously collected status data and the corrected mean value model. Because the corrected mean support probability model can ignore status data from long ago, and such data has little influence on detecting the current abnormal state, the accuracy of anomaly detection is maintained while the amount of calculation is reduced.
In this embodiment, a time observation sequence is introduced into the statistical model, and whether the state of the distributed system is abnormal is detected through the support probability model, which improves the accuracy of anomaly detection.
Secondly, the calculation uses a static Markov model. When the time observation sequence is added to the Markov model, the other states (for example, those that occur less often) are merged into one special state, which simplifies the structure of the statistical model and can improve computational efficiency.
Furthermore, the time observation sequence is divided by window division, so that support probabilities can be calculated and compared on the basis of state sequences of equal length, which improves the accuracy of anomaly detection.
Embodiment three:
Fig. 7 shows a structural diagram of a service terminal provided by the third embodiment of the present invention. The service terminal includes a memory 71, a processor 72, a communication interface 73 and a bus 74. The processor 72, the memory 71 and the communication interface 73 communicate with one another through the bus 74, wherein:
Memory 71, configured to store various data;
Specifically, the memory 71 is used to store various data, such as data in the communication process and received data, which is not limited here. The memory also contains a plurality of computer programs.
Communication interface 73, configured for information transmission between the communication devices of the service terminal;
Processor 72, configured to call the computer programs in the memory 71 to execute the anomaly detection method for a distributed system provided in Embodiment 1 above, for example:
Collecting the status data of the distributed system;
Creating a statistical model;
Introducing a time observation sequence, and establishing a support probability model of the statistical model for the time observation sequence;
Obtaining the corresponding anomaly detection result based on the status data and the support probability model.
In this embodiment, a time observation sequence is introduced into the statistical model, and whether the state of the distributed system is abnormal is detected through the support probability model, which improves the accuracy of anomaly detection.
Secondly, the calculation uses a static Markov model. When the time observation sequence is added to the Markov model, the other states (for example, those that occur less often) are merged into one special state, which simplifies the structure of the statistical model and can improve computational efficiency.
Furthermore, the time observation sequence is divided by window division, so that support probabilities can be calculated and compared on the basis of state sequences of equal length, which improves the accuracy of anomaly detection.
The present invention also provides a memory storing a plurality of computer programs, which are called by a processor to execute the anomaly detection method for a distributed system described in Embodiment 1 above.
In the present invention, a time observation sequence is introduced into the statistical model, and whether the state of the distributed system is abnormal is detected through the support probability model, which improves the accuracy of anomaly detection.
Secondly, the calculation uses a static Markov model. When the time observation sequence is added to the Markov model, the other states (for example, those that occur less often) are merged into one special state, which simplifies the structure of the statistical model and can improve computational efficiency.
Furthermore, the time observation sequence is divided by window division, so that support probabilities can be calculated and compared on the basis of state sequences of equal length, which improves the accuracy of anomaly detection.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the specific application and the design constraints of the technical solution.
Skilled professionals may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present invention. The above is only a specific embodiment of the present invention, and the protection scope of the present invention is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

  1. An anomaly detection method for a distributed system, characterized by comprising:
    collecting status data of the distributed system;
    creating a statistical model;
    introducing a time observation sequence, and establishing a support probability model of the statistical model for the time observation sequence;
    obtaining a corresponding anomaly detection result based on the status data and the support probability model.
  2. The anomaly detection method according to claim 1, characterized in that the statistical model is a Markov model, specifically λ = (S, P, Q), where S is the original state space, P is the state transition probability matrix, and Q is the known probability distribution over all states of the distributed system.
  3. The anomaly detection method according to claim 2, characterized in that introducing the time observation sequence and establishing the support probability model of the statistical model for the time observation sequence comprises:
    introducing the time observation sequence;
    iteratively optimizing the original state space according to the Markov model to obtain an optimized state space.
  4. The anomaly detection method according to claim 3, characterized in that optimizing the original state space based on the time observation sequence to obtain the optimized state space specifically comprises:
    obtaining the number of times each state occurs;
    calculating the probability of occurrence of each state;
    sorting the calculated occurrence probabilities of the states by size;
    obtaining the number of states of the state space based on the sorting result;
    adding the time observation sequence to the original state space to obtain the optimized state space.
  5. The anomaly detection method according to claim 4, characterized in that obtaining the corresponding anomaly detection result based on the status data and the support probability model specifically comprises:
    dividing the time observation sequence using a window division method to obtain two or more state sequences;
    obtaining the corresponding anomaly detection result based on the status data and the two or more state sequences.
  6. The anomaly detection method according to claim 5, characterized in that obtaining the corresponding anomaly detection result based on the status data and the two or more state sequences specifically comprises:
    obtaining a mean value model of the support probability based on the two or more state sequences;
    calculating the corresponding support probability based on the status data and the mean value model;
    obtaining the corresponding anomaly detection result based on the calculated support probability.
  7. The anomaly detection method according to claim 6, characterized in that, after obtaining the mean value model of the support probability based on the two or more state sequences and before calculating the corresponding support probability based on the status data and the mean value model, the method further comprises:
    correcting the mean value model to obtain a corrected mean value model;
    wherein calculating the corresponding support probability based on the status data and the mean value model is specifically:
    calculating the corresponding support probability based on the status data and the corrected mean value model.
  8. An anomaly detection system for a distributed system, characterized by comprising:
    an acquisition module, configured to collect status data of the distributed system;
    a creation module, configured to create a statistical model;
    an establishing module, configured to introduce a time observation sequence and establish a support probability model of the statistical model for the time observation sequence;
    a further acquisition module, configured to obtain a corresponding anomaly detection result based on the status data and the support probability model.
  9. A memory storing a computer program, characterized in that the computer program is executed by a processor to perform the following steps:
    collecting status data of a distributed system;
    creating a statistical model;
    introducing a time observation sequence, and establishing a support probability model of the statistical model for the time observation sequence;
    obtaining a corresponding anomaly detection result based on the status data and the support probability model.
  10. A service terminal, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the anomaly detection method for a distributed system according to any one of claims 1 to 7.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711017741.0A CN107844406A (en) 2017-10-25 2017-10-25 Method for detecting abnormality and system, service terminal, the memory of distributed system

Publications (1)

Publication Number Publication Date
CN107844406A true CN107844406A (en) 2018-03-27

Family

ID=61663031

Country Status (1)

Country Link
CN (1) CN107844406A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102158372A (en) * 2011-04-14 2011-08-17 哈尔滨工程大学 Distributed system abnormity detection method
US20130226501A1 (en) * 2012-02-23 2013-08-29 Infosys Limited Systems and methods for predicting abnormal temperature of a server room using hidden markov model
CN104063747A (en) * 2014-06-26 2014-09-24 上海交通大学 Performance abnormality prediction method in distributed system and system
CN104699606A (en) * 2015-03-06 2015-06-10 国网四川省电力公司电力科学研究院 Method for predicting state of software system based on hidden Markov model
CN105511944A (en) * 2016-01-07 2016-04-20 上海海事大学 Anomaly detection method of internal virtual machine of cloud system
CN105843733A (en) * 2016-03-17 2016-08-10 北京邮电大学 Big data platform performance detection method and device
CN105933857A (en) * 2015-11-25 2016-09-07 ***股份有限公司 Mobile terminal position prediction method and apparatus
CN106612289A (en) * 2017-01-18 2017-05-03 中山大学 Network collaborative abnormality detection method based on SDN

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109189827A (en) * 2018-08-16 2019-01-11 阿里巴巴集团控股有限公司 Time Series Processing method and apparatus, electronic equipment
CN109189827B (en) * 2018-08-16 2022-04-15 创新先进技术有限公司 Time sequence processing method and device and electronic equipment
CN109766229A (en) * 2018-12-05 2019-05-17 华东师范大学 A kind of method for detecting abnormality towards Integrated Electronic System
CN109766229B (en) * 2018-12-05 2022-02-11 华东师范大学 Anomaly detection method for integrated electronic system
CN109857618A (en) * 2019-02-02 2019-06-07 中国银行股份有限公司 A kind of monitoring method, apparatus and system
CN109857618B (en) * 2019-02-02 2022-07-08 中国银行股份有限公司 Monitoring method, device and system
CN112085866A (en) * 2020-08-14 2020-12-15 陕西千山航空电子有限责任公司 Airplane abnormal state identification method based on flight parameter data
CN112085866B (en) * 2020-08-14 2023-04-07 陕西千山航空电子有限责任公司 Airplane abnormal state identification method based on flight parameter data
CN115022908A (en) * 2022-05-11 2022-09-06 ***数智科技有限公司 Method for predicting and positioning abnormity of core network and base station transmission network
CN115022908B (en) * 2022-05-11 2023-05-12 ***数智科技有限公司 Method for predicting and positioning abnormality of core network and base station transmission network
WO2023216457A1 (en) * 2022-05-11 2023-11-16 ***数智科技有限公司 Method for predicting and positioning abnormity of transmission network between core network and base station

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180327
