CN104156562A - Failure prediction system and failure prediction method for the background operation and maintenance system of a bank - Google Patents

Failure prediction system and failure prediction method for the background operation and maintenance system of a bank

Info

Publication number
CN104156562A
Authority
CN
China
Prior art keywords
data
bank
backstage
failure prediction
final characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410337349.4A
Other languages
Chinese (zh)
Inventor
徐华
李晓潇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Application filed by Tsinghua University
Priority to CN201410337349.4A
Publication of CN104156562A
Legal status: Pending


Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a failure prediction method for the background operation and maintenance (O&M) system of a bank. The method includes the following steps: acquiring a bank data sample and extracting initial feature data from it; performing dimensionality reduction on the initial feature data to obtain final feature data; performing classification learning on the final feature data using a random forest method, so as to predict failures of the bank's background O&M system according to the classification results; and, when a failure of the bank's background O&M system is predicted, adjusting parameters of the system according to the final feature data so as to avoid the failure or lower its probability of occurrence. The method can effectively predict failures of the bank's background O&M system and, through effective precautions, avoid failures or lower their probability of occurrence. The invention further provides a failure prediction system for the background O&M system of a bank.

Description

Failure prediction method and system for a bank's background O&M system
Technical field
The present invention relates to the field of computer application technology, and in particular to a failure prediction method and system for a bank's background operation and maintenance (O&M) system.
Background
As a legally established financial institution that handles money and credit business, a bank is widely used because it is safe and efficient. For so important a system, security and efficiency are critical, and security in particular is the lifeline of a banking system. Even so, large-scale failures at banks still occur from time to time. Failures of this scale are rarely caused by front-office operating mistakes: the rigorous transaction procedures at the front office almost eliminate human error, and any error that does occur affects only one or two transactions. Transaction failures are instead usually caused by failures in the background system. To avoid bank failures more effectively, close attention should therefore be paid to the O&M of the background system. However, a bank's background system is often very complex, and the causes of transaction timeout failures are especially varied. They may include the network linking banks, the back-end database that records data, and the servers that run the transaction programs. A single failure also tends to trigger a chain reaction. For example, when the database goes down, transaction requests start to pile up, which exhausts server resources; conversely, if the server leaks memory, system resources gradually dwindle until the database lacks the resources it needs to run and finally goes down. The dependencies within the background system are thus quite complex, and it is almost impossible to identify the cause of a failure directly through rule-based analysis, so it cannot be predicted when the bank's background O&M system will fail.
Summary of the invention
The present invention aims to solve, at least to some extent, one of the technical problems in the related art described above.
To this end, one object of the present invention is to provide a failure prediction method for a bank's background O&M system. The method can effectively predict failures of the bank's background O&M system and, through effective precautions, avoid failures or reduce their probability of occurrence.
Another object of the present invention is to provide a failure prediction system for a bank's background O&M system.
To achieve the above objects, an embodiment of the first aspect of the present invention provides a failure prediction method for a bank's background O&M system, comprising the following steps: acquiring a bank data sample and extracting initial feature data from the bank data sample; performing dimensionality reduction on the initial feature data to obtain final feature data; performing classification learning on the final feature data using a random forest method, so as to predict failures of the bank's background O&M system according to the classification results; and, when a failure of the bank's background O&M system is predicted, adjusting parameters of the bank's background O&M system according to the final feature data, so as to avoid the failure or reduce its probability of occurrence.
According to the failure prediction method of the embodiment of the present invention, raw data from the bank's servers is extracted, preprocessed, and subjected to feature dimensionality reduction, after which a model is trained with the random forest method. Newly arriving data is tested with the trained model, which outputs a predicted failure probability. Finally, a parameter adjustment scheme is provided for samples with a high failure probability, so that the failure rate is brought down. The method can therefore effectively predict failures of the bank's background O&M system and, through effective precautions, avoid failures or reduce their probability of occurrence.
In addition, the failure prediction method according to the above embodiment of the present invention may have the following additional technical features:
In some examples, the initial feature data comprise: timeout failure data, database operation data, network operation data, hard disk operation data, and CPU operation data.
In some examples, acquiring the bank data sample and extracting the initial feature data from it specifically comprises: performing deduplication, redundancy removal, time discretization and alignment, and data labeling on the bank data sample to extract the initial feature data.
In some examples, dimensionality reduction is performed on the initial feature data by stepwise forward feature selection to obtain the final feature data.
In some examples, adjusting the parameters of the bank's background O&M system according to the final feature data, so as to avoid the failure or reduce its probability of occurrence, further comprises: labeling each dimension of the final feature data so as to divide the final feature data into controllable data and uncontrollable data; and adjusting the parameters of the controllable data so as to avoid a failure of the bank's background O&M system or reduce its probability of occurrence.
An embodiment of the second aspect of the present invention provides a failure prediction system for a bank's background O&M system, comprising: a data preprocessing module, configured to acquire a bank data sample and extract initial feature data from the bank data sample; a feature dimensionality reduction module, configured to perform dimensionality reduction on the initial feature data to obtain final feature data; a failure prediction module, configured to perform classification learning on the final feature data using a random forest method, so as to predict failures of the bank's background O&M system according to the classification results; and a failure prevention module, configured to adjust, when the failure prediction module predicts a failure of the bank's background O&M system, parameters of the bank's background O&M system according to the final feature data, so as to avoid the failure or reduce its probability of occurrence.
According to the failure prediction system of the embodiment of the present invention, raw data from the bank's servers is extracted, preprocessed, and subjected to feature dimensionality reduction, after which a model is trained with the random forest method. Newly arriving data is tested with the trained model, which outputs a predicted failure probability. Finally, a parameter adjustment scheme is provided for samples with a high failure probability, so that the failure rate is brought down. The system can therefore effectively predict failures of the bank's background O&M system and, through effective precautions, avoid failures or reduce their probability of occurrence.
In addition, the failure prediction system according to the above embodiment of the present invention may have the following additional technical features:
In some examples, the initial feature data comprise: timeout failure data, database operation data, network operation data, hard disk operation data, and CPU operation data.
In some examples, the data preprocessing module is configured to perform deduplication, redundancy removal, time discretization and alignment, and data labeling on the bank data sample to extract the initial feature data.
In some examples, the feature dimensionality reduction module performs dimensionality reduction on the initial feature data by stepwise forward feature selection to obtain the final feature data.
In some examples, the failure prevention module is configured to label each dimension of the final feature data so as to divide the final feature data into controllable data and uncontrollable data, and to adjust the parameters of the controllable data so as to avoid a failure of the bank's background O&M system or reduce its probability of occurrence.
Additional aspects and advantages of the present invention will be set forth in part in the following description, and in part will become apparent from the description or be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and easy to understand from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a flowchart of a failure prediction method for a bank's background O&M system according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of extracting the initial feature data in the failure prediction method according to an embodiment of the present invention;
Fig. 3 is a schematic flowchart of failure prediction in the failure prediction method according to an embodiment of the present invention;
Fig. 4 is a schematic architecture diagram of a failure prediction method for a bank's background O&M system according to another embodiment of the present invention;
Fig. 5 is a structural block diagram of a failure prediction system for a bank's background O&M system according to another embodiment of the present invention.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below. Examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals denote the same or similar elements, or elements with the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary, are intended only to explain the present invention, and should not be construed as limiting it.
The failure prediction method and system for a bank's background O&M system according to embodiments of the present invention are described below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a failure prediction method for a bank's background O&M system according to an embodiment of the present invention. As shown in Fig. 1, the method comprises the following steps:
Step S101: acquire a bank data sample and extract initial feature data from it. Specifically, in one embodiment of the present invention, the initial feature data is extracted by performing deduplication, redundancy removal, time discretization and alignment, and data labeling on the bank data sample. In some examples, the bank data sample comprises: timeout failure data, database operation data, network operation data, hard disk operation data, and CPU operation data.
As a concrete example, because bank data is stored in the background system in the form of a database, the bank data sample is exported directly, without any filtering, from the database of the bank's original server, so every record is very detailed. This has two consequences. On the one hand, ample server state information is available, giving the performance indicators of every server component at every moment. On the other hand, the raw records also contain much mutually repeated information and much useless information unrelated to the prediction target. Such information makes the number of model parameters grow sharply and makes training more difficult.
Therefore, before the data is used for model training, it is first preprocessed according to its characteristics. Preprocessing comprises four steps: data deduplication, deletion of irrelevant content (redundancy removal), time discretization and alignment, and data labeling. This preliminary processing yields a processed data set (the initial feature data) with less noise and higher relevance, as shown in Fig. 2.
In addition, a banking system is a comprehensive and complex system. In this example, the collected data comprise timeout failure data, database operation data, network operation data, hard disk operation data, and CPU operation data. However, each part of the system is collected independently, and the recording interval differs from part to part, ranging from 1 second to 15 minutes, so the data must be reorganized and joined by time interval.
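The time discretization and alignment step can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the choice of a 15-minute bucket, averaging the readings inside a bucket, keeping only buckets present in every source, and the sample readings are all hypothetical and are not specified by the patent.

```python
from collections import defaultdict

BUCKET_SECONDS = 15 * 60  # assumed common interval (the coarsest one mentioned in the text)


def align_to_buckets(records, bucket_seconds=BUCKET_SECONDS):
    """Average the (timestamp, value) readings of one source per time bucket.

    `records` is an iterable of (unix_timestamp_seconds, numeric_value).
    Returns {bucket_start: mean_value}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in set(records):  # set() drops exact duplicate records
        bucket = int(ts) // bucket_seconds * bucket_seconds
        sums[bucket] += value
        counts[bucket] += 1
    return {b: sums[b] / counts[b] for b in sums}


def join_sources(sources, bucket_seconds=BUCKET_SECONDS):
    """Join several sources on bucket start, keeping buckets present in all of them."""
    aligned = {name: align_to_buckets(recs, bucket_seconds)
               for name, recs in sources.items()}
    common = set.intersection(*(set(a) for a in aligned.values()))
    return {b: {name: aligned[name][b] for name in aligned} for b in sorted(common)}


# Hypothetical readings: one source sampled every second, one every 15 minutes.
sources = {
    "cpu_load": [(0, 0.2), (1, 0.4), (1, 0.4), (900, 0.9)],  # contains one duplicate
    "db_connections": [(0, 120), (900, 300)],
}
table = join_sources(sources)
```

Each row of `table` then describes one 15-minute interval with one value per source, which is the shape a classifier needs.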
After the initial feature data is obtained, the data must be labeled as faulty or fault-free. For a system as large as a bank's, a single sporadic failure has little impact, and predicting it would be of little value. In the embodiment of the present invention, therefore, failures are regarded as dense only when more than 3 occur within 15 minutes, and only dense failures are the prediction target.
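A minimal Python sketch of this labeling rule follows. It assumes fixed, non-overlapping 15-minute windows and reads the rule as "more than 3 failures in a window"; the function name and the sample timestamps are hypothetical.

```python
from collections import Counter

WINDOW_SECONDS = 15 * 60  # window length stated in the text
DENSE_THRESHOLD = 3       # a window is "dense" when its fault count exceeds this


def label_windows(fault_timestamps, window_starts):
    """Return {window_start: 1 for a dense-fault window, else 0}."""
    per_window = Counter(
        int(ts) // WINDOW_SECONDS * WINDOW_SECONDS for ts in fault_timestamps
    )
    return {w: int(per_window[w] > DENSE_THRESHOLD) for w in window_starts}


# Hypothetical fault times in seconds: four in the first window, one in the second.
faults = [10, 20, 30, 40, 950]
labels = label_windows(faults, [0, 900, 1800])
```

Only the first window is labeled as a positive example; the single fault in the second window is treated as sporadic.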
Step S102: perform dimensionality reduction on the initial feature data to obtain final feature data. In one embodiment of the present invention, this is done by stepwise forward feature selection. The main purpose of dimensionality reduction is to further remove useless and weakly correlated information and to mitigate the effect of the dimension expansion introduced during preprocessing.
As a concrete example, feature selection is a class of dimensionality reduction methods. Unlike mapping methods such as principal component analysis (PCA), feature selection leaves the original data set essentially unchanged and achieves dimensionality reduction simply by extracting a useful subspace of dimensions.
Stepwise forward feature selection is one of the simplest and most efficient feature selection methods. Its main flow can be described by the following steps:
1) The initial feature space is empty.
2) At each step, select the one feature that, when added to the current feature space, lets the trained classifier achieve the highest accuracy, and add it to the feature space.
3) Repeat step 2) until features of enough dimensions have been selected.
In summary, stepwise forward feature selection is a greedy algorithm. Many other feature selection methods have been derived from it; they are not described here.
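The three steps above can be sketched in Python as follows. The `score` callback stands for "train a classifier on this feature subset and return its accuracy"; the toy scoring function, the feature names, and the stopping parameter `max_features` are assumptions for illustration, not part of the patent.

```python
def forward_select(feature_names, score, max_features):
    """Greedy stepwise forward selection.

    `score(subset)` is a caller-supplied function returning the validation
    accuracy of a classifier trained on that list of features.
    """
    selected = []                       # step 1: start from an empty feature space
    remaining = list(feature_names)
    while remaining and len(selected) < max_features:
        # step 2: add the single feature that gives the best score right now
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected                     # step 3: stop once enough features are chosen


# Toy scoring function (illustration only): each feature has a fixed usefulness,
# and "cpu" and "cpu_copy" carry the same information.
USEFULNESS = {"cpu": 0.30, "cpu_copy": 0.30, "db_wait": 0.20, "disk_io": 0.05}


def toy_score(subset):
    names = set(subset)
    if {"cpu", "cpu_copy"} <= names:
        names.discard("cpu_copy")       # the redundant copy adds nothing
    return 0.5 + sum(USEFULNESS[n] for n in names)


chosen = forward_select(list(USEFULNESS), toy_score, max_features=2)
```

In this toy run the redundant feature is never picked, because once `cpu` is selected, adding `cpu_copy` does not raise the score.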
Step S103: perform classification learning on the final feature data using a random forest method, so as to predict failures of the bank's background O&M system according to the classification results.
As a concrete example, classification has long been a major research area in machine learning: given the features of the data, the task is to predict a positive or negative label. In the embodiment of the present invention, this corresponds to indicating, from the system's preceding running state, whether the system will fail within a short time. Random forest is an excellent classification model that is widely used because of its high robustness and good classification performance.
Random forest (Random Forests) is an ensemble model that combines the Bootstrap sampling method, feature selection, the Bagging training method, and the decision tree model. A random forest is in fact a combination of many decision trees.
A decision tree is a classification model built level by level from root to leaves. Each node selects the one feature dimension that currently splits the samples best and classifies the samples on it; the specific selection method depends on the version of the decision tree. The embodiments of the present invention use the splitting method of the C4.5 decision tree, as follows:
$$\mathrm{Entropy}(S) = \sum_{i=1}^{c} -p_i \log_2 p_i,$$
$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\, \mathrm{Entropy}(S_v),$$
$$\mathrm{SplitInformation}(S, A) = -\sum_{i=1}^{c} \frac{|S_i|}{|S|} \log_2 \frac{|S_i|}{|S|},$$
$$\mathrm{GainRatio}(S, A) = \frac{\mathrm{Gain}(S, A)}{\mathrm{SplitInformation}(S, A)},$$
Here, p_i is the proportion of samples in S that belong to class i, A is an attribute of the samples, Values(A) is the set of values that attribute A can take, and S_v is the subset of S whose value of attribute A equals v (|S_v| being its size). In SplitInformation, S_i denotes the i-th subset obtained by partitioning S on the values of A. The C4.5 algorithm chooses the splitting attribute using the information gain ratio GainRatio(S, A), which improves the accuracy of the decision tree.
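The splitting criterion can be computed as in the following Python sketch for a single discrete attribute. The function names and the toy data are illustrative assumptions; handling of continuous attributes, which C4.5 also supports, is omitted.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy(S) = -sum_i p_i * log2(p_i) over the class proportions p_i."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """GainRatio(S, A) for one discrete attribute A, given per-sample values."""
    n = len(labels)
    subsets = {}                                   # v -> class labels of S_v
    for v, y in zip(values, labels):
        subsets.setdefault(v, []).append(y)
    gain = entropy(labels) - sum(len(s) / n * entropy(s) for s in subsets.values())
    split_info = -sum(len(s) / n * log2(len(s) / n) for s in subsets.values())
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: a database state attribute and the fault label per sample.
db_state = ["busy", "busy", "idle", "idle"]
fault = [1, 1, 0, 0]
ratio = gain_ratio(db_state, fault)
```

In this toy case the attribute separates the two classes perfectly, so both the gain and the split information equal 1 bit and the ratio is 1.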
Further, the training of the random forest model can be summarized in the following steps:
1. From the N original input samples, draw N samples at random with replacement to obtain a new set of N samples.
2. Train a decision tree on the N sampled samples. Supposing each sample has M attribute dimensions, whenever a node needs to split, randomly draw m of the M attributes (m < M) and split on them according to the C4.5 rule.
3. While the decision tree is being built, every node is split according to the rule in step 2, until a complete decision tree is formed.
4. Repeat steps 1 to 3 until the required number of decision trees has been obtained; together they form the random forest.
In summary, a random forest is an ensemble of decision trees. At test time, each decision tree is evaluated separately and the final result is decided by voting.
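The bootstrap, random attribute subset, and voting steps can be sketched in Python as follows. To keep the sketch short, each "tree" is a one-level decision stump chosen by training accuracy, which is a deliberate simplification and not the C4.5 trees of the embodiment; all names, the seed, and the toy data are assumptions.

```python
import random
from collections import Counter


def train_stump(X, y, feature_ids):
    """Best single-threshold split among `feature_ids` (stand-in for a full tree)."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            correct = left.count(l_lab) + right.count(r_lab)
            if best is None or correct > best[0]:
                best = (correct, f, t, l_lab, r_lab)
    if best is None:  # no valid split: always predict the majority class
        majority = Counter(y).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]


def train_forest(X, y, n_trees, m, seed=0):
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # step 1: sample with replacement
        feats = rng.sample(range(n_features), m)     # step 2: random subset of m attributes
        forest.append(train_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, row):
    votes = Counter(l if row[f] <= t else r for f, t, l, r in forest)
    return votes.most_common(1)[0][0]                # majority vote over all trees


# Hypothetical toy data: two numeric features, label 1 means "fault".
X = [[0.1, 1], [0.2, 2], [0.3, 3], [0.4, 4], [0.6, 6], [0.7, 7], [0.8, 8], [0.9, 9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = train_forest(X, y, n_trees=25, m=1)
```

Using an odd number of trees avoids tied votes in a two-class problem.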
In this example of the present invention, the random forest method is used to predict transaction timeout failures. In real samples, however, negative examples far outnumber positive examples, and the training method must be adjusted for such an imbalanced data set.
One simple approach is to duplicate the positive examples directly until there are as many positive examples as negative ones, but this greatly increases the amount of training data and makes the model slower to train. A more practical approach is to give the samples different weights, which only requires a weighted modification of the information gain formula used for decision tree splitting. The specific weights are adjusted according to the ratio of positive to negative examples and the requirements on recall and precision.
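The patent does not give the weighted formula. One common way to realize it, shown here purely as an assumption, is to let each sample contribute its class weight instead of a count of 1 when the class proportions in the entropy are computed:

```python
from math import log2


def weighted_entropy(labels, class_weight):
    """Entropy in which each sample counts `class_weight[label]` instead of 1."""
    totals = {}
    for y in labels:
        totals[y] = totals.get(y, 0.0) + class_weight[y]
    total = sum(totals.values())
    return -sum(w / total * log2(w / total) for w in totals.values() if w > 0)


labels = [1] + [0] * 9                                 # 1 positive (fault), 9 negatives
plain = weighted_entropy(labels, {0: 1.0, 1: 1.0})     # unweighted baseline
balanced = weighted_entropy(labels, {0: 1.0, 1: 9.0})  # weight chosen to offset the 1:9 ratio
```

With the weight 9 on the positive class, the node looks perfectly balanced to the splitting criterion, so splits that isolate the rare fault samples yield a larger gain than they would unweighted.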
Step S104: when a failure of the bank's background O&M system is predicted, adjust parameters of the bank's background O&M system according to the final feature data, so as to avoid the failure or reduce its probability of occurrence. Specifically, in some examples, each dimension of the final feature data is first labeled so as to divide the final feature data into controllable data and uncontrollable data, and the parameters of the controllable data are then adjusted to avoid the failure or reduce its probability of occurrence.
In general, failure prediction only gives the predicted time of a failure and offers no measures for avoiding it. All the user can do is back up data, wait passively for the failure to occur, and then try to recover the system. The method of the present invention, by contrast, can avoid the failure through parameter adjustment.
The user first needs to label each dimension of the final feature data, dividing it into controllable data and uncontrollable data. Some variables can be regulated by the user, such as the number of CPUs and the maximum number of concurrent server connections; these are controllable data. Other variables are hard for the user to change, such as the number of transactions and the disk read/write speed; these are uncontrollable data. The method adjusts the controllable variables to reach a lower failure rate.
Here, as in a naive Bayes model, the features can be assumed to be mutually independent:
$$P(y = 0 \mid x_1, x_2, \ldots, x_{m-1}, x_m) = \prod_{i=1}^{m} P(y = 0 \mid x_i),$$
where y denotes the classification result and x_i denotes the feature in the i-th dimension of record x.
It should be noted that although this assumption is not strictly correct, in many cases it approximates reality well, and the independence assumption simplifies the problem and makes it easy to implement.
Further, from the above formula it follows that every controllable dimension can be adjusted so as to maximize each factor on the right-hand side:
$$P(y = 0 \mid x_i),$$
Here x_i can be adjusted by enumeration, and the required probability P(y = 0 | x_i) can be obtained by counting over the training set.
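The enumeration can be sketched in Python as follows. The sketch estimates P(y = 0 | x_i) for each observed value of a controllable dimension by counting in the training set and then picks the value with the highest estimate. The function names, the feature names, and the training records are hypothetical.

```python
from collections import defaultdict


def estimate_no_fault_prob(samples, dim):
    """Estimate P(y=0 | x_dim = v) for every observed value v by counting."""
    counts = defaultdict(lambda: [0, 0])   # value -> [no-fault count, total count]
    for features, y in samples:
        counts[features[dim]][1] += 1
        if y == 0:
            counts[features[dim]][0] += 1
    return {v: ok / total for v, (ok, total) in counts.items()}


def suggest_adjustments(samples, controllable_dims):
    """For each controllable dimension, pick the value with the highest P(y=0 | x_i)."""
    suggestion = {}
    for dim in controllable_dims:
        probs = estimate_no_fault_prob(samples, dim)
        suggestion[dim] = max(probs, key=probs.get)
    return suggestion


# Hypothetical training set: feature record and fault label (1 = dense fault).
training = [
    ({"max_connections": 100, "tx_per_min": 500}, 1),
    ({"max_connections": 100, "tx_per_min": 300}, 1),
    ({"max_connections": 200, "tx_per_min": 500}, 0),
    ({"max_connections": 200, "tx_per_min": 300}, 0),
    ({"max_connections": 200, "tx_per_min": 700}, 1),
]
plan = suggest_adjustments(training, controllable_dims=["max_connections"])
```

Only `max_connections` is passed as controllable; `tx_per_min` is treated as an uncontrollable dimension and is left untouched, matching the division described above.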
As a concrete example, as shown in Fig. 2 and Fig. 3, the main flow of the method of the above embodiment can be summarized as follows: the initial feature data is first read in and processed, feature dimensionality reduction is then performed, and the model training stage and the system use stage follow. The model training stage mainly analyzes the raw database data of the background O&M system. First, the raw data undergoes deduplication, redundancy removal, and dimensionality reduction, together with the preprocessing operation of discretizing and merging the data along the time axis. The random forest model and the prevention model are then trained on the feature data, yielding the prediction and prevention models. In the system use stage, the user submits the system information to be predicted, and the system automatically returns the test result of the random forest model and reports the failure rate of the current data. For data with a high failure rate, the prevention model is called to adjust the controllable parameters, and the regulation method is finally returned, so that the failure is avoided or its probability of occurrence is reduced.
In another example, as shown in Fig. 4, the structure of the method can be summarized in the following parts. The data preprocessing module is mainly used to read the raw data and a configuration file. The configuration file contains the start and end times of the data, the attribute of every dimension (continuous variable, discrete variable, or useless variable), and whether each dimension is controllable. In other words, this part completes the initial deduplication and redundancy removal.
The random forest module is a random forest training and testing module that supports both classification and regression; it also contains the training and testing module for the prevention model. In other words, this part is the core of the method and provides the model training and testing functions.
The display module is mainly used to show the final prediction results. It plots the given data on two-dimensional coordinate axes so that the user can see the failure prediction results intuitively.
It should be noted that, in implementation, the method of the embodiment of the present invention mainly involves core techniques such as server feature extraction, preprocessing, feature dimensionality reduction, random forest classification, and failure prevention. These algorithms, as well as functional modules such as the graphical user interface and the data reading module, can all be developed under Windows in languages such as Java and C++.
Based on the above development platform, the method requires the following runtime environment in a specific implementation. At the operating system level, it needs to run on Windows XP or a compatible operating system platform. It also needs the program runtime environment, namely the Java and C++ runtime environments. Once this supporting environment is in place, the method can be carried out normally.
According to the failure prediction method of the embodiment of the present invention, raw data from the bank's servers is extracted, preprocessed, and subjected to feature dimensionality reduction, after which a model is trained with the random forest method. Newly arriving data is tested with the trained model, which outputs a predicted failure probability. Finally, a parameter adjustment scheme is provided for samples with a high failure probability, so that the failure rate is brought down. The method can therefore effectively predict failures of the bank's background O&M system and, through effective precautions, avoid failures or reduce their probability of occurrence.
A further embodiment of the present invention provides a failure prediction system for a bank's background O&M system.
Fig. 5 is a structural block diagram of a failure prediction system for a bank's background O&M system according to an embodiment of the present invention. As shown in Fig. 5, the failure prediction system 500 comprises: a data preprocessing module 510, a feature dimensionality reduction module 520, a failure prediction module 530, and a failure prevention module 540.
Specifically, the data preprocessing module 510 is configured to acquire a bank data sample and extract initial feature data from it. In one embodiment of the present invention, the data preprocessing module 510 performs deduplication, redundancy removal, time discretization and alignment, and data labeling on the bank data sample to extract the initial feature data. In some examples, the bank data sample comprises: timeout failure data, database operation data, network operation data, hard disk operation data, and CPU operation data.
As a concrete example, because bank data is stored in the background system in the form of a database, the bank data sample is exported directly, without any filtering, from the database of the bank's original server, so every record is very detailed. This has two consequences. On the one hand, ample server state information is available, giving the performance indicators of every server component at every moment. On the other hand, the raw records also contain much mutually repeated information and much useless information unrelated to the prediction target. Such information makes the number of model parameters grow sharply and makes training more difficult.
Therefore, before the data is used for model training, it is first preprocessed according to its characteristics. Preprocessing comprises four steps: data deduplication, deletion of irrelevant content (redundancy removal), time discretization and alignment, and data labeling. This preliminary processing yields a processed data set (the initial feature data) with less noise and higher relevance, as shown in Fig. 2.
In addition, a banking system is a comprehensive and complex system. In this example, the collected data comprise timeout failure data, database operation data, network operation data, hard disk operation data, and CPU operation data. However, each part of the system is collected independently, and the recording interval differs from part to part, ranging from 1 second to 15 minutes, so the data must be reorganized and joined by time interval.
After the initial feature data is obtained, the data must be labeled as faulty or fault-free. For a system as large as a bank's, a single sporadic failure has little impact, and predicting it would be of little value. In the embodiment of the present invention, therefore, failures are regarded as dense only when more than 3 occur within 15 minutes, and only dense failures are the prediction target.
Feature Dimension Reduction module 520 is for carrying out dimensionality reduction to obtain final characteristic to initial characteristics data.In one embodiment of the invention, Feature Dimension Reduction module 520 by progressively forward Method for Feature Selection initial characteristics data are carried out to dimensionality reduction to obtain final characteristic.Particularly, the Main Function of dimension-reduction treatment is further to remove information useless and that correlativity is little, the data dimension impact bringing of expanding while alleviating pre-service.
For a description of the stepwise forward feature selection method, refer to the description of the prediction method in the above embodiment of the present invention.
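For orientation, a generic greedy forward selection loop is sketched below. It shows the general technique only and is not necessarily the exact procedure of the embodiment; the `score` callable (for example, a validation accuracy), the stopping tolerance `min_gain`, and the function name are assumptions.

```python
def forward_select(features, score, min_gain=1e-9):
    """Greedy stepwise forward feature selection.

    features: iterable of candidate feature names.
    score:    callable taking a list of feature names and returning a number,
              where higher is better (hypothetical evaluation function).
    """
    selected = []
    remaining = list(features)
    best = score(selected)
    while remaining:
        # Try adding each remaining feature and keep the best single addition.
        trials = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(trials, key=lambda t: t[0])
        if top_score - best <= min_gain:
            break  # no candidate improves the score enough
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected
```

Each round costs one evaluation per remaining feature, so the loop is practical only when `score` is reasonably cheap.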
The failure prediction module 530 is configured to perform classification learning on the final feature data by a random forest method, so as to perform failure prediction on the bank background O&M system according to the classification results.
For a description of the random forest method, refer to the description of the prediction method in the above embodiment of the present invention.
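As a rough illustration of the random forest idea (bootstrap samples, a random feature subset at each split, and voting across trees), a small pure-Python sketch follows. It is a teaching example under simplifying assumptions (numeric features, labels 0 and 1 with 1 taken to mean a dense fault, Gini impurity, depth limit 5) and not the implementation used in the embodiment; all function names are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, feature_ids):
    """Return the (feature, threshold) pair with the lowest weighted Gini, or None."""
    best, best_cost = None, gini(labels)
    for f in feature_ids:
        for thr in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= thr]
            right = [y for r, y in zip(rows, labels) if r[f] > thr]
            if not left or not right:
                continue
            cost = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if cost < best_cost:
                best, best_cost = (f, thr), cost
    return best


def build_tree(rows, labels, rng, n_sub, depth=0, max_depth=5):
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority  # leaf: predicted class
    feature_ids = rng.sample(range(len(rows[0])), n_sub)  # random feature subset
    split = best_split(rows, labels, feature_ids)
    if split is None:
        return majority
    f, thr = split
    left = [i for i, r in enumerate(rows) if r[f] <= thr]
    right = [i for i, r in enumerate(rows) if r[f] > thr]
    grow = lambda idx: build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                  rng, n_sub, depth + 1, max_depth)
    return (f, thr, grow(left), grow(right))


def predict_tree(tree, row):
    while isinstance(tree, tuple):  # internal nodes are tuples, leaves are labels
        f, thr, left, right = tree
        tree = left if row[f] <= thr else right
    return tree


def train_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_sub = max(1, int(len(rows[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                 rng, n_sub))
    return forest


def fault_rate(forest, row):
    """Fraction of trees voting for class 1, used here as an estimated failure rate."""
    return sum(predict_tree(t, row) for t in forest) / len(forest)
```

Reporting the vote fraction rather than only the majority class gives a score that can be compared against a threshold when deciding whether prevention is needed.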
The failure prevention module 540 is configured to, when the failure prediction module 530 predicts that the bank background O&M system will fail, adjust the parameters of the bank background O&M system according to the final feature data, so as to avoid the failure or reduce the fault occurrence probability of the bank background O&M system. Specifically, in one embodiment of the present invention, the failure prevention module 540 first marks each dimension of the final feature data so as to divide the final feature data into controllable data and uncontrollable data, and then adjusts the parameters of the controllable data to avoid the failure or reduce the fault occurrence probability of the bank background O&M system.
In some examples, general failure prediction only gives the predicted time of a fault and offers no measures for avoiding it. All the user can do is back up data, wait passively for the fault to occur, and then try to recover the system. In contrast, the prediction system 500 of the present invention can avoid faults through parameter adjustment.
The failure prevention module 540 first needs to mark each dimension of the final feature data, dividing the final feature data into controllable data and uncontrollable data. Some variables can be regulated by the user, for example the number of CPUs of a server and the maximum number of connections; these are controllable data. Other variables are difficult for the user to change, for example the number of transactions and the disk read/write speed; these are uncontrollable data. The failure prevention module 540 adjusts the controllable variables to reach a lower failure rate.
Here, similarly to a Bayesian model, an independence assumption can be made between the features:
$$P(y = 0 \mid x_1, x_2, \ldots, x_{m-1}, x_m) = \prod_{i=1}^{m} P(y = 0 \mid x_i),$$
where $y$ denotes the classification result and $x_i$ denotes the feature in the $i$-th dimension of record $x$.
It should be noted that although this assumption is not strictly correct, in many cases it approximates reality well, and the independence assumption simplifies the problem and makes it easy to implement.
Further, from the above formula, all dimensions that are allowed to be controlled can be adjusted so as to maximize each corresponding factor on the right-hand side of the formula:
$$P(y = 0 \mid x_i).$$
Here $x_i$ can be adjusted by enumeration, and the required probability $P(y = 0 \mid x_i)$ can be obtained by counting over the training set.
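The counting and enumeration steps can be sketched as follows. The sketch assumes that feature values are discrete (continuous values would first need to be binned), that y = 0 denotes the fault-free class, and that only values seen in the training set are candidates; the function names are hypothetical.

```python
from collections import defaultdict


def estimate_no_fault_prob(rows, labels):
    """Estimate P(y=0 | x_i = v) for each dimension i and observed value v by counting."""
    counts = defaultdict(lambda: [0, 0])  # (dimension, value) -> [y == 0 count, total]
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            counts[(i, v)][1] += 1
            if y == 0:
                counts[(i, v)][0] += 1
    return {key: ok / total for key, (ok, total) in counts.items()}


def recommend(row, controllable, probs):
    """Enumerate observed values of each controllable dimension and pick the one
    with the highest estimated P(y=0 | x_i); other dimensions stay unchanged."""
    adjusted = list(row)
    for i in controllable:
        candidates = {v: p for (dim, v), p in probs.items() if dim == i}
        if candidates:
            adjusted[i] = max(candidates, key=candidates.get)
    return adjusted
```

Because the factors are treated as independent, each controllable dimension is optimized separately, which keeps the enumeration linear in the number of observed values per dimension.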
As a concrete example, as shown in Fig. 2 and Fig. 3, the main implementation flow of the prediction system 500 of the above embodiment can be summarized as follows: the initial feature data are first read in, feature dimension reduction is then performed, and the model training stage and the system operation stage follow. In the model training stage, the main task is to analyze the raw database data of the background O&M system. First, the raw data undergo deduplication, redundancy removal, and dimension reduction, and the preprocessing is completed by discretizing and merging the data along the time axis. The feature data are then used to train the random forest model and the prevention model, which completes the prediction and prevention models. In the system operation stage, the user submits the system information to be predicted, and the system automatically returns the test result of the random forest model and reports the failure rate of the current data. For data with a high failure rate, the system calls the prevention model to adjust the controllable parameters and finally returns a regulation method, thereby avoiding the fault or reducing the fault occurrence probability.
In another example, as shown in Fig. 4, the overall framework of the prediction system 500 can be summarized as the following parts. The data preprocessing module is mainly used to read the raw data and a configuration file. The configuration file contains the start and end times of the data, the attribute of every dimension (continuous variable, discrete variable, or useless variable), and whether each dimension is controllable. In other words, this part completes the initial deduplication and redundancy removal.
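The text does not specify the format of the configuration file. Purely as an illustration, the sketch below assumes a JSON layout; the keys `start`, `end`, `dimensions`, `kind`, and `controllable`, as well as the dimension names, are invented for the example.

```python
import json
from datetime import datetime

# Hypothetical configuration; the real file format is not given in the text.
CONFIG_TEXT = """
{
  "start": "2014-01-01T00:00:00",
  "end": "2014-01-31T23:59:59",
  "dimensions": [
    {"name": "cpu_count", "kind": "discrete", "controllable": true},
    {"name": "transactions", "kind": "continuous", "controllable": false},
    {"name": "host_comment", "kind": "useless", "controllable": false}
  ]
}
"""


def load_config(text):
    raw = json.loads(text)
    # Dimensions marked as useless are dropped (redundancy removal).
    dims = [d for d in raw["dimensions"] if d["kind"] != "useless"]
    return {
        "start": datetime.fromisoformat(raw["start"]),
        "end": datetime.fromisoformat(raw["end"]),
        "features": [d["name"] for d in dims],
        "controllable": [d["name"] for d in dims if d["controllable"]],
    }


config = load_config(CONFIG_TEXT)
```

Keeping the controllable flag next to each dimension lets the prevention step read the same file as the preprocessing step.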
The random forest module (the failure prediction module) is a random forest model training and testing module that supports both classification and regression; it also contains the training and testing module for the prevention model. In other words, this part is the core of the method and provides the model training and testing functions.
The display module is mainly used to display the final prediction results. It plots the given data on two-dimensional coordinate axes so that the user can see the failure prediction results intuitively.
It should be noted that the implementation of the prediction system 500 of the embodiment of the present invention mainly involves core technologies such as server feature extraction, preprocessing, feature dimension reduction, random forest classification, and fault prevention. These algorithms, together with functional modules such as the graphical user interface and the data reading module, can all be developed under Windows in languages such as Java and C++.
Based on the above development platform, deploying and running the prediction system 500 requires support from the following levels of the running environment. The first is the operating system layer: the system needs to run on Windows XP or a compatible operating system platform. The program runtime support environment, namely the Java and C++ runtime support environments, is also required. Once this supporting environment is in place, the prediction system 500 can run normally.
According to the failure prediction system for a bank background O&M system of the embodiment of the present invention, the raw data of the bank servers are extracted, preprocessed, and subjected to feature dimension reduction, and a random forest method is then used for model training. For newly arriving data to be predicted, the trained model is used for testing and outputs a predicted fault occurrence probability. Finally, a parameter adjustment scheme is given for samples with a high fault occurrence probability, so that the failure rate is brought down. The system can therefore effectively predict failures of the bank background O&M system, and can avoid failures or reduce their probability through effective prevention.
In the description of the present invention, it should be understood that orientation or positional relationships indicated by terms such as "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", "axial", "radial", and "circumferential" are based on the orientations or positional relationships shown in the drawings. They are used only to facilitate and simplify the description of the present invention, and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation; they therefore cannot be construed as limiting the present invention.
In addition, the terms "first" and "second" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Thus, a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "multiple" means at least two, for example two or three, unless otherwise expressly and specifically limited.
In the present invention, unless otherwise expressly specified and limited, terms such as "mounted", "connected with", "connected", and "fixed" should be interpreted broadly. For example, a connection may be a fixed connection, a detachable connection, or an integral connection; it may be a mechanical connection or an electrical connection; it may be a direct connection or an indirect connection through an intermediary; and it may be an internal communication between two elements or an interaction relationship between two elements, unless otherwise expressly limited. Those of ordinary skill in the art can understand the specific meanings of these terms in the present invention according to the specific circumstances.
In the present invention, unless otherwise expressly specified and limited, a first feature being "on" or "under" a second feature may mean that the first and second features are in direct contact, or that they are in indirect contact through an intermediary. Moreover, a first feature being "on", "above", or "over" a second feature may mean that the first feature is directly above or obliquely above the second feature, or merely that the first feature is at a higher level than the second feature. A first feature being "under", "below", or "beneath" a second feature may mean that the first feature is directly below or obliquely below the second feature, or merely that the first feature is at a lower level than the second feature.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not contradict one another, those skilled in the art may combine the different embodiments or examples described in this specification and the features of those embodiments or examples.
Although embodiments of the present invention have been shown and described above, it should be understood that the above embodiments are exemplary and cannot be construed as limiting the present invention. Those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present invention.

Claims (10)

1. A failure prediction method for a bank background O&M system, characterized by comprising the following steps:
acquiring a bank data sample, and extracting initial feature data from the bank data sample;
performing dimension reduction on the initial feature data to obtain final feature data;
performing classification learning on the final feature data by a random forest method, so as to perform failure prediction on the bank background O&M system according to the classification results;
when it is predicted that the bank background O&M system will fail, adjusting parameters of the bank background O&M system according to the final feature data, so as to avoid the failure of the bank background O&M system or reduce the fault occurrence probability of the bank background O&M system.
2. The failure prediction method for a bank background O&M system according to claim 1, characterized in that the bank data sample comprises: timeout fault data, database operation data, network operation data, hard disk operation data, and CPU operation data.
3. The failure prediction method for a bank background O&M system according to claim 2, characterized in that acquiring the bank data sample and extracting the initial feature data from the bank data sample specifically comprises:
performing deduplication, redundancy removal, time discretization and alignment, and data labeling on the bank data sample to extract the initial feature data.
4. The failure prediction method for a bank background O&M system according to claim 1, characterized in that dimension reduction is performed on the initial feature data by a stepwise forward feature selection method to obtain the final feature data.
5. The failure prediction method for a bank background O&M system according to claim 1, characterized in that adjusting the parameters of the bank background O&M system according to the final feature data, so as to avoid the failure of the bank background O&M system or reduce the fault occurrence probability of the bank background O&M system, further comprises:
marking each dimension of the final feature data to divide the final feature data into controllable data and uncontrollable data;
adjusting parameters of the controllable data to avoid the failure of the bank background O&M system or reduce the fault occurrence probability of the bank background O&M system.
6. A failure prediction system for a bank background O&M system, characterized by comprising:
a data preprocessing module, configured to acquire a bank data sample and extract initial feature data from the bank data sample;
a feature dimension reduction module, configured to perform dimension reduction on the initial feature data to obtain final feature data;
a failure prediction module, configured to perform classification learning on the final feature data by a random forest method, so as to perform failure prediction on the bank background O&M system according to the classification results;
a failure prevention module, configured to, when the failure prediction module predicts that the bank background O&M system will fail, adjust parameters of the bank background O&M system according to the final feature data, so as to avoid the failure of the bank background O&M system or reduce the fault occurrence probability of the bank background O&M system.
7. The failure prediction system for a bank background O&M system according to claim 6, characterized in that the bank data sample comprises: timeout fault data, database operation data, network operation data, hard disk operation data, and CPU operation data.
8. The failure prediction system for a bank background O&M system according to claim 7, characterized in that the data preprocessing module is configured to perform deduplication, redundancy removal, time discretization and alignment, and data labeling on the bank data sample to extract the initial feature data.
9. The failure prediction system for a bank background O&M system according to claim 6, characterized in that the feature dimension reduction module performs dimension reduction on the initial feature data by a stepwise forward feature selection method to obtain the final feature data.
10. The failure prediction system for a bank background O&M system according to claim 6, characterized in that the failure prevention module is configured to mark each dimension of the final feature data to divide the final feature data into controllable data and uncontrollable data, and to adjust parameters of the controllable data to avoid the failure of the bank background O&M system or reduce the fault occurrence probability of the bank background O&M system.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410337349.4A CN104156562A (en) 2014-07-15 2014-07-15 Failure predication system and failure predication method for background operation and maintenance system of bank

Publications (1)

Publication Number Publication Date
CN104156562A true CN104156562A (en) 2014-11-19

Family

ID=51882060

Country Status (1)

Country Link
CN (1) CN104156562A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20141119