CN108280289A - Rock burst danger grade prediction method based on locally weighted C4.5 algorithm - Google Patents

Rock burst danger grade prediction method based on locally weighted C4.5 algorithm

Info

Publication number
CN108280289A
CN108280289A (application CN201810058598.8A; granted as CN108280289B)
Authority
CN
China
Prior art keywords
sample
attribute
data
training set
cut point
Prior art date
Legal status
Granted
Application number
CN201810058598.8A
Other languages
Chinese (zh)
Other versions
CN108280289B (en)
Inventor
王彦彬
彭连会
何满辉
Current Assignee
Liaoning Technical University
Original Assignee
Liaoning Technical University
Priority date
Filing date
Publication date
Application filed by Liaoning Technical University filed Critical Liaoning Technical University
Priority to CN201810058598.8A
Publication of CN108280289A
Application granted
Publication of CN108280289B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation


Abstract

The present invention provides a rock burst danger grade prediction method based on a locally weighted C4.5 algorithm, in the technical field of rock burst prediction. The method first discretizes the continuous attribute data in the sample data using the Minimum Description Length Principle (MDLP), then selects a training set by a locally weighted method and computes sample weights. The information gain ratio of each attribute is computed from these sample weights, and according to the information gain ratio a sample attribute is selected as the split attribute of the root node and of each branch node of a C4.5 decision tree. Finally, pessimistic pruning is applied to the tree, with sums of sample weights used in place of sample counts, and the pruned tree predicts the rock burst danger grade of the region to be predicted. The rock burst danger grade prediction method based on the locally weighted C4.5 algorithm provided by the invention overcomes the bias of the ID3 algorithm's information-gain criterion toward attributes with many values, avoids overfitting, and yields higher prediction accuracy.

Description

Rock burst danger grade prediction method based on locally weighted C4.5 algorithm
Technical field
The present invention relates to the technical field of rock burst prediction, and in particular to a rock burst danger grade prediction method based on a locally weighted C4.5 algorithm.
Background technology
A rock burst is a dynamic phenomenon in which the coal and rock mass around mine roadways and stopes is destroyed suddenly, sharply and violently by the release of stored deformation energy. It is one of the disasters affecting coal mine production safety, and nearly every coal-producing country in the world is threatened by rock bursts to some extent. In recent years, developed countries have successively closed rock-burst-prone mines for energy-restructuring and safety reasons, leaving China as the country most affected by rock bursts and the main country engaged in their prevention and control.
Predicting and evaluating rock bursts on the basis of research into their genesis mechanism is a key step in rock burst prevention. However, the mechanism of rock bursts is not yet fully understood (in particular, research on the genesis mechanism of deep rock bursts is still at an early stage), which increases the difficulty of rock burst prediction. Current prediction methods fall mainly into rock mechanics methods and geophysical methods: rock mechanics methods include the drilling cuttings method and mining stress monitoring, while geophysical methods include acoustic emission monitoring, microseismic monitoring and electromagnetic radiation monitoring. In addition, with the development of artificial intelligence, intelligent algorithms such as neural networks, Bayesian discriminant analysis and support vector machines have been applied to rock burst prediction. These methods have produced substantial research results in rock burst danger grade prediction, but problems remain: neural networks generally require large sample sizes while rock burst samples are scarce; Bayesian methods require a high degree of independence between attributes, which real rock burst data rarely satisfy; and none of the above methods addresses model overfitting.
Summary of the invention
In view of the defects of the prior art, the present invention provides a rock burst danger grade prediction method based on a locally weighted C4.5 algorithm, which predicts the rock burst danger grade of the coal and rock mass around mine roadways and stopes.
The rock burst danger grade prediction method based on the locally weighted C4.5 algorithm includes the following steps:
Step 1: collect rock burst data of known danger grades as sample data. Let T be the collected sample data set, C the set of sample classes, k' the total number of sample classes, and N the number of samples;
Step 2: discretize the continuous attribute data in the sample data of known classes using the Minimum Description Length Principle (MDLP), as follows:
Step 2.1: sort each group of continuous attribute values to be discretized, together with their corresponding classes, in ascending order of the attribute values;
Step 2.2: wherever the class changes between adjacent sorted attribute values, select the attribute value as a cut point; the cut points form the cut point set. If the same attribute value corresponds to different classes, the attribute value corresponding to the smallest class is selected as the cut point;
Step 2.3: compute the information gain of every cut point in the cut point set and select the cut point with the largest information gain (i.e. the smallest class-conditional entropy); if that cut point satisfies the MDLP criterion, retain it, otherwise discard it;
The information gain of a cut point is computed as:
Gain(a) = H(C) - H(C|a)
where a is a cut point in the cut point set, H(C) is the class information entropy, and H(C|a) is the class information entropy after cut point a divides the class set C into two subsets;
Let a_min be the selected cut point, dividing the class set C into the two subsets C1 and C2. Whether a_min satisfies the MDLP criterion is judged by the following inequality (the Fayyad-Irani MDLP stopping rule):
Gain(a_min) > log2(N - 1)/N + [log2(3^k' - 2) - (k'·H(C) - k'1·H(C1) - k'2·H(C2))]/N
where k'1 and k'2 are the numbers of classes contained in subsets C1 and C2, respectively;
Step 2.4: check whether the two intervals into which the cut point of step 2.3 divides the original data still contain other cut points. If so, the cut points within each interval form a new cut point set and step 2.3 is repeated, judging from the samples and the class set of each interval whether its cut point is retained; otherwise, go to step 2.5;
Step 2.5: divide the continuous attribute data into intervals according to the finally selected cut points. If no cut point satisfies the MDLP criterion, all values of the attribute are placed in a single interval; otherwise the cut points divide the values into different intervals. This yields the discretization result of the continuous attribute;
Step 2.6: check whether every continuous attribute in the sample data set has been discretized. If so, go to step 3; otherwise repeat steps 2.1-2.5 until all continuous attributes of the sample data set have been discretized;
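As an illustration, the cut point selection and MDLP test of steps 2.1-2.5 can be sketched as below. This is a minimal sketch of the standard Fayyad-Irani MDLP procedure under the usual formulation, not the patent's exact implementation; all function names are invented for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a class-label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(labels, left, right):
    """Information gain of splitting `labels` into `left` + `right`."""
    n = len(labels)
    return entropy(labels) - (len(left) / n) * entropy(left) \
                           - (len(right) / n) * entropy(right)

def mdlp_accept(labels, left, right):
    """Fayyad-Irani MDLP stopping criterion for one candidate cut point."""
    n = len(labels)
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    delta = math.log2(3 ** k - 2) - (k * entropy(labels)
                                     - k1 * entropy(left) - k2 * entropy(right))
    return info_gain(labels, left, right) > math.log2(n - 1) / n + delta / n

def best_cut(values, labels):
    """Among class-boundary attribute values, return (gain, cut) of the best cut."""
    pairs = sorted(zip(values, labels))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][1] != pairs[i][1]:       # class changes: boundary point
            cut = pairs[i][0]                    # cut at the attribute value itself
            left = [l for v, l in pairs if v < cut]
            right = [l for v, l in pairs if v >= cut]
            if left and right:
                g = info_gain(labels, left, right)
                if best is None or g > best[0]:
                    best = (g, cut)
    return best
```

A retained cut recurses into the two resulting intervals (step 2.4); a rejected one leaves the whole range as a single interval (step 2.5).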
Step 3: collect the rock burst attribute data of the region to be predicted and compare its continuous attribute data with the corresponding attributes from step 2. Determine from this comparison the interval in which each continuous attribute value of the region lies, thereby discretizing the continuous attribute data of the region to be predicted;
Step 4: in the discretized data set generated in step 2, use the k-nearest-neighbor algorithm to find the k samples nearest to the sample to be predicted; these k samples form the training set of the C4.5 decision tree. Then compute the weight of each sample in the training set;
The weight of each sample in the training set is computed by the following formula:
where ω_i is the weight of the i-th training sample nearest to the sample to be predicted, i = 1, 2, ..., k; d_i is the distance from the sample to be predicted to the i-th sample x_i, computed from the attribute data of the samples by a distance formula; and d_max is the maximum of the distances from the sample to be predicted to all samples in the training set;
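The neighbor selection and weighting of step 4 might be sketched as follows. The patent text describes ω_i only as a function of d_i and d_max (the formula itself appears in the original figure, not in this text), so the linear kernel w_i = 1 - d_i/d_max used below is an assumed illustrative choice from locally weighted learning, and `knn_training_set` is an invented name.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two attribute vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def knn_training_set(query, dataset, k):
    """Select the k samples nearest to `query` and weight them by distance.

    `dataset` is a list of (attribute_vector, label) pairs. The weight
    kernel is an assumption: the patent only states that w_i depends on
    d_i and d_max.
    """
    nearest = sorted(dataset, key=lambda s: euclidean(query, s[0]))[:k]
    dists = [euclidean(query, s[0]) for s in nearest]
    d_max = max(dists) or 1.0                    # guard: all distances zero
    # Linear kernel: closest sample gets weight 1, farthest gets weight 0.
    weights = [1.0 - d / d_max for d in dists]
    return nearest, weights
```

Note that under this kernel the farthest of the k neighbors receives weight 0; other kernels (e.g. Gaussian) keep it strictly positive.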
Step 5: compute the information gain ratio of every attribute in the training set from the weights of the training samples. When generating the root node and each branch node, select in every iteration the attribute with the largest information gain ratio as the split attribute of the root node or branch node of the C4.5 decision tree;
The information gain ratio of an attribute in the training set is computed as follows:
Let V be an attribute of the training set and v_j the j-th value of V, j = 1, 2, ..., m, where m is the number of mutually distinct values of V among the training samples. Let C' = {c_1, c_2, ..., c_n} be the class set of the training samples, where c_{i'} is the i'-th class, i' = 1, 2, ..., n, and n is the total number of classes of the training samples. The information gain ratio of V is then computed as follows:
Compute the class information entropy of the training samples:
I(C') = -Σ_{i'=1..n} p(c_{i'}) log2 p(c_{i'})
where ω_{c_{i'}} is the sum of the weights of the training samples of class c_{i'}, ω_{C'} is the sum of the weights of the training samples of all classes, and p(c_{i'}) is the ratio of ω_{c_{i'}} to ω_{C'};
Compute the class-conditional entropy of the training samples:
I(C'|V) = -Σ_{j=1..m} p(v_j) Σ_{i'=1..n} p(c_{i'}|v_j) log2 p(c_{i'}|v_j)
where ω_{v_j} is the sum of the weights of the samples with attribute value v_j, ω_V is the sum of the weights of all samples for attribute V, ω_{c_{i'} v_j} is the sum of the weights of the samples with value v_j that belong to class c_{i'}, p(v_j) is the ratio of ω_{v_j} to ω_V, and p(c_{i'}|v_j) is the ratio of ω_{c_{i'} v_j} to ω_{v_j};
Compute the information gain of attribute V of the training samples:
I(C', V) = I(C') - I(C'|V)
Compute the information entropy (split information) of attribute V of the training samples:
I(V) = -Σ_{j=1..m} p(v_j) log2 p(v_j)
Compute the information gain ratio of attribute V of the training samples:
Gain_ratio(V) = I(C', V)/I(V);
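A minimal sketch of the weighted gain-ratio computation of step 5, with sums of sample weights replacing counts throughout, as the locally weighted C4.5 variant requires. The data layout (a list of (attributes, label, weight) triples) and the function names are assumptions for the example.

```python
import math
from collections import defaultdict

def weighted_entropy(weights_by_key):
    """Entropy of a distribution given as {key: weight-sum}."""
    total = sum(weights_by_key.values())
    return -sum((w / total) * math.log2(w / total)
                for w in weights_by_key.values() if w > 0)

def weighted_gain_ratio(samples, attr):
    """Information gain ratio of one attribute, with sample weights
    standing in for sample counts everywhere.

    `samples` is a list of (attributes_dict, label, weight) triples.
    """
    class_w = defaultdict(float)                      # weight per class
    value_w = defaultdict(float)                      # weight per attribute value
    cond_w = defaultdict(lambda: defaultdict(float))  # per value, per class
    total = 0.0
    for attrs, label, w in samples:
        v = attrs[attr]
        class_w[label] += w
        value_w[v] += w
        cond_w[v][label] += w
        total += w
    h_c = weighted_entropy(class_w)                   # I(C')
    h_c_given_v = sum((value_w[v] / total) * weighted_entropy(cond_w[v])
                      for v in value_w)               # I(C'|V)
    h_v = weighted_entropy(value_w)                   # I(V), the split information
    gain = h_c - h_c_given_v                          # I(C', V)
    return gain / h_v if h_v > 0 else 0.0
```

At each node, the attribute maximizing this ratio becomes the split attribute.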
Step 6: build the decision tree from the split attributes, then prune it with the pessimistic pruning method, using sums of sample weights in place of sample counts when computing the error rates of branch nodes and their corresponding leaf nodes during pruning. Finally, use the generated decision tree to predict the potential rock burst danger grade of the region to be predicted.
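The weighted pessimistic pruning test of step 6 might look like the following. It follows the standard C4.5 pessimistic-error scheme (0.5 continuity correction per leaf, one-standard-error comparison) with weight sums in place of counts, and is a sketch rather than the patent's exact procedure.

```python
import math

def pessimistic_errors(err_w, n_leaves):
    """Corrected error mass: misclassified weight plus the usual 0.5
    continuity correction per leaf, weight sums replacing counts."""
    return err_w + 0.5 * n_leaves

def should_prune(leaf_err_w, subtree_err_w, n_leaves, total_w):
    """Prune a branch when collapsing it to a single leaf does not raise
    the corrected error by more than one standard error."""
    e_subtree = pessimistic_errors(subtree_err_w, n_leaves)
    se = math.sqrt(e_subtree * (total_w - e_subtree) / total_w)
    e_leaf = pessimistic_errors(leaf_err_w, 1)
    return e_leaf <= e_subtree + se
```

Applied bottom-up, this replaces any branch whose best single-leaf error is statistically indistinguishable from the subtree's error.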
As the above technical solution shows, the beneficial effects of the present invention are as follows. In the rock burst danger grade prediction method based on the locally weighted C4.5 algorithm provided by the invention, discretizing the continuous attribute data with the Minimum Description Length Principle (MDLP) handles continuous attributes well; the locally weighted method selects the training set according to the distances from the discretized samples to the sample to be predicted and assigns different weights to the training samples; the C4.5 algorithm uses these sample weights to compute the information gain ratio for selecting node split attributes, which overcomes the bias of the ID3 algorithm's information-gain criterion toward attributes with many values; and replacing sample counts with sample weights in pessimistic pruning avoids overfitting and improves the accuracy of the prediction model.
Description of the drawings
Fig. 1 is a flow chart of the rock burst danger grade prediction method based on the locally weighted C4.5 algorithm provided by an embodiment of the present invention.
Specific embodiments
The specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings and examples. The following embodiments illustrate the present invention and do not limit its scope.
Taking the Yanshitai Colliery as an example, this embodiment applies the rock burst danger grade prediction method based on the locally weighted C4.5 algorithm of the present invention to predict the rock burst danger grade of that colliery.
The rock burst danger grade prediction method based on the locally weighted C4.5 algorithm, as shown in Fig. 1, includes the following steps:
Step 1: collect rock burst data of known danger grades as sample data. Let T be the collected sample data set, C the set of sample classes, k' the total number of sample classes, and N the number of samples.
Since many factors influence rock bursts, this embodiment selects ten factors as the attributes of the sample data for predicting the rock burst danger grade of the coal mine: coal seam thickness (V1), dip angle (V2), burial depth (V3), geological structure (V4), dip angle variation (V5), coal thickness variation (V6), gas concentration (V7), roof management (V8), pressure relief (V9) and coal burst sound (V10). Of these, geological structure (V4), dip angle variation (V5), coal thickness variation (V6), roof management (V8), pressure relief (V9) and coal burst sound (V10) are state parameters, assigned values as shown in Table 1:
Table 1: State parameter assignments
The danger grades of rock bursts are divided into four classes according to impact intensity: class 1 (micro impact), class 2 (weak impact), class 3 (medium impact) and class 4 (strong impact).
The rock burst data collected as sample data in this embodiment are shown in Table 2.
Table 2: Rock burst data used as sample data
Step 2: discretize the continuous attribute data in the sample data of known classes using the Minimum Description Length Principle (MDLP), as follows:
Step 2.1: sort each group of continuous attribute values to be discretized, together with their corresponding classes, in ascending order of the attribute values;
Step 2.2: wherever the class changes between adjacent sorted attribute values, select the attribute value as a cut point; the cut points form the cut point set. If the same attribute value corresponds to different classes, the attribute value corresponding to the smallest class is selected as the cut point;
Step 2.3: compute the information gain of every cut point in the cut point set and select the cut point with the largest information gain (i.e. the smallest class-conditional entropy); if that cut point satisfies the MDLP criterion, retain it, otherwise discard it;
The information gain of a cut point is computed as:
Gain(a) = H(C) - H(C|a)
where a is a cut point in the cut point set, H(C) is the class information entropy, and H(C|a) is the class information entropy after cut point a divides the class set C into two subsets;
Let a_min be the selected cut point, dividing the class set C into the two subsets C1 and C2. Whether a_min satisfies the MDLP criterion is judged by:
Gain(a_min) > log2(N - 1)/N + [log2(3^k' - 2) - (k'·H(C) - k'1·H(C1) - k'2·H(C2))]/N
where k'1 and k'2 are the numbers of classes contained in subsets C1 and C2, respectively;
Step 2.4: check whether the two intervals into which the cut point of step 2.3 divides the original data still contain other cut points. If so, the cut points within each interval form a new cut point set and step 2.3 is repeated, judging from the samples and the class set of each interval whether its cut point is retained; otherwise, go to step 2.5;
Step 2.5: divide the continuous attribute data into intervals according to the finally selected cut points. If no cut point satisfies the MDLP criterion, all values of the attribute are placed in a single interval; otherwise the cut points divide the values into different intervals. This yields the discretization result of the continuous attribute;
Step 2.6: check whether every continuous attribute in the sample data set has been discretized. If so, go to step 3; otherwise repeat steps 2.1-2.5 until all continuous attributes of the sample data set have been discretized.
In this embodiment, the information gains of the cut points of continuous attributes V1, V3 and V7 do not satisfy the MDLP criterion, so by the MDLP discretization principle the corresponding continuous attribute data are discretized into a single interval, whose output in this embodiment is 1. The final cut point of continuous attribute V2 is the attribute value 45, so attribute values greater than or equal to 45 are grouped into one interval with output 2, and values less than 45 into another interval with output 1. The discretized sample data used as the training set in this embodiment are shown in Table 3.
Table 3: Sample data after discretization
Step 3: collect the rock burst attribute data of the region to be predicted and compare its continuous attribute data with the corresponding attributes from step 2. Determine from this comparison the interval in which each continuous attribute value of the region lies, thereby discretizing the continuous attribute data of the region to be predicted.
In this embodiment, to verify the effectiveness of the method of the present invention, the attribute data in Table 4 are taken as the collected rock burst attribute data of the region to be predicted; the class labels in Table 4 serve only for comparison with the prediction results. The continuous attribute data of the 10 groups were compared with the corresponding attributes of the 25 groups of data in Table 2 to obtain their discretization results, shown in Table 5.
Table 4: Data to be predicted
Serial number V1/m V2/(°) V3/m V4 V5 V6 V7/(m3·min-1) V8 V9 V10 Classification
1 1.5 35 530 0 0 0 0.56 3 3 0 1
2 1.6 62 307 3 2 2 1 0 0 2 4
3 1.9 59 542 1 2 3 0.25 0 0 1 3
4 1.3 44 570 0 0 0 0.66 3 3 0 1
5 2.2 54 290 3 2 2 1 0 0 2 4
6 3 34 475 2 2 1 0.42 0 0 2 3
7 3.2 42 574 3 0 0 0.29 0 0 2 3
8 1.8 62 283 3 2 3 1 0 0 2 4
9 1.3 44 656 2 1 3 0.24 1 1 2 3
10 1.2 40 553 2 2 2 0.49 1 2 2 3
Table 5: Data to be predicted after discretization
Step 4: in the discretized data set generated in step 2, use the k-nearest-neighbor algorithm to find the k samples nearest to the sample to be predicted; these k samples form the training set of the C4.5 decision tree. Then compute the weight of each sample in the training set;
The weight of each sample in the training set is computed by the following formula:
where ω_i is the weight of the i-th training sample nearest to the sample to be predicted, i = 1, 2, ..., k; d_i is the distance from the sample to be predicted to the i-th sample x_i, computed from the attribute data of the samples by a distance formula; and d_max is the maximum of the distances from the sample to be predicted to all samples in the training set.
Step 5: compute the information gain ratio of every attribute in the training set from the weights of the training samples. When generating the root node and each branch node, select in every iteration the attribute with the largest information gain ratio as the split attribute of the root node or branch node of the C4.5 decision tree;
The information gain ratio of an attribute in the training set is computed as follows:
Let V be an attribute of the training set and v_j the j-th value of V, j = 1, 2, ..., m, where m is the number of mutually distinct values of V among the training samples. Let C' = {c_1, c_2, ..., c_n} be the class set of the training samples, where c_{i'} is the i'-th class, i' = 1, 2, ..., n, and n is the total number of classes of the training samples. The information gain ratio of V is then computed as follows:
Compute the class information entropy of the training samples:
I(C') = -Σ_{i'=1..n} p(c_{i'}) log2 p(c_{i'})
where ω_{c_{i'}} is the sum of the weights of the training samples of class c_{i'}, ω_{C'} is the sum of the weights of the training samples of all classes, and p(c_{i'}) is the ratio of ω_{c_{i'}} to ω_{C'};
Compute the class-conditional entropy of the training samples:
I(C'|V) = -Σ_{j=1..m} p(v_j) Σ_{i'=1..n} p(c_{i'}|v_j) log2 p(c_{i'}|v_j)
where ω_{v_j} is the sum of the weights of the samples with attribute value v_j, ω_V is the sum of the weights of all samples for attribute V, ω_{c_{i'} v_j} is the sum of the weights of the samples with value v_j that belong to class c_{i'}, p(v_j) is the ratio of ω_{v_j} to ω_V, and p(c_{i'}|v_j) is the ratio of ω_{c_{i'} v_j} to ω_{v_j};
Compute the information gain of attribute V of the training samples:
I(C', V) = I(C') - I(C'|V)
Compute the information entropy (split information) of attribute V of the training samples:
I(V) = -Σ_{j=1..m} p(v_j) log2 p(v_j)
Compute the information gain ratio of attribute V of the training samples:
Gain_ratio(V) = I(C', V)/I(V);
Step 6: build the decision tree from the split attributes, then prune it with the pessimistic pruning method, using sums of sample weights in place of sample counts when computing the error rates of branch nodes and their corresponding leaf nodes during pruning. Finally, use the generated decision tree to predict the potential rock burst danger grade of the region to be predicted.
In this embodiment, ten-fold cross-validation was first used to test the predictive performance of the decision tree model built from the discretized sample data. Because the training set contains few samples, all training samples were used as neighbor candidates during cross-validation; the significance level during C4.5 pruning was set to the common 25%, and the sample distances in the locally weighted learning were computed with the Euclidean distance function. The model built from the discretized sample set achieved 88% accuracy under ten-fold cross-validation, versus 84% for the model built from the raw data of Table 2, showing that the discretized sample data yield a better prediction model.
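The ten-fold cross-validation used above to compare the raw and discretized models can be sketched generically as follows; `train_fn` and `predict_fn` stand in for whatever model-building and prediction routines are plugged in, and the fold-assignment scheme is an illustrative choice.

```python
import random

def ten_fold_accuracy(samples, train_fn, predict_fn, seed=0):
    """Plain 10-fold cross-validation accuracy over (x, y) samples."""
    data = samples[:]
    random.Random(seed).shuffle(data)            # fixed seed: reproducible folds
    folds = [data[i::10] for i in range(10)]
    correct = total = 0
    for i, test in enumerate(folds):
        train = [s for j, f in enumerate(folds) if j != i for s in f]
        model = train_fn(train)                  # fit on the other nine folds
        for x, y in test:
            correct += predict_fn(model, x) == y
            total += 1
    return correct / total
```

Each sample is held out exactly once, so the reported accuracy averages ten disjoint test folds.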
The locally weighted C4.5 algorithm was then used to predict the rock burst danger grades of the discretized region-to-be-predicted data in Table 4. For comparison, this embodiment also built prediction models from the data in Table 2 using the NaiveBayes method, the original C4.5 method and the random forest method, and applied them to the data in Table 4; Table 6 compares their prediction results with those of the method of the present invention:
Table 6: Comparison of rock burst danger grade prediction results
Algorithm Accuracy
NaiveBayes 70%
C4.5 decision trees 80%
Random forest 80%
The method of the present invention 100%
As the table shows, the method of the present invention accurately predicts the rock burst danger grades of the region to be predicted, and its prediction results are better than those of the NaiveBayes method, the original C4.5 method and the random forest method.
Finally, it should be noted that the above embodiments merely illustrate the technical solution of the present invention and do not limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some or all of their technical features replaced by equivalents, without such modifications or replacements departing from the scope of the technical solutions defined by the claims of the present invention.

Claims (4)

1. A rock burst danger grade prediction method based on a locally weighted C4.5 algorithm, characterized by including the following steps:
Step 1: collect rock burst data of known danger grades as sample data. Let T be the collected sample data set, C the set of sample classes, k' the total number of sample classes, and N the number of samples;
Step 2: discretize the continuous attribute data in the sample data of known classes using the Minimum Description Length Principle (MDLP);
Step 3: collect the rock burst attribute data of the region to be predicted and compare its continuous attribute data with the corresponding attributes from step 2. Determine from this comparison the interval in which each continuous attribute value of the region lies, thereby discretizing the continuous attribute data of the region to be predicted;
Step 4: in the discretized data set generated in step 2, use the k-nearest-neighbor algorithm to find the k samples nearest to the sample to be predicted; these k samples form the training set of the C4.5 decision tree, and the weight of each sample in the training set is computed;
Step 5: compute the information gain ratio of every attribute in the training set from the weights of the training samples, and, when generating the root node and each branch node, select in every iteration the attribute with the largest information gain ratio as the split attribute of the root node or branch node of the C4.5 decision tree;
Step 6: build the decision tree from the split attributes, then prune it with the pessimistic pruning method, using sums of sample weights in place of sample counts when computing the error rates of branch nodes and their corresponding leaf nodes during pruning; finally, use the generated decision tree to predict the potential rock burst danger grade of the region to be predicted.
2. the bump danger classes prediction technique according to claim 1 based on local weighted C4.5 algorithms, special Sign is:Described in step 2 carry out discretization specific method be:
Step 2.1:By wait for discretization one group of continuous property and its respective classes according to continuous property from small to large suitable Sequence is ranked up;
Step 2.2: Select as boundary points the values of the sorted continuous attribute at which the corresponding class changes; these boundary points constitute the boundary point set. If different classes correspond to the same attribute value, select the attribute value corresponding to the smallest class as the boundary point;
Step 2.3: Calculate the information gain of every boundary point in the boundary point set and select the boundary point with the minimum information gain; judge whether this boundary point satisfies the minimum description length criterion: if it does, retain the boundary point; otherwise, discard it;
The information gain of a boundary point is calculated as follows:
Gain(a) = H(C) - H(C|a)
where a is a boundary point in the boundary point set, H(C) is the class information entropy, and H(C|a) is the information entropy after boundary point a divides the class set C into two subsets;
Let amin be the boundary point with the minimum information gain, dividing the class set C into the two subsets C1 and C2. Whether amin satisfies the minimum description length criterion is judged by the following formula:
Gain(amin) > log2(N-1)/N + {log2(3^k′ - 2) - [k′·H(C) - k′1·H(C1) - k′2·H(C2)]}/N
where N is the number of samples being divided, k′ is the number of classes contained in C, and k′1 and k′2 are the numbers of classes contained in the subsets C1 and C2, respectively;
Step 2.4: Judge whether the two interval sequences into which the boundary point of step 2.3 divides the original data set contain further boundary points. If so, the boundary points within each interval sequence form a new boundary point set and the procedure returns to step 2.3, where whether each interval sequence retains its boundary point is judged from the number of samples and the class set of that interval sequence; otherwise, execute step 2.5;
Step 2.5: Divide the continuous attribute data into interval sequences according to the finally selected boundary point set. If no boundary point satisfies the minimum description length criterion, all continuous data of the attribute are placed in a single interval sequence; otherwise, the continuous attribute data are divided into different interval sequences by the boundary points, yielding the discretization result for the continuous attribute data;
Step 2.6: Judge whether every continuous attribute in the sample data set has been discretized. If so, execute step 3; otherwise, repeat steps 2.1 to 2.5 until all continuous attributes of the sample data set have been discretized.
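Steps 2.2 to 2.5 describe an entropy-based discretization with a minimum-description-length stopping rule, closely matching the classic Fayyad-Irani MDLP procedure. A minimal sketch is given below; it follows the standard MDLP convention of cutting at the boundary point with the smallest class-conditional entropy (equivalently, the largest information gain), and all function and variable names are illustrative rather than taken from the patent.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(C) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def mdl_accept(labels, i):
    """Minimum-description-length stopping criterion for a cut at index i."""
    n = len(labels)
    left, right = labels[:i], labels[i:]
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    gain = (entropy(labels)
            - (i / n) * entropy(left)
            - ((n - i) / n) * entropy(right))
    delta = math.log2(3 ** k - 2) - (k * entropy(labels)
                                     - k1 * entropy(left)
                                     - k2 * entropy(right))
    return gain > (math.log2(n - 1) + delta) / n

def mdlp_cuts(values, labels):
    """Recursively find accepted cut points for one continuous attribute."""
    pairs = sorted(zip(values, labels))
    vals = [v for v, _ in pairs]
    labs = [c for _, c in pairs]
    n = len(labs)
    best = None  # (conditional entropy, cut index)
    for i in range(1, n):
        # candidate boundary points lie only where the class changes (step 2.2)
        if labs[i] == labs[i - 1] or vals[i] == vals[i - 1]:
            continue
        cond = (i / n) * entropy(labs[:i]) + ((n - i) / n) * entropy(labs[i:])
        if best is None or cond < best[0]:
            best = (cond, i)
    if best is None or not mdl_accept(labs, best[1]):
        return []  # step 2.5: no cut passes, the attribute stays in one interval
    i = best[1]
    cut = (vals[i - 1] + vals[i]) / 2
    # step 2.4: look for further boundary points inside each interval sequence
    return sorted(mdlp_cuts(vals[:i], labs[:i]) + [cut] + mdlp_cuts(vals[i:], labs[i:]))
```

For example, mdlp_cuts([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1]) accepts a single cut at 6.5, between the two pure class runs, while interleaved classes such as [0, 1, 0, 1] yield no cut because the MDL criterion rejects it.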
3. The rock burst danger level prediction method based on the local weighted C4.5 algorithm according to claim 1, characterized in that the specific method for determining the weights of the training set samples described in step 4 is:
The weight of each sample in the training set is calculated according to the following formula:
where ωi is the weight of the i-th training set sample adjacent to the sample to be predicted, i = 1, 2, ..., k; di is the distance from the sample to be predicted to the i-th sample data xi, calculated from the attribute data of the samples according to the distance formula; and dmax is the maximum distance from the sample to be predicted to any sample in the training set.
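The weight formula itself appears as an image in the original publication and does not survive in this text, so it is not reproduced above. The sketch below therefore assumes the common linear-decay form ωi = 1 - di/dmax, which is consistent with the quantities the claim defines (the per-sample distance di and the maximum distance dmax over the whole training set); the function and names are illustrative, not the patent's.

```python
import math

def neighbor_weights(query, samples, k):
    """Weights for the k training samples nearest to the sample to be predicted.

    Assumes the linear-decay form w_i = 1 - d_i / d_max (the patent's exact
    formula is not reproduced in this text); d_max is taken over the whole
    training set, as the claim specifies.
    """
    dists = [math.dist(query, x) for x in samples]   # d_i for every training sample
    d_max = max(dists) or 1.0                        # guard against all-zero distances
    nearest = sorted(range(len(samples)), key=dists.__getitem__)[:k]
    return {i: 1.0 - dists[i] / d_max for i in nearest}
```

For instance, neighbor_weights((0.0, 0.0), [(0, 0), (3, 4), (6, 8)], 2) gives the nearest sample weight 1.0 and the next 0.5, so closer neighbors dominate the locally weighted gain-ratio computation.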
4. The rock burst danger level prediction method based on the local weighted C4.5 algorithm according to claim 3, characterized in that the specific method for calculating the information gain ratio of an attribute in the training set described in step 5 is:
Let V be an attribute in the training set and vj the j-th value of attribute V, j = 1, 2, ..., m, where m is the number of distinct values taken by attribute V over the sample data of the training set; let C′ = {c1, c2, ..., cn} be the set of classes corresponding to the sample data in the training set, where ci′ is the i′-th class, i′ = 1, 2, ..., n, and n is the total number of classes of the training set samples. The information gain ratio of an attribute in the training set is calculated as follows:
Calculate the class information entropy of the sample data in the training set, as shown below:
I(C′) = -Σ p(ci′)·log2 p(ci′), the sum running over i′ = 1, 2, ..., n
where ωci′ is the sum of the weights of the training set samples of class ci′, ωC′ is the sum of the weights of the training set samples of all classes, and p(ci′) = ωci′/ωC′ is the ratio of the class weight sum ωci′ to the total weight sum ωC′;
Calculate the class conditional entropy of the sample data in the training set, as shown below:
I(C′|V) = -Σ p(vj)·Σ p(ci′|vj)·log2 p(ci′|vj), the outer sum running over j = 1, 2, ..., m and the inner sum over i′ = 1, 2, ..., n
where ωvj is the sum of the weights of the samples whose attribute value is vj, ωV is the sum of the weights of all samples over attribute V, ωci′vj denotes the sum of the weights of the samples of class ci′ among those whose attribute value is vj, p(vj) = ωvj/ωV is the ratio of the weight sum of the samples with attribute value vj to the weight sum of all samples, and p(ci′|vj) = ωci′vj/ωvj is the ratio of the weight sum of the samples of class ci′ among those with attribute value vj to the weight sum of all samples with attribute value vj;
Calculate the information gain of attribute V of the sample data in the training set, as shown below:
I(C′, V) = I(C′) - I(C′|V)
Calculate the information entropy of attribute V of the sample data in the training set, as shown below:
I(V) = -Σ p(vj)·log2 p(vj), the sum running over j = 1, 2, ..., m
Calculate the information gain ratio of attribute V of the sample data in the training set, as shown below:
Gain_ratio(V) = I(C′, V)/I(V).
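The weighted gain ratio of claim 4 is the ordinary C4.5 gain ratio with sample counts replaced by sums of sample weights. A minimal sketch under that reading (function and variable names are illustrative):

```python
import math
from collections import defaultdict

def weighted_entropy(weight_by_key):
    """Entropy -sum p*log2(p), with p taken from weight sums rather than counts."""
    total = sum(weight_by_key.values())
    return -sum((w / total) * math.log2(w / total)
                for w in weight_by_key.values() if w > 0)

def weighted_gain_ratio(attr_values, labels, weights):
    """Gain_ratio(V) = [I(C') - I(C'|V)] / I(V) with weighted probabilities."""
    w_class = defaultdict(float)                             # omega for each class
    w_value = defaultdict(float)                             # omega for each attribute value
    w_value_class = defaultdict(lambda: defaultdict(float))  # omega per (value, class)
    for v, c, w in zip(attr_values, labels, weights):
        w_class[c] += w
        w_value[v] += w
        w_value_class[v][c] += w
    total = sum(weights)
    base = weighted_entropy(w_class)                         # I(C')
    cond = sum((w_value[v] / total) * weighted_entropy(w_value_class[v])
               for v in w_value)                             # I(C'|V)
    split = weighted_entropy(w_value)                        # I(V)
    return (base - cond) / split if split > 0 else 0.0
```

With equal weights this reduces to the standard C4.5 gain ratio; an attribute that perfectly separates two equally weighted classes scores 1.0, and an attribute with a single value scores 0.0 because its split information I(V) vanishes.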
CN201810058598.8A 2018-01-22 2018-01-22 Rock burst danger level prediction method based on local weighted C4.5 algorithm Active CN108280289B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810058598.8A CN108280289B (en) 2018-01-22 2018-01-22 Rock burst danger level prediction method based on local weighted C4.5 algorithm


Publications (2)

Publication Number Publication Date
CN108280289A true CN108280289A (en) 2018-07-13
CN108280289B CN108280289B (en) 2021-10-08

Family

ID=62804465


Country Status (1)

Country Link
CN (1) CN108280289B (en)


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1655236A (en) * 2000-04-24 2005-08-17 Qualcomm Inc. Method and apparatus for predictively quantizing voiced speech
CN102473247A (en) * 2009-06-30 2012-05-23 Dow AgroSciences LLC Application of machine learning methods for mining association rules in plant and animal data sets containing molecular genetic markers, followed by classification or prediction utilizing features created from these association rules
WO2013075104A1 (en) * 2011-11-18 2013-05-23 Rutgers, The State University Of New Jersey Method and apparatus for detecting granular slip
CN105373606A (en) * 2015-11-11 2016-03-02 Chongqing University of Posts and Telecommunications Unbalanced data sampling method in improved C4.5 decision tree algorithm
CN106096748A (en) * 2016-04-28 2016-11-09 Wuhan Baosteel Huazhong Trading Co., Ltd. Truck-loading man-hour prediction model based on cluster analysis and decision tree algorithms
CN106250986A (en) * 2015-06-04 2016-12-21 The Boeing Company Advanced analysis framework for machine learning
CN107145998A (en) * 2017-03-31 2017-09-08 China Agricultural University Land pressure calculation method and system based on the Dyna-CLUE model


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
HSSINA, B. et al.: "A comparative study of decision tree ID3 and C4.5", International Journal of Advanced Computer Science and Applications *
LIU, YANG et al.: "Prediction of rock burst danger level based on decision tree", Inner Mongolia Coal Economy *
LI, LIN: "Application of decision-tree-based data mining methods in chemical pattern classification", China Master's Theses Full-text Database, Engineering Science and Technology I *
WANG, YANBIN et al.: "Prediction of rock burst danger level using locally weighted random forest", Journal of Liaoning Technical University (Natural Science Edition) *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110175194A (en) * 2019-04-19 2019-08-27 China University of Mining and Technology Coal mine roadway surrounding rock deformation and fracture identification method based on association rule mining
CN110175194B (en) * 2019-04-19 2021-02-02 China University of Mining and Technology Coal mine roadway surrounding rock deformation and fracture identification method based on association rule mining
CN111764963A (en) * 2020-07-06 2020-10-13 China University of Mining and Technology (Beijing) Rock burst prediction method based on fast-RCNN
CN113901939A (en) * 2021-10-21 2022-01-07 Heilongjiang University of Science and Technology Rock burst danger level prediction method based on fuzzy correction, storage medium and equipment
CN114780443A (en) * 2022-06-23 2022-07-22 State Grid Digital Technology Holding Co., Ltd. Microservice application automatic test method and device, electronic equipment and storage medium
CN117557087A (en) * 2023-09-01 2024-02-13 Guangzhou River Channel Monitoring Center Drainage unit risk prediction model training method and system based on water affairs data

Also Published As

Publication number Publication date
CN108280289B (en) 2021-10-08

Similar Documents

Publication Publication Date Title
CN108280289A (en) Rock burst danger level prediction method based on local weighted C4.5 algorithm
CN104732070B (en) Rock burst grade prediction method based on information vector machine
CN108226889A (en) Classifier model training method for radar target recognition
CN103617147A (en) Method for identifying mine water-inrush source
CN106980877A (en) Blasting vibration prediction method based on a support vector machine optimized by particle swarm optimization
CN107122860B (en) Rock burst danger level prediction method based on grid search and extreme learning machine
Lin et al. Machine learning templates for QCD factorization in the search for physics beyond the standard model
CN106897821A (en) Transient assessment feature selection method and device
CN106529667A (en) Logging facies identification and analysis method based on fuzzy deep learning in a big data environment
CN107194524A (en) Coal and gas outburst prediction method based on RBF neural network
Saeidi et al. Prediction of the rock mass diggability index by using fuzzy clustering-based, ANN and multiple regression methods
CN105893876A (en) Chip hardware Trojan detection method and system
CN108267724A (en) Unknown target recognition method for radar target recognition
CN115130375A (en) Rock burst intensity prediction method
Tso et al. An ANN-based multilevel classification approach using decomposed input space for transient stability assessment
CN108268460A (en) Method for automatically selecting an optimal model based on big data
CN102200981A (en) Feature selection method and feature selection device for hierarchical text classification
Nikafshan Rad et al. Modification of rock mass rating system using soft computing techniques
CN117272841B (en) Shale gas sweet spot prediction method based on a hybrid neural network
Berneti Design of fuzzy subtractive clustering model using particle swarm optimization for the permeability prediction of the reservoir
CN117540303A (en) Landslide susceptibility assessment method and system based on a cross semi-supervised machine learning algorithm
CN105550711A (en) Selective ensemble learning method based on the firefly algorithm
CN111985782A (en) Automatic tram driving risk assessment method based on environment perception
CN105512726A (en) Reliability distribution method and apparatus based on immune genetic optimization
CN114462323A (en) Oil reservoir flow field characterization method based on multi-attribute field fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant