CN109685321A - Data-mining-based event risk early-warning method, electronic device and medium - Google Patents
- Publication number
- CN109685321A (application CN201811431329.8A)
- Authority
- CN
- China
- Prior art keywords
- data
- event
- feature
- index
- assailant
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0635—Risk analysis of enterprise or organisation activities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2135—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
- G06Q50/265—Personal security, identity or safety
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Human Resources & Organizations (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Tourism & Hospitality (AREA)
- Economics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Biology (AREA)
- Artificial Intelligence (AREA)
- Strategic Management (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Business, Economics & Management (AREA)
- Educational Administration (AREA)
- Entrepreneurship & Innovation (AREA)
- Development Economics (AREA)
- Marketing (AREA)
- Computer Security & Cryptography (AREA)
- Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- General Health & Medical Sciences (AREA)
- Game Theory and Decision Science (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a data-mining-based event risk early-warning method, electronic device and medium. Record data of historical terrorist attacks and of the event under test are obtained and preprocessed. The preprocessed data are classified with a clustering algorithm into several classes; several features are extracted from each class with principal component analysis (PCA); the features extracted from all classes are integrated into a feature set; and several index features are extracted from that feature set with PCA. The weight of each index feature is calculated with an improved entropy weight method. For each event, the values of its index features are weighted by the corresponding weights, the results are sorted in descending order, and the rank of the event under test among all events is output; if the rank is smaller than a set threshold, an early-warning signal is issued.
Description
Technical field
This disclosure relates to the field of data mining, and in particular to a data-mining-based event risk early-warning method, electronic device and medium.
Background
The statements in this section merely provide background information related to the disclosure and do not necessarily constitute prior art.
A terrorist attack is an attack, deliberately carried out by extremists or extremist organizations, that is directed at (but not limited to) civilians and civilian facilities and that violates international morality and justice. Such attacks are highly lethal and destructive: they directly cause heavy casualties and property losses, create enormous psychological pressure, cause a degree of social turmoil, disrupt normal work and daily life, and thereby seriously hinder economic development.
Common grading methods are generally subjective: an authoritative organization or department selects several main indicators and imposes a grading standard. However, the harm of a terrorist attack depends not only on casualties and economic loss but also on factors such as when and where it occurs and what it targets, so such grading methods struggle to produce a unified standard. As terrorist attacks continue to occur, data mining can be applied to their characteristics to grade terrorist events quantitatively and objectively. This is an important task that gives relevant departments an objective basis for taking targeted measures.
In summary, an accurate and fast risk early-warning method for terrorist attacks is still lacking, and no effective solution yet exists.
Summary of the invention
To overcome the deficiencies of the prior art, the disclosure provides a data-mining-based event risk early-warning method, electronic device and medium that give accurate and fast risk early warnings for terrorist attacks based on an improved entropy weight model.
In a first aspect, the disclosure provides a data-mining-based event risk early-warning method.
The data-mining-based event risk early-warning method comprises:
Data acquisition step: obtain the record data of historical terrorist attacks and of the event under test, each event being assigned a unique number. The record data include: region, attack type, amount of property loss, total injured, total deaths, number of attackers, number of attackers arrested, number of attackers killed, event summary, hostage-taking outcome and event resolution date;
Data preprocessing step: preprocess the record data of the historical terrorist attacks and of the event under test;
Data classification step: classify the preprocessed data with a clustering algorithm into several classes of data;
Feature extraction step: extract several features from each class of data with a principal component analysis algorithm;
Feature integration step: integrate all features extracted from all classes of data to obtain a feature set;
Secondary feature extraction step: extract several index features from the feature set with a principal component analysis algorithm;
Feature weight step: calculate the weight of each index feature with an improved entropy weight method;
Risk early-warning step: for each event, compute a weighted sum of its index-feature values using the corresponding weights, sort the results in descending order, and output the rank of the event under test among all events; if the rank is smaller than a set threshold, issue an early-warning signal.
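The risk early-warning step can be sketched as follows. This is a minimal illustration, assuming the index-feature values and weights have already been computed; the function names `risk_rank` and `early_warning`, the dictionary input format and the 1-based rank convention are assumptions, not taken from the patent.

```python
from typing import Dict, List, Sequence, Tuple


def risk_rank(features: Dict[str, Sequence[float]],
              weights: Sequence[float]) -> List[str]:
    """Return event IDs sorted by weighted score, highest first."""
    scores = {
        event_id: sum(w * v for w, v in zip(weights, values))
        for event_id, values in features.items()
    }
    # Descending order of score; ties keep their insertion order.
    return sorted(scores, key=scores.get, reverse=True)


def early_warning(features: Dict[str, Sequence[float]],
                  weights: Sequence[float],
                  test_id: str,
                  threshold: int) -> Tuple[int, bool]:
    """Return (1-based rank of the event under test, whether to warn)."""
    ranking = risk_rank(features, weights)
    rank = ranking.index(test_id) + 1
    # A small rank means a high weighted score, i.e. a high-risk event.
    return rank, rank < threshold
```

For example, with three events and two index features, an event ranked second triggers a warning when the threshold is 3 but not when it is 2.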
In some possible implementations, the clustering algorithm is a hierarchical (system) clustering algorithm.
In some possible implementations, the weight $W_j$ of each index feature is calculated with the improved entropy weight method as follows.
Suppose $k$ index features $X_1, X_2, \ldots, X_k$ are given, where $X_j = \{x_{1j}, x_{2j}, \ldots, x_{nj}\}$ and $x_{ij}$ is the sampled data value of index feature $j$ for sample $i$.
First, each sampled value $x_{ij}$ is standardized to $Y_{ij}$:

$$Y_{ij} = \frac{x_{ij} - \min(X_j)}{\max(X_j) - \min(X_j)}$$

where $\min(X_j)$ and $\max(X_j)$ are the minimum and maximum of the sampled data values of $X_j$.
Second, the information entropy $E_j$ ($j = 1, 2, \ldots, k$) of each index feature is computed; there are $k$ index features, each with $n$ sampled data values:

$$E_j = -\frac{1}{\ln n}\sum_{i=1}^{n} p_{ij}\ln p_{ij}$$

where $p_{ij} = Y_{ij} / \sum_{i=1}^{n} Y_{ij}$; if $p_{ij} = 0$, then $p_{ij}\ln p_{ij}$ is defined as 0.
Finally, with the entropies $E_1, E_2, \ldots, E_k$ of the $k$ indices computed from this formula, the weight $W_j$ of each index is determined from them.
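The normalization and entropy computation above can be sketched as follows. The patent's "improved" weight formula is not given in this text, so this sketch substitutes the standard entropy-weight formula $W_j = (1 - E_j) / (k - \sum_j E_j)$. Treating a constant column as having maximal entropy ($E_j = 1$, hence weight 0) is also an assumption. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def min_max(column: Sequence[float]) -> List[float]:
    """Min-max standardization of one index feature to [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:  # a constant column carries no information
        return [0.0] * len(column)
    return [(x - lo) / (hi - lo) for x in column]


def entropy(column: Sequence[float]) -> float:
    """Normalized information entropy E_j of one standardized column."""
    n, total = len(column), sum(column)
    if total == 0:
        return 1.0  # assumed convention: no variation -> maximal entropy
    e = 0.0
    for y in column:
        p = y / total
        if p > 0:  # p * ln(p) is taken as 0 when p == 0
            e -= p * math.log(p)
    return e / math.log(n)


def entropy_weights(data: Sequence[Sequence[float]]) -> List[float]:
    """data[i][j]: value of sample i on index feature j (needs n >= 2 samples).

    Uses the standard (not the patent's improved) entropy-weight formula.
    """
    k = len(data[0])
    entropies = [entropy(min_max([row[j] for row in data])) for j in range(k)]
    denom = k - sum(entropies)  # zero only if every column is constant
    return [(1 - e) / denom for e in entropies]
```

The weights sum to 1 by construction, and a column that never varies receives weight 0. This matches the intuition in the text that low-entropy (high-variation) indices carry more weight.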
In some possible implementations, the data preprocessing step comprises a data screening sub-step, a data filling sub-step, a data conversion sub-step and a data normalization sub-step;
In the data screening sub-step, the event summary, hostage-taking outcome and event resolution date are discarded;
In the data filling sub-step, missing values in the records of the number of attackers, total deaths, number of attackers arrested, total injured, number of attackers killed and amount of property loss are filled, with unknown data filled with zeros;
In the data conversion sub-step, the region and attack type of each terrorist attack are converted from text data into numerical data.
The region text data are converted into numerical data as follows: for each region, sum the total deaths and the number of attackers over its events; sort the sums in descending order; then assign numeric marks to the regions in that order, with the marks decreasing successively.
The attack-type text data are converted into numerical data as follows: for each attack type, sum the total deaths and the number of attackers over its events; sort the sums in descending order; then assign numeric marks to the attack types in that order, with the marks decreasing successively.
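The ranking-based text-to-number conversion can be sketched as follows. The text does not specify the starting mark or the step, so this sketch assumes the marks run from the number of categories down to 1. Ties keep their first-seen order. `rank_encode` and the `(category, deaths, attackers)` tuple format are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def rank_encode(records: Iterable[Tuple[str, int, int]]) -> Dict[str, int]:
    """Map each category (region or attack type) to a numeric mark.

    records: one (category, deaths, attackers) tuple per event.
    The category with the largest deaths + attackers total gets the
    highest mark; marks then decrease by 1 (an assumed step size).
    """
    totals: Dict[str, int] = defaultdict(int)
    for category, deaths, attackers in records:
        totals[category] += deaths + attackers
    ordered = sorted(totals, key=lambda c: totals[c], reverse=True)
    m = len(ordered)
    return {category: m - position for position, category in enumerate(ordered)}
```

For instance, region totals of C = 14, A = 10 and B = 2 would be encoded as C → 3, A → 2, B → 1.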
In the data normalization sub-step, the data obtained through data screening, data filling and data conversion are normalized with the min-max normalization algorithm. From the normalized data, an N×1 matrix is built for each event, where N is the number of data items and each element is the normalized value of the corresponding record field.
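The normalization sub-step and the per-event N×1 matrix can be sketched as follows. This assumes each field is min-max normalized across all events and that a constant field maps to 0. `normalize_events` and the dictionary layout are hypothetical.

```python
from typing import Dict, List, Sequence


def normalize_events(events: Dict[str, Sequence[float]]) -> Dict[str, List[List[float]]]:
    """Min-max normalize each record field across events, then reshape each
    event into an N x 1 column (a list of N one-element rows)."""
    ids = list(events)
    n_fields = len(events[ids[0]])
    columns = []
    for f in range(n_fields):
        col = [events[e][f] for e in ids]
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant field carries no information; map it to 0.0.
        columns.append([(x - lo) / span if span else 0.0 for x in col])
    return {e: [[columns[f][r]] for f in range(n_fields)] for r, e in enumerate(ids)}
```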
In a second aspect, the disclosure provides an electronic device.
An electronic device comprises a memory, a processor, and computer instructions stored in the memory and executable on the processor; when the computer instructions are executed by the processor, the steps of any of the above methods are completed.
In a third aspect, the disclosure provides a computer-readable storage medium.
A computer-readable storage medium stores computer instructions which, when executed by a processor, complete the steps of any of the above methods.
Compared with the prior art, the beneficial effects of the disclosure are:
Conventional data preprocessing converts text content into numerical format, improving data usability and model accuracy; clustering groups the features, reducing the difficulty and error of high-dimensional dimensionality reduction.
Feature fusion based on principal component analysis optimizes the features; compared with other existing methods, this implementation is simpler and more effective;
Compared with the traditional entropy weight method, using the improved entropy weight method to derive the weight of each index for score statistics improves the precision of the weights.
Description of the drawings
The accompanying drawings, which form part of this application, provide further understanding of the application; the illustrative embodiments of the application and their descriptions explain the application and do not unduly limit it.
Fig. 1 is an information-flow diagram of one or more embodiments;
Fig. 2 shows the KMO and Bartlett test results for the terrorist-attack data of one or more embodiments;
Fig. 3 shows the communalities of one or more embodiments;
Fig. 4 shows the total variance explained for one or more embodiments;
Fig. 5 shows the rotated component matrix of one or more embodiments;
Fig. 6 shows the distribution of entropy-weight scores of one or more embodiments.
Detailed description of embodiments
It should be noted that the following detailed description is illustrative and intended to explain the application further. Unless otherwise indicated, all technical and scientific terms used herein have the meanings commonly understood by a person of ordinary skill in the art to which the application belongs.
It should also be noted that the terminology used herein only describes specific embodiments and is not intended to limit the illustrative embodiments of the application. As used herein, singular forms are intended to include plural forms unless the context clearly indicates otherwise; furthermore, the terms "comprise" and/or "include", when used in this specification, indicate the presence of features, steps, operations, devices, components and/or combinations thereof.
Embodiment 1:
As shown in Fig. 1, the data-mining-based event risk early-warning method comprises:
Data acquisition step: obtain the record data of historical terrorist attacks and of the event under test, each event being assigned a unique number. The record data include: region, attack type, amount of property loss, total injured, total deaths, number of attackers, number of attackers arrested, number of attackers killed, event summary, hostage-taking outcome and event resolution date;
Data preprocessing step: preprocess the record data of the historical terrorist attacks and of the event under test. Preprocessing comprises data screening, data conversion and data filling;
Data screening step: data unrelated to the invention, such as the event summary, hostage-taking outcome and event resolution date, are discarded;
Data conversion step: features such as the region and attack type of each terrorist attack are converted from text into numbers on a ten-point scale. First, the importance of each region (or attack type) is determined from the total deaths associated with it.
Suppose region R has num events, the death toll of the $i$-th event is $\mathrm{nkill}_i$ ($i = 1, 2, \ldots, \mathrm{num}$), and N is the total death toll of all events in the sample; the final score S of the region is then given by formula 5.
Data filling step: missing values in the records of the number of attackers, total deaths, number of attackers arrested, total injured, number of attackers killed and amount of property loss are handled according to the missing-value ratio. Features with a missing rate above 95% are discarded directly; for features with a missing rate below 95%, unknown data are filled with zeros;
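The missing-value rule can be sketched as follows, assuming missing entries are represented as `None`. The text does not say which side of the threshold a feature with exactly 95% missing values falls on, so this sketch keeps it. `fill_missing` and `max_missing` are hypothetical names.

```python
from typing import Dict, List, Optional, Sequence


def fill_missing(columns: Dict[str, Sequence[Optional[float]]],
                 max_missing: float = 0.95) -> Dict[str, List[float]]:
    """Drop features whose missing ratio exceeds max_missing; zero-fill the rest."""
    cleaned: Dict[str, List[float]] = {}
    for name, values in columns.items():
        missing_ratio = sum(v is None for v in values) / len(values)
        if missing_ratio > max_missing:
            continue  # almost entirely unknown: reject the feature
        cleaned[name] = [0.0 if v is None else float(v) for v in values]
    return cleaned
```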
Data normalization step: the preprocessed data are normalized. The screened terrorist-attack record data are normalized using the maximum and minimum of the record data, so that the preprocessed data are confined to the range [0, 1], eliminating the adverse effect of anomalous sample data.
Data classification step: the normalized data are classified with a clustering algorithm into 4 classes, i.e. all features are divided into four groups. The first group contains total deaths and total injured. The second group contains the number of attackers, the number of attackers arrested and the number of attackers killed. The third group contains the amount of property loss. The fourth group contains region and attack type.
Feature extraction step: extract $N_i$ features from the $i$-th class of data with a principal component analysis algorithm;
Feature integration step: integrate all $N_i$ features extracted from all classes of data to obtain a feature set containing $N$ features;
Secondary feature extraction step: extract the index features $\mathrm{main}_i$ ($i = 1, 2, 3$) from the feature set with a principal component analysis algorithm;
Feature weight step: calculate the weight of each index feature with an improved entropy weight method;
Risk early-warning step: for each event, compute a weighted sum of its index-feature values using the corresponding weights, sort the results in descending order, and output the rank of the event under test among all events; if the rank is smaller than a set threshold, issue an early-warning signal.
This embodiment provides a data-mining-based event risk early-warning method whose steps are:
(1) Process the acquired sample data:
Step 1: missing-value cleaning. Calculate the missing-value ratio of each field to determine the extent of missing data, and apply different strategies according to the missing ratio and the importance of the field. Features of high importance and low missing rate are filled.
Step 2: data format conversion. Some features, such as region, are text-type but important for solving the problem, so the text is converted into numbers. Misaligned columns and extra columns present in part of the imported data are also corrected.
Step 3: removal of non-required data. The event summary, hostage-taking outcome and event resolution date are non-required data, so they are deleted directly.
Step 4: the cleaned data are normalized. The screened terrorist-attack record data are normalized using the maximum and minimum of the record data, so that the preprocessed data are confined to the range [0, 1], eliminating the adverse effect of anomalous sample data.
(2) Data classification:
Hierarchical clustering is used to divide the preprocessed data features into different classes for feature extraction. Specifically, the invention uses complete-linkage (farthest-neighbor) clustering, with the Pearson correlation as the measure. All features are divided into four groups. The first group contains total deaths and total injured. The second group contains the number of attackers, the number of attackers arrested and the number of attackers killed. The third group contains the amount of property loss. The fourth group contains region and attack type.
(3) First feature extraction:
Principal component analysis is applied to each group of data separately, and $N_i$ distinct features are obtained from each group.
(4) Feature integration:
The $N_i$ features extracted from each group are integrated to obtain a feature set.
(5) Secondary feature extraction:
The principal-component feature analysis comprises a partial-correlation test and factor analysis. The partial-correlation test checks the partial correlations among the terrorist-attack record data. Factor analysis then uses these correlations to decorrelate the terrorist-attack record data and obtain the principal-component features $\mathrm{main}_1$, $\mathrm{main}_2$ and $\mathrm{main}_3$.
The principal-component features are obtained by factor analysis. First, a partial-correlation test is carried out on the 4 tested features; specifically, the invention uses the KMO test and Bartlett's test of sphericity. The more strongly the original data are correlated, the more suitable they are for factor analysis. A KMO value close to 0 indicates weak correlation among the original variables, and a value close to 1 indicates strong correlation. For Bartlett's test of sphericity, the significance is what matters: a significance below 0.05 rejects the hypothesis that the correlation matrix is spherical (an identity matrix). This means the variables are correlated and have construct validity, which shows that the original data are suitable for factor analysis. The results are shown in Fig. 2: KMO = 0.793 > 0.5 and the Bartlett significance is 0 < 0.05, indicating significant correlation among the feature variables, so factor analysis is appropriate. The communalities (Fig. 3) reflect the degree of information extraction (extracted value / initial value) and the information loss (1 − extraction degree); comparing the initial and extracted values shows how much information is lost.
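The two adequacy statistics named above can be computed as follows. This minimal sketch uses the standard definitions. The KMO measure compares squared correlations $r_{ij}$ with squared partial correlations $a_{ij} = -P_{ij}/\sqrt{P_{ii}P_{jj}}$, where $P = R^{-1}$. The Bartlett statistic is $\chi^2 = -\left(n - 1 - \frac{2p+5}{6}\right)\ln|R|$ with $p(p-1)/2$ degrees of freedom. It returns only the statistic; a p-value could be obtained with, for example, `scipy.stats.chi2.sf(chi2, df)`. Function names are hypothetical, and a singular correlation matrix raises `ZeroDivisionError`.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def inverse_and_det(m: Matrix) -> Tuple[Matrix, float]:
    """Inverse and determinant via Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    det = 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda row: abs(a[row][c]))
        if piv != c:
            a[c], a[piv] = a[piv], a[c]
            det = -det  # a row swap flips the determinant's sign
        pivot = a[c][c]  # zero here means the matrix is singular
        det *= pivot
        a[c] = [v / pivot for v in a[c]]
        for row in range(n):
            if row != c:
                f = a[row][c]
                a[row] = [v - f * w for v, w in zip(a[row], a[c])]
    return [row[n:] for row in a], det


def kmo_bartlett(columns: Sequence[Sequence[float]]) -> Tuple[float, float, int]:
    """Return (KMO, Bartlett chi-square statistic, degrees of freedom).

    columns[j] holds the n observations of variable j.
    """
    n, p = len(columns[0]), len(columns)
    r = [[pearson(x, y) for y in columns] for x in columns]
    inv, det = inverse_and_det(r)
    r2 = a2 = 0.0
    for i in range(p):
        for j in range(p):
            if i != j:
                # Partial correlation of i and j given all other variables.
                partial = -inv[i][j] / math.sqrt(inv[i][i] * inv[j][j])
                r2 += r[i][j] ** 2
                a2 += partial ** 2
    kmo = r2 / (r2 + a2)
    chi2 = -(n - 1 - (2 * p + 5) / 6) * math.log(det)
    return kmo, chi2, p * (p - 1) // 2
```

With only two variables the partial correlation equals the simple correlation, so KMO is always 0.5. This makes a convenient sanity check.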
To determine the number of principal-component features more precisely, factor analysis is performed on the original 4 features $\mathrm{Main}_1$, $\mathrm{Main}_2$, $\mathrm{Main}_3$ and $\mathrm{Main}_4$. This gives the total variance explained (Fig. 4), which contains the initial eigenvalues and variance contribution ratios of the 4 features and the eigenvalues and variance contribution ratios of the 3 extracted principal components. Under the criterion that an eigenvalue must exceed 1, 3 principal components can be extracted. Together these 3 components explain 92.911% of the variance (> 85%), so the extracted principal factors are satisfactory and can be used to train the model. The rotated component matrix of the 4 features is then obtained (Fig. 5). It shows directly which original features are grouped into the same component and the loading of each original feature on each component.
Factor analysis is then performed on these 4 features using a dimensionality-reduction module. Because the goal requires the low-rank subspace to give the samples maximal separability, the invention reduces the dimension of the 4 index features to remove the multicollinearity among them.
The main implementation steps are: normalize all samples; compute the correlation matrix of the samples; perform eigenvalue decomposition of the correlation matrix; and take the eigenvectors $w_1, w_2, \ldots, w_{d'}$ corresponding to the $d'$ largest eigenvalues. The parameter $d'$ can be obtained by cross-validation. Alternatively, a threshold $\tau$ can be set and the smallest $d'$ chosen such that formula (6) holds:

$$\frac{\sum_{i=1}^{d'} \lambda_i}{\sum_{j=1}^{d} \lambda_j} \ge \tau \qquad (6)$$

where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d$ are the eigenvalues, $d$ is the original dimension, and $i = 1, 2, \ldots, d'$ and $j = 1, 2, \ldots, d$ are the summation indices. The invention sets $\tau = 0.85$.
Finally, 3 principal-component features $\mathrm{main}_1$, $\mathrm{main}_2$ and $\mathrm{main}_3$ are extracted.
Clearly the low-dimensional space differs from the original high-dimensional space, because the eigenvectors of the $d - d'$ smallest eigenvalues are discarded; this is the result of dimensionality reduction. Discarding this information is nevertheless necessary. On the one hand, it increases the sampling density of the samples, which is precisely the purpose of dimensionality reduction. On the other hand, it has a certain denoising effect, because the eigenvectors of the smallest eigenvalues are often related to noise.
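The eigendecomposition and the threshold rule of formula (6) can be sketched as follows. This assumes the input is the correlation matrix of the normalized features, as the steps above describe. To stay dependency-free, it uses cyclic Jacobi rotations; in practice `numpy.linalg.eigh` would do the decomposition. Function names are hypothetical, and $\tau = 0.85$ is taken from the text.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def jacobi_eigen(a: Matrix, sweeps: int = 100, tol: float = 1e-12) -> Tuple[List[float], Matrix]:
    """Eigenvalues and eigenvectors (columns of V) of a symmetric matrix."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the new a[p][q] is zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A J (rotate columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rotate rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J (accumulate eigenvectors)
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def choose_components(eigenvalues: List[float], tau: float = 0.85) -> int:
    """Smallest d' whose largest eigenvalues explain at least tau of the total."""
    lam = sorted(eigenvalues, reverse=True)
    total, running = sum(lam), 0.0
    for d, value in enumerate(lam, start=1):
        running += value
        if running / total >= tau:
            return d
    return len(lam)


def pca(corr: Matrix, tau: float = 0.85) -> Tuple[List[float], Matrix]:
    """Return the d' largest eigenvalues and their eigenvectors (one per row)."""
    values, vecs = jacobi_eigen(corr)
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    d = choose_components(values, tau)
    return ([values[i] for i in order[:d]],
            [[row[i] for row in vecs] for i in order[:d]])
```

The eigenvalue-greater-than-1 rule used with Fig. 4 could be checked alongside, for example with `sum(v > 1 for v in values)`.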
(6) Determining weights with the improved entropy weight method
Objective weights are determined by the variability of each index. In general, the smaller the information entropy $E_j$ of an index, the greater the variation of its values and the more information it provides. Such an index plays a larger role in the overall evaluation and receives a larger weight. Conversely, the larger the entropy $E_j$, the smaller the variation of the index values and the less information they provide. Such an index plays a smaller role in the overall evaluation and receives a smaller weight.
First, the value of each index is obtained by data normalization: the original data are normalized and uniformly mapped to [0, 1]. Suppose $k$ indices $X_1, X_2, \ldots, X_k$ are given, where $X_j = \{x_{1j}, x_{2j}, \ldots, x_{nj}\}$, and let $Y_{ij}$ be the standardized value of each index datum.
Second, the information entropy of each index is computed. Suppose there are $k$ index features, each with $n$ sampled data values. By the definition of information entropy in information theory, the entropy $E_j$ of a group of data is formula 8:

$$E_j = -\frac{1}{\ln n}\sum_{i=1}^{n} p_{ij}\ln p_{ij} \qquad (8)$$

where $p_{ij} = Y_{ij} / \sum_{i=1}^{n} Y_{ij}$; if $p_{ij} = 0$, then $p_{ij}\ln p_{ij}$ is defined as 0.
Then, the weight of each index is determined. From the entropy formula, the entropies $E_1, E_2, \ldots, E_k$ of the indices are computed. The smaller an index's entropy, the more information it contains, and vice versa; in general, the smaller the entropy, the larger the weight. To further strengthen the importance of the indices, the weights can be determined with enhanced precision. The improved entropy weight method is therefore given by formula 9.
Finally, each event is scored. The three selected correlated features are region, attack type and amount of property loss. Let $Z_l$ be the final score of the $l$-th event; then $Z_l = \sum_{j} W_j Y_{lj}$, the weighted sum of its index-feature values. The score distribution histogram is shown in Fig. 6. Three local minima, $n_1$, $n_2$ and $n_3$, are found in the histogram, so events can be divided into five grades. The grade ranges are shown in Table 1.
Table 1 Grading ranges
Grade | Score range |
---|---|
Grade 1 | 0 |
Grade 2 | 0 to $n_1$ |
Grade 3 | $n_1$ to $n_2$ |
Grade 4 | $n_2$ to $n_3$ |
Grade 5 | above $n_3$ |
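Table 1 can be applied to a score with a simple cascade of comparisons. Which side of each boundary $n_1$, $n_2$, $n_3$ belongs to is not specified in the table, so the inclusive upper bounds below are an assumption, as is the function name `grade`.

```python
def grade(score: float, n1: float, n2: float, n3: float) -> int:
    """Map an event score to grade 1-5 following Table 1.

    Upper bounds are treated as inclusive (an assumption).
    """
    if score <= 0:
        return 1
    if score <= n1:
        return 2
    if score <= n2:
        return 3
    if score <= n3:
        return 4
    return 5
```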
(7) Method validation
Validation with "high-score events" showed that the high-scoring examples are concentrated almost entirely within the top 10% of scores, which indicates that the model is largely effective.
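The validation check amounts to asking what share of the known "high-score events" rank within the top 10% of model scores. A minimal sketch, assuming those events are given as row indices; how the application selected them is not stated, and the function name `share_in_top_fraction` is hypothetical.

```python
import numpy as np

def share_in_top_fraction(scores, flagged_ids, top_fraction=0.10) -> float:
    """Fraction of flagged events whose score ranks within the top `top_fraction`."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores)  # event indices, highest score first
    # Size of the top group; round() avoids float artefacts such as 2.0000000000000004.
    cutoff = max(1, int(round(top_fraction * len(scores))))
    top = set(order[:cutoff].tolist())
    flagged = list(flagged_ids)
    return sum(i in top for i in flagged) / len(flagged)
```

A value close to 1 corresponds to the observation that the high-scoring examples fall almost entirely within the top 10%.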
The foregoing describes only preferred embodiments of the present application and is not intended to limit it. Those skilled in the art may make various modifications and changes to this application. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of this application shall fall within its scope of protection.
Claims (8)
1. An event risk early-warning method based on data mining, characterized by comprising:
Data acquisition step: obtaining the record data of historical terrorist attacks and of the event to be tested, each event being assigned a unique number; the record data comprise the region, attack type, property loss amount, total number injured, total number dead, number of attackers, number of attackers arrested, number of attackers killed, event summary, hostage/kidnapping outcome, and event resolution date;
Data preprocessing step: preprocessing the record data of the historical terrorist attacks and of the event to be tested;
Data classification step: classifying the data obtained from the preprocessing step into several classes using a clustering algorithm;
Feature extraction step: extracting several features from each class of data using principal component analysis;
Feature integration step: integrating all features extracted from all classes of data to obtain a feature set;
Secondary feature extraction step: extracting several index features from the feature set using principal component analysis;
Feature weight acquisition step: calculating the weight of each index feature using an improved entropy weight method;
Risk early-warning step: for each event, computing a weighted sum of the values of its index features using the corresponding weights; sorting the results in descending order; outputting as the result the rank of the event to be tested among all sorted events; and issuing a warning signal if that rank is less than a set threshold.
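The risk early-warning step of claim 1 can be sketched as scoring the historical events and the event under test with the same weights, ranking in descending order, and comparing the rank with the threshold. The function name, argument layout, and the choice that tied scores share a rank are assumptions.

```python
import numpy as np

def risk_warning(Y_history, Y_test, W, threshold):
    """Rank a test event among historical events by weighted score.

    Y_history: (n, k) normalized index-feature values of historical events.
    Y_test:    (k,) values for the event under test.
    Returns (rank, warn): rank is 1-based in descending score order and
    warn is True when rank < threshold.
    """
    W = np.asarray(W, dtype=float)
    all_scores = np.vstack([Y_history, Y_test]) @ W
    test_score = all_scores[-1]
    # 1 + number of events scoring strictly higher (ties share a rank; an assumption).
    rank = 1 + int(np.sum(all_scores > test_score))
    return rank, rank < threshold
```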
2. The method of claim 1, characterized in that the clustering algorithm is a hierarchical clustering algorithm.
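The classification, feature extraction, integration, and secondary extraction steps of claims 1 and 2 could be sketched as below. Two interpretations here are assumptions, not statements of the application: the hierarchical clustering is a naive centroid-linkage agglomeration of events, and "extracting features with principal component analysis" is read as picking, for each leading component, the original column with the largest absolute loading. All function names are hypothetical.

```python
import numpy as np

def agglomerative_labels(X, n_clusters):
    """Naive centroid-linkage hierarchical clustering (O(n^3); sketch only)."""
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > n_clusters:
        cents = [X[c].mean(axis=0) for c in clusters]
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = np.linalg.norm(cents[a] - cents[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters.pop(b)  # b > a, so index a is unaffected
        clusters[a].extend(merged)
    labels = np.empty(len(X), dtype=int)
    for lab, members in enumerate(clusters):
        labels[members] = lab
    return labels

def top_loading_features(X, n_components):
    """Columns with the largest |loading| on each leading principal component."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return {int(np.argmax(np.abs(Vt[c]))) for c in range(min(n_components, Vt.shape[0]))}

def select_index_features(X, n_clusters=3, per_class=2, final=3):
    """Cluster events, run PCA per class, merge the features, run PCA again."""
    X = np.asarray(X, dtype=float)
    labels = agglomerative_labels(X, n_clusters)
    feature_set = set()
    for lab in range(n_clusters):
        members = X[labels == lab]
        if len(members) > 1:  # PCA needs at least two events
            feature_set |= top_loading_features(members, per_class)
    cols = sorted(feature_set)
    if not cols:
        return []
    chosen = top_loading_features(X[:, cols], final)
    return sorted(cols[i] for i in chosen)
```

For real data, a library implementation of hierarchical clustering would replace the naive loop.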
3. The method of claim 1, characterized in that the weight $W_j$ of each index feature is calculated using the improved entropy weight method:
assume k index features $X_1, X_2, \ldots, X_k$ are given, where $X_j = \{x_{1j}, x_{2j}, \ldots, x_{nj}\}$ and $x_{ij}$ is the sampled data value of the i-th sample;
the sampled data value $x_{ij}$ of an index feature becomes $Y_{ij}$ after standardization:
$$Y_{ij} = \frac{x_{ij} - \min(X_j)}{\max(X_j) - \min(X_j)}$$
where $\min(X_j)$ and $\max(X_j)$ are the minimum and maximum sampled data values of $X_j$;
next, the information entropy $E_j$, $j = 1, 2, \ldots, k$, of each index feature is computed, there being k index features, each corresponding to n sampled data values:
$$E_j = -\frac{1}{\ln n}\sum_{i=1}^{n} p_{ij}\ln p_{ij}$$
where $p_{ij} = Y_{ij} \big/ \sum_{i=1}^{n} Y_{ij}$, and if $p_{ij} = 0$, $\lim_{p_{ij}\to 0} p_{ij}\ln p_{ij} = 0$ is defined;
from the entropies $E_1, E_2, \ldots, E_k$ of the k indices obtained by this calculation, the weight $W_j$ of each index is then determined.
4. The method of claim 1, characterized in that the data preprocessing step comprises a data screening sub-step, a data filling sub-step, a data conversion sub-step, and a data normalization sub-step;
the data screening sub-step removes the event summary, hostage/kidnapping outcome, and event resolution date;
the data filling sub-step fills missing values in the recorded number of attackers, total number dead, number of attackers arrested, total number injured, number of attackers killed, and property loss amount of each terrorist attack, with unknown data filled with zero;
the data conversion sub-step converts the text data for the region where the attack occurred and for the attack type into numerical data;
the data normalization sub-step normalizes the data produced by the screening, filling, and conversion sub-steps using min-max normalization and, from the normalized data, builds an N×1 matrix for each event, where N is the number of data items and each element of the matrix is the normalized value of the corresponding record datum.
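The four preprocessing sub-steps of claim 4 can be sketched over a list of event records. The field names, the use of `None` as the missing-value marker, and the `encoders` lookup tables (text to number, as produced by the procedure of claims 5 and 6) are all hypothetical. Each event's N×1 vector is stored as one row of the returned array for convenience.

```python
import numpy as np

# Hypothetical field names for the screened and zero-filled columns.
DROP = {"summary", "hostage_outcome", "resolution_date"}
FILL_ZERO = ["n_attackers", "n_dead", "n_arrested", "n_injured",
             "n_attackers_killed", "property_loss"]

def preprocess(events, encoders):
    """events: list of dicts, one per event; encoders: {text_field: {text: number}}.

    Returns (M, fields): M[l] is the normalized vector of event l and
    fields gives the column order. All events are assumed to share fields.
    """
    rows = []
    for ev in events:
        ev = {f: v for f, v in ev.items() if f not in DROP}  # screening
        for f in FILL_ZERO:
            if ev.get(f) is None:
                ev[f] = 0.0                                   # filling: unknown -> 0
        for f, table in encoders.items():
            ev[f] = table[ev[f]]                              # conversion: text -> number
        rows.append(ev)
    fields = sorted(rows[0])
    M = np.array([[float(r[f]) for f in fields] for r in rows])
    lo = M.min(axis=0)
    span = M.max(axis=0) - lo
    # Min-max normalization; constant columns become 0 (an assumption).
    return (M - lo) / np.where(span == 0, 1.0, span), fields
```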
5. The method of claim 4, characterized in that
the step of converting the region text data into numerical data is: for each region, summing the total number dead and the number of attackers of the events in that region; sorting the sums in descending order; and, following that order, assigning numeric scores to the regions such that the scores decrease successively.
6. The method of claim 4, characterized in that
the step of converting the attack type text data into numerical data is: for each attack type, summing the total number dead and the number of attackers of the events of that type; sorting the sums in descending order; and, following that order, assigning numeric scores to the attack types such that the scores decrease successively.
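Claims 5 and 6 describe the same severity-ranked encoding, applied to region and to attack type. A minimal sketch, assuming the most severe category receives the code equal to the number of categories and codes then decrease by 1; the claims only say the codes decrease successively. The field names and the function name `rank_encode` are hypothetical.

```python
from collections import defaultdict

def rank_encode(events, field):
    """Encode a text field by the severity of its events (claims 5 and 6).

    Severity of a category = sum over its events of (total dead + number of attackers).
    Missing counts are treated as 0 (an assumption).
    """
    severity = defaultdict(float)
    for ev in events:
        severity[ev[field]] += (ev.get("n_dead") or 0) + (ev.get("n_attackers") or 0)
    ordered = sorted(severity, key=severity.get, reverse=True)
    n = len(ordered)
    # Highest severity -> code n, then n - 1, n - 2, ...
    return {cat: n - pos for pos, cat in enumerate(ordered)}
```

Calling `rank_encode(events, "region")` and `rank_encode(events, "attack_type")` yields lookup tables that can serve as the text-to-number mapping in the data conversion sub-step.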
7. An electronic device, characterized by comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the computer instructions, when run by the processor, perform the steps of the method of any one of claims 1 to 6.
8. A computer-readable storage medium, characterized in that computer instructions are stored thereon, and the computer instructions, when run by a processor, perform the steps of the method of any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811431329.8A CN109685321A (en) | 2018-11-26 | 2018-11-26 | Event risk method for early warning, electronic equipment and medium based on data mining |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109685321A true CN109685321A (en) | 2019-04-26 |
Family
ID=66185619
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811431329.8A Pending CN109685321A (en) | 2018-11-26 | 2018-11-26 | Event risk method for early warning, electronic equipment and medium based on data mining |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109685321A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101673280A (en) * | 2009-07-20 | 2010-03-17 | 浙江大学 | Method for determining terror attack organization based on feature mining of terror attack event |
CN105956982A (en) * | 2016-05-04 | 2016-09-21 | 江苏大学 | Method of predicting act of terror based on background change |
CN106570767A (en) * | 2016-10-26 | 2017-04-19 | 中国农业科学院农业质量标准与检测技术研究所 | Monitoring data statistics analysis method and device in risk monitoring information system |
CN106776884A (en) * | 2016-11-30 | 2017-05-31 | 江苏大学 | A kind of act of terrorism Forecasting Methodology that multi-categorizer is combined based on multi-tag |
CN108776817A (en) * | 2018-06-04 | 2018-11-09 | 孟玺 | The type prediction method and system of the attack of terrorism |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110348510A (en) * | 2019-07-08 | 2019-10-18 | 中国海洋石油集团有限公司 | A kind of data preprocessing method based on deep water hydrocarbon drilling process conditions of the current stage |
CN110348510B (en) * | 2019-07-08 | 2021-08-03 | 中国海洋石油集团有限公司 | Data preprocessing method based on staged characteristics of deepwater oil and gas drilling process |
CN112465533A (en) * | 2019-09-09 | 2021-03-09 | ***通信集团河北有限公司 | Intelligent product selection method and device and computing equipment |
CN112907035A (en) * | 2021-01-27 | 2021-06-04 | 厦门卫星定位应用股份有限公司 | K-means-based transportation subject credit rating method and device |
CN112907035B (en) * | 2021-01-27 | 2022-08-05 | 厦门卫星定位应用股份有限公司 | K-means-based transportation subject credit rating method and device |
CN113537691A (en) * | 2021-05-09 | 2021-10-22 | 武汉兴得科技有限公司 | Big data public health event emergency command method and system |
CN116596353A (en) * | 2022-09-29 | 2023-08-15 | 中国人民解放军空军工程大学 | Quantitative analysis method for terrorist attack event record data |
CN116596353B (en) * | 2022-09-29 | 2024-06-04 | 中国人民解放军空军工程大学 | Quantitative analysis method for terrorist attack event record data |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109685321A (en) | Event risk method for early warning, electronic equipment and medium based on data mining | |
Sun et al. | Predicting public procurement irregularity: An application of neural networks | |
CN109409677A (en) | Enterprise Credit Risk Evaluation method, apparatus, equipment and storage medium | |
CN102955902B (en) | Method and system for evaluating reliability of radar simulation equipment | |
CN104321794B (en) | A kind of system and method that the following commercial viability of an entity is determined using multidimensional grading | |
CN109657011A (en) | A kind of data digging method and system screening attack of terrorism criminal gang | |
CN109446812A (en) | A kind of embedded system firmware safety analytical method and system | |
CN112132233A (en) | Criminal personnel dangerous behavior prediction method and system based on effective influence factors | |
CN110309863A (en) | Evaluation method that a kind of identity based on analytic hierarchy process (AHP) and grey correlation analysis is credible | |
CN102880631A (en) | Chinese author identification method based on double-layer classification model, and device for realizing Chinese author identification method | |
AU2019101158A4 (en) | A method of analyzing customer churn of credit cards by using logistics regression | |
Chen et al. | Research on data mining combination model analysis and performance prediction based on students’ behavior characteristics | |
CN114358014A (en) | Work order intelligent diagnosis method, device, equipment and medium based on natural language | |
CN109582743A (en) | A kind of data digging method for the attack of terrorism | |
Ergu et al. | Predicting personality with twitter data and machine learning models | |
CN116340815A (en) | University abnormal behavior student identification method based on convolutional neural network | |
CN109214598A (en) | Batch ranking method based on K-MEANS and ARIMA model prediction residential quarters collateral risk | |
Işık et al. | Detection of fraudulent transactions using artificial neural networks and decision tree methods | |
Zhu et al. | Research on data mining of college students’ physical health for physical education reform | |
CN114862531A (en) | Enterprise financial risk early warning method and system based on deep learning | |
CN113920366A (en) | Comprehensive weighted main data identification method based on machine learning | |
Zhao et al. | An intelligent evaluation method to analyze the competitiveness of airlines | |
Cui et al. | Using PCA and ANN to identify significant factors and modeling customer satisfaction for the complex service processes | |
CN110209953A (en) | A kind of calculation method towards uncertain social computing problem | |
CN108629507A (en) | A kind of enterprise credit management system |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190426 |