CN103150470A

CN103150470A - Visualization method for concept drift of data stream in dynamic data environment

Info

Publication number: CN103150470A
Application number: CN2013100520887A
Authority: CN
Inventors: 冯林; 姚远; 陈沣
Original assignee: Dalian University of Technology
Current assignee: Dalian University of Technology
Priority date: 2013-02-18
Filing date: 2013-02-18
Publication date: 2013-06-12
Anticipated expiration: 2033-02-18
Also published as: CN103150470B

Abstract

The invention relates to the technical field of intelligent information processing and discloses a visualization method for concept drift of a data stream in a dynamic data environment. The method comprises the following steps: converting the data stream into static data blocks; building a concept representation for each concept drift pattern and saving these representations in a concept pool; and, when a new data block arrives, using the KL divergence to search the concept pool for a similar concept representation. If a similar representation exists, its count is updated; if not, the new data block is added to the concept pool and saved as a new concept. The method can detect the various drift types of a changing data stream and, through counting, can fully analyze the concept drift process in the data stream. Finally, it uses the Bayes formula to draw a concept drift transition graph from the statistical results, and visualizes that graph to assist data mining at the concept level.

Description

Visualization method for data stream concept drift in a dynamic data environment

Technical field

The present invention relates to the field of intelligent information processing technology, and in particular to a visualization method for data stream concept drift in a dynamic environment, applicable to network intrusion detection, network security monitoring, sensor data monitoring, power grid supply, and similar areas.

Background technology

With the continued development of information technology, traditional data mining methods face new challenges. The first of these is the change in the form of data, from traditional static data to dynamic data streams. How to mine data streams effectively and extract the knowledge they contain is therefore receiving more and more attention from industry.

Unlike static data, a data stream has three characteristics: massive volume, real-time arrival, and dynamic change. These three characteristics require traditional data mining models to be adjusted and changed to suit the variation and characteristics of data streams. Most current data stream models and methods are therefore built around the data attributes of the stream itself, for example data stream classification models, clustering models, and dimensionality reduction models. For mining at the concept level of a data stream, however, there is still no corresponding approach.

The few existing techniques related to data stream concepts mainly detect or classify, in real time, the concept drift present in a data stream in order to support subsequent work; the visualization of concept drift remains a blank in both academia and industry. Although concept drift visualization is still at an exploratory stage, a concept is a high-level form of expression of data, so visualization is of great significance for understanding data and for methods of extracting knowledge from data. Drawing on other visualization methods, such as stream shape graphs and circle representations, concept drift can be visualized once the concept features have been obtained, giving subsequent work an intuitive representation and helping it proceed smoothly and effectively. There is therefore a need in the art for a method of visualizing data stream concept drift in a dynamic environment.

Summary of the invention

The object of the invention is to solve the above problems in the prior art and, in view of the lack of research on concept drift visualization methods, to provide a visualization method for data stream concept drift in a dynamic environment.

To achieve the above object, the technical solution adopted by the invention is a visualization method for data stream concept drift in a dynamic data environment, comprising the following steps:

Step 1: the dynamic data stream collection module 102 collects data in time order from the massive real-time data stream 101;

Step 2: the data stream division module 103 reads the data collected in step 1 and divides the data stream according to the order in which the data arrive; each data block obtained by the data stream division module 103 contains N records, where N is a fixed value set in advance by the user;

Step 3: the static data block obtained from the data stream division module 103 is input to the kdq tree module 104 to build a kdq tree; the threshold corresponding to the kdq tree is either calculated by a bootstrap method based on the KL divergence or given directly by the user;

Step 4: the kdq tree built by the kdq tree module 104 and the threshold corresponding to that kdq tree are saved in the concept pool 106;

Step 5: the concept detection module 105 obtains a new data block from the data stream division module 103 and detects whether the new data block is a new concept; the detection result is given by comparing the KL divergence value between the original data block and the new data block with the threshold of the corresponding kdq tree saved in the concept pool 106; the original data block must be discretized when the KL divergence is calculated, and the discretization result is obtained by passing the data block through the kdq tree;

Step 6: when the data stream division module 103 obtains a new data block, the data block is compared with the concepts saved in the concept pool 106; if a similar concept is found, the concept counting module 107 is updated; otherwise the data block is added to the concept pool 106 as a new concept;

Step 7: steps 1-6 are repeated until the data stream ends; the statistics in the concept counting module 107 are then aggregated, and the statistics of each concept in the concept pool 106 are calculated;

Step 8: the above statistics are input to the concept drift module 108, and a concept transition graph is constructed using the Bayes formula, completing the concept drift visualization process.

Building the kdq tree in step 3 comprises the following sub-steps:

Step 3.1: first select the first attribute of the data block as the current attribute and find the median value v in the current dimension, so that the data block is divided into two subsets with approximately equal sample counts; that is, the number of records whose value of the current attribute is greater than v is approximately equal to the number whose value is less than or equal to v;

Step 3.2: within each resulting subset, find among the subsequent attributes one that satisfies the division condition, select it as the current attribute, repeat the median search, and continue dividing the subset;

Step 3.3: repeat the above process until the termination condition is satisfied;

The division condition is: the difference between the maximum and minimum values of the data in the current dimension is greater than a variable ε, whose value is specified by the user;

The termination condition is: the size of the current data block is less than n_min, or the difference between the maximum and minimum values in every dimension is less than ε, where the value of n_min is specified in advance by the user.
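
The construction steps above can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the function names `build_kdq_tree` and `leaf_counts`, the nested-dictionary node layout, the cyclic search order over dimensions, and the guard against an empty right subset are all assumptions introduced here.

```python
from statistics import median

def build_kdq_tree(points, eps, n_min, dim=0):
    """Recursively split `points` (a list of equal-length tuples) at the median."""
    n_dims = len(points[0])
    spans = [max(p[d] for p in points) - min(p[d] for p in points)
             for d in range(n_dims)]
    leaf = {"leaf": True, "count": len(points)}
    # Termination condition: block smaller than n_min, or every range below eps.
    if len(points) < n_min or all(s < eps for s in spans):
        return leaf
    # Division condition: starting at `dim`, take the first dimension
    # whose max-min range exceeds eps (cyclic order is an assumption).
    order = [(dim + k) % n_dims for k in range(n_dims)]
    split_dim = next((d for d in order if spans[d] > eps), None)
    if split_dim is None:
        return leaf
    v = median(p[split_dim] for p in points)
    left = [p for p in points if p[split_dim] <= v]
    right = [p for p in points if p[split_dim] > v]
    if not right:  # guard (assumption): heavy ties can leave one side empty
        return leaf
    nxt = (split_dim + 1) % n_dims
    return {"leaf": False, "dim": split_dim, "value": v,
            "left": build_kdq_tree(left, eps, n_min, nxt),
            "right": build_kdq_tree(right, eps, n_min, nxt)}

def leaf_counts(tree):
    """Collect the number of records in each leaf, left to right."""
    if tree["leaf"]:
        return [tree["count"]]
    return leaf_counts(tree["left"]) + leaf_counts(tree["right"])

# Hypothetical two-dimensional block of 16 records.
points = [(float(i), float(i % 4)) for i in range(16)]
tree = build_kdq_tree(points, eps=0.5, n_min=4)
```

Because every split is made at the median, the leaves of the resulting tree hold roughly equal numbers of records, which is the property the later KL divergence comparison relies on.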

In step 4, the kdq tree module 104 uses the bootstrap method to determine the threshold corresponding to the kdq tree, comprising the following steps:

Step 4.1: draw N records from the original data block with replacement, that is, the drawn records are not deleted from the original data block, and use the drawn records to form a new data block;

Step 4.2: divide the new data block using the kdq tree to obtain the discretization result;

Step 4.3: calculate the KL divergence value between the new data block and the original data block according to the KL divergence formula, and add the result to a queue;

The KL divergence is calculated as follows:

(1)

(2)

In formula (1), kl_1 denotes the KL divergence between the data distributions of data block C_a and data block C_b; P_{C_a}(x) denotes the probability distribution of data block C_a after discretization, and P_{C_b}(x) denotes the probability distribution of data block C_b after discretization; w_{b,j} denotes the number of records of data block C_b that fall in the j-th interval after discretization, and w_{a,j} denotes the number of records of data block C_a that fall in the j-th interval after discretization; N_b denotes the total number of records in data block C_b; T denotes the total number of intervals obtained by discretizing a data block; the discretization result of each data block is obtained from the kdq tree;

In formula (2), kl_2 denotes the KL divergence between the distributions of the labeled data of data block C_a and data block C_b; P_{C_a}(Y|x) denotes the probability distribution of each label in data block C_a after discretization, and P_{C_b}(Y|x) denotes the probability distribution of each label in data block C_b after discretization; w_{b,i,j} denotes the number of records of data block C_b that fall in the j-th interval after discretization and have label i, and w_{a,i,j} denotes the number of records of data block C_a that fall in the j-th interval after discretization and have label i; N_b denotes the total number of records in data block C_b; T denotes the total number of intervals obtained by discretizing a data block, and |Y| denotes the number of distinct labels in the data; the discretization result of each data block is obtained from the kdq tree;

Step 4.4: repeat steps 4.1 to 4.3 a total of k times, where k is a constant set in advance by the user;

Step 4.5: sort the values in the queue by size and take the value at the (1-α) quantile as the threshold.

Here α denotes the confidence level for the occurrence of concept drift; α is greater than 0 and less than 1 and is specified in advance by the user.
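
The bodies of formulas (1) and (2) do not survive in this text. As a hedged aid to reading the symbol definitions above, the standard textbook definitions of the KL divergence and the conditional KL divergence, written with the same symbols, are given below. The empirical estimates on the right, and the symbol N_a for the size of C_a, are assumptions introduced here; the patent's own expressions, including any smoothing terms, may differ.

```latex
% Standard KL divergence between the discretized distributions (assumed form)
kl_1 = \sum_{j=1}^{T} P_{C_a}(x_j)\,\log\frac{P_{C_a}(x_j)}{P_{C_b}(x_j)},
\qquad
P_{C_a}(x_j) \approx \frac{w_{a,j}}{N_a},\quad
P_{C_b}(x_j) \approx \frac{w_{b,j}}{N_b}

% Standard conditional KL divergence over labels (assumed form)
kl_2 = \sum_{j=1}^{T} P_{C_a}(x_j) \sum_{i=1}^{|Y|}
       P_{C_a}(y_i \mid x_j)\,\log\frac{P_{C_a}(y_i \mid x_j)}{P_{C_b}(y_i \mid x_j)},
\qquad
P_{C_a}(y_i \mid x_j) \approx \frac{w_{a,i,j}}{w_{a,j}}
```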

Step 4.2, in which the kdq tree is used to divide the new data block and obtain the discretization result, comprises the following steps:

Step 4.2.1: divide the records of the new data block according to the decision value of each node of the kdq tree; if the value of a record in the node's dimension is less than or equal to the node's decision value, the record enters the left subtree of that node, otherwise it enters the right subtree;

Step 4.2.2: repeat this decision process until every record in the data block has been assigned to a leaf node of the kdq tree, which gives the division result for the data block;

Step 4.2.3: for the kdq tree division result of the current data block, divide the sample count of each region by the total sample count of the data block to obtain the probability distributions P(x) and P(Y|x) of the current data block under the kdq tree discretization.
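
Steps 4.2.1 to 4.2.3 can be illustrated with the short Python sketch below. The nested-dictionary tree layout, the path strings used to name leaves, and the names `leaf_path` and `discretize` are assumptions made for this illustration; only the unlabeled distribution is computed.

```python
def leaf_path(tree, point):
    """Route one record down the tree; return a string such as 'RL' naming its leaf."""
    path = ""
    node = tree
    while not node["leaf"]:
        # Step 4.2.1: values <= the decision value go left, others go right.
        if point[node["dim"]] <= node["value"]:
            path += "L"
            node = node["left"]
        else:
            path += "R"
            node = node["right"]
    return path

def discretize(tree, block):
    """Steps 4.2.2 and 4.2.3: count records per leaf, then divide by the block size."""
    counts = {}
    for point in block:
        key = leaf_path(tree, point)
        counts[key] = counts.get(key, 0) + 1
    n = len(block)
    return {leaf: c / n for leaf, c in counts.items()}

# Hypothetical hand-built tree: split dimension 0 at 5.0, then dimension 1 at 1.0.
example_tree = {
    "leaf": False, "dim": 0, "value": 5.0,
    "left": {"leaf": True},
    "right": {"leaf": False, "dim": 1, "value": 1.0,
              "left": {"leaf": True}, "right": {"leaf": True}},
}
example_block = [(1, 0), (2, 3), (6, 0), (7, 2)]
distribution = discretize(example_tree, example_block)
```

Leaves that receive no records are simply absent from the returned dictionary, so a caller comparing two blocks would need to treat missing keys as zero counts.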

Concept drift detection by the concept detection module 105 in step 5 comprises the following sub-steps:

Step 5.1: pass the data block through the kdq tree to obtain the kdq tree's discretization result for the data block;

Step 5.2: using the KL divergence formula, calculate the KL divergence value between the kdq tree discretization result of the new data block and the kdq discretization result of each concept saved in the concept pool module 106;

Step 5.3: compare the KL divergence value with the threshold corresponding to the kdq tree; if the calculated KL divergence value is less than that threshold, no concept drift has occurred; otherwise concept drift has occurred.
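
The comparison in steps 5.2 and 5.3 might look like the following Python sketch. It assumes both blocks have already been reduced to per-interval counts over the same T intervals. The additive smoothing of 0.5 per interval is an assumption added here to avoid division by zero, and the direction of the divergence (new block against stored concept) is likewise assumed.

```python
import math

def kl_divergence(p_counts, q_counts, smoothing=0.5):
    """KL divergence between two count vectors over the same intervals."""
    t = len(p_counts)
    n_p = sum(p_counts) + smoothing * t
    n_q = sum(q_counts) + smoothing * t
    total = 0.0
    for w_p, w_q in zip(p_counts, q_counts):
        p = (w_p + smoothing) / n_p  # smoothed probability of this interval
        q = (w_q + smoothing) / n_q
        total += p * math.log(p / q)
    return total

def drift_detected(new_counts, concept_counts, threshold):
    """Step 5.3: a divergence below the threshold means no drift."""
    return kl_divergence(new_counts, concept_counts) >= threshold
```

For example, `drift_detected([10, 0], [0, 10], 0.1)` reports drift because the two blocks occupy opposite intervals, while two identical count vectors give a divergence of zero.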

Step 6 comprises the following sub-steps:

Step 6.1: calculate the KL divergence value between the kdq tree discretization result of the new data block and each concept saved in the concept pool module 106;

Step 6.2: sort the resulting KL divergence values; if the smallest KL divergence value is still greater than the threshold set by the concept detection module, the concept pool module 106 contains no concept corresponding to this data block, that is, the data block represents a new concept, and its kdq tree discretization result is stored in the concept pool module 106 as a new concept. If a concept that satisfies the threshold is found, the statistics for that concept are updated in the concept counting module 107.
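
Steps 6.1 and 6.2 can be sketched as a small pool-matching routine. For brevity this sketch assumes a single shared discretization and a single threshold for the whole pool, whereas the method described stores a kdq tree and threshold with each concept. The names `assign_concept`, `pool`, and `stats`, the smoothing constant, and the choice to start a new concept's count at 1 are all illustrative assumptions.

```python
import math

def kl(p_counts, q_counts, s=0.5):
    """Smoothed KL divergence between two count vectors (smoothing is an assumption)."""
    t = len(p_counts)
    n_p, n_q = sum(p_counts) + s * t, sum(q_counts) + s * t
    return sum(((a + s) / n_p) * math.log(((a + s) / n_p) / ((b + s) / n_q))
               for a, b in zip(p_counts, q_counts))

def assign_concept(pool, stats, new_counts, threshold):
    """Match a block against the pool; return the index of its concept."""
    divergences = [kl(new_counts, concept) for concept in pool]
    if divergences and min(divergences) <= threshold:
        idx = divergences.index(min(divergences))
        stats[idx] += 1               # similar concept found: update its count
    else:
        pool.append(list(new_counts))  # no match: store the block as a new concept
        stats.append(1)
        idx = len(pool) - 1
    return idx

pool, stats = [], []
labels = [assign_concept(pool, stats, block, threshold=0.1)
          for block in ([10, 0], [9, 1], [0, 10])]
```

The returned indices form the sequence of concept labels over time, which is the input a later transition-probability calculation would need.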

Step 8 comprises the following sub-steps:

Step 8.1: aggregate the counting information, held in the concept counting module 107, of each concept stored in the concept pool module 106;

Step 8.2: use the Bayes formula to calculate the transition probabilities between different concepts;

where P(C_i) denotes the probability that the i-th concept occurs, P(C_j) denotes the probability that the j-th concept occurs, and P(C_j|C_i) denotes the probability that the j-th concept occurs given that the i-th concept has occurred.

Step 8.3: aggregate the calculated transition probabilities between the concepts, input them to the data stream concept drift module 108, and draw the data stream concept drift graph, completing the visualization process.
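
The formula used in step 8.2 is not reproduced in this text. One plausible reading, offered only as an assumption, estimates P(C_j|C_i) as the fraction of blocks labeled C_i that are immediately followed by a block labeled C_j, which is the conditional probability P(C_i, C_j) / P(C_i) computed from counts. A minimal Python sketch under that assumption:

```python
from collections import Counter

def transition_probabilities(concept_sequence):
    """Estimate P(C_j | C_i) from consecutive concept labels."""
    # Count each adjacent (from, to) pair of concepts.
    pair_counts = Counter(zip(concept_sequence, concept_sequence[1:]))
    # Count how often each concept appears as the "from" side of a pair.
    from_counts = Counter(concept_sequence[:-1])
    return {(i, j): c / from_counts[i] for (i, j), c in pair_counts.items()}

# Hypothetical label sequence produced by the concept detection stage.
sequence = ["C1", "C1", "C2", "C1", "C2", "C2"]
probs = transition_probabilities(sequence)
```

With this estimate the outgoing probabilities of each concept sum to 1, so they can be used directly as edge weights in the transition graph.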

The data stream 101 includes data from network intrusion monitoring, network security monitoring, sensor data monitoring, and power grid supply.

The beneficial effects of the invention are as follows: the invention uses the kdq tree and the KL divergence to detect concept drift, can detect different types of concept drift, and stores the different concepts in the concept pool; by comparing each new data block with the concepts saved in the concept pool, it can count the number of times each concept occurs in the stream and the transfer relationships between concepts; finally, it constructs the concept drift graph and completes the task of visualizing data stream concept drift.

Description of drawings

Fig. 1 is a flow block diagram of the concept drift visualization method in a dynamic data environment according to the invention.

Fig. 2 shows a specific embodiment in which the data stream division module divides the data stream.

Fig. 3 is a flowchart of the kdq tree module building a kdq tree.

Fig. 4 shows the specific procedure by which the bootstrap method obtains the threshold corresponding to the kdq tree.

Fig. 5 shows a specific embodiment in which the concept detection module performs concept detection on a data block against the concept pool.

Fig. 6 is a concept drift transition graph in a dynamic data environment according to the invention.

Reference numerals: 101 - data stream; 102 - data stream collection module; 103 - data stream division module; 104 - kdq tree module; 105 - concept detection module; 106 - concept pool module; 107 - concept counting module; 108 - data stream concept drift module.

Embodiment

The present invention is described in detail below in conjunction with drawings and Examples.

With reference to Fig. 1, the framework of the data stream classification method in a dynamic data environment according to the invention comprises a data stream 101, a data stream collection module 102, a data stream division module 103, a kdq tree module 104, a concept detection module 105, a concept pool module 106, a concept counting module 107, and a data stream concept drift module 108. The method comprises the following steps:

Step 1: the data stream collection module 102 collects data in time order from the data stream 101. The data stream 101 may be any type of data stream known to a person of ordinary skill in the art, in particular a network intrusion detection data stream, a network security monitoring data stream, a sensor data monitoring stream, or a power grid supply data stream. Because data streams are usually generated in real time, computing on and storing the stream data becomes very difficult.

Step 2: the data stream division module 103 reads data from the data stream collection module 102 and divides the data stream successively in time order according to a preset data block capacity. Each data block obtained by the data stream division module (103) contains N records, where N is a fixed value set in advance by the user. The current data blocks required by the kdq tree module 104, the concept detection module 105, and the concept pool module 106 are provided by the division result of the data stream division module 103.

Step 3: the static data block obtained from the data stream division module (103) is input to the kdq tree module 104 to build a kdq tree; the threshold corresponding to the kdq tree is either calculated by a bootstrap method based on the KL divergence or given directly by the user;

Step 4: the kdq tree built by the kdq tree module (104) and the threshold corresponding to that kdq tree are saved in the concept pool module (106);

Step 5: the concept detection module (105) obtains a new data block from the data stream division module (103) and detects whether the new data block is a new concept; the detection result is given by comparing the KL divergence value between the original data block and the new data block with the threshold of the corresponding kdq tree saved in the concept pool module (106); the original data block must be discretized when the KL divergence is calculated, and the discretization result is obtained by passing the data block through the kdq tree;

Step 6: the concept detection module 105 uses the KL divergence to compare the kdq tree representation of the new data block with the concepts saved in the concept pool 106 and search for a similar concept. If a similar concept is found, the statistics for that concept are updated in the concept counting module 107; otherwise the new data block represents a new concept, and its kdq tree structure and kdq threshold are stored in the concept pool.

Step 7: steps 1-6 are repeated until the data stream 101 ends. When the data stream 101 has been fully processed, the information in the concept counting module is aggregated by concept, and the transition probabilities between different concepts are calculated.

Step 8: the above information is input to the concept drift transition graph module 108, and the concept drift visualization graph is drawn using the Bayes formula, completing the concept drift visualization process.

The data stream collection module 102, data stream division module 103, kdq tree module 104, concept detection module 105, concept pool module 106, concept counting module 107, and data stream concept drift module 108 are all stored in the memory of a computer system.

Fig. 2 shows a specific implementation of the blocking of the data stream by the data stream division module 103 of Fig. 1. The data stream division module 103 divides the data stream into blocks according to the order in which data arrive at the data stream collection module 102, producing in order a first data block, a second data block, ..., and an m-th data block. Each data block contains N records, and the value of N can be adjusted dynamically by the data stream division module 103 according to the characteristics of the data stream.
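
The blocking described here can be sketched as a Python generator that consumes the stream in arrival order and emits blocks of N records. The name `partition_stream` and the decision to emit a final, shorter block when the stream length is not a multiple of N are assumptions of this sketch.

```python
from itertools import islice

def partition_stream(stream, n):
    """Yield consecutive blocks of up to `n` records, in arrival order."""
    iterator = iter(stream)
    while True:
        block = list(islice(iterator, n))
        if not block:       # stream exhausted
            return
        yield block         # the last block may hold fewer than n records

blocks = list(partition_stream(range(7), 3))
```

Because the generator pulls records lazily, it never needs to hold more than one block in memory, which suits a stream that is produced in real time.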

Fig. 3 shows the process by which the kdq tree module 104 builds the kdq tree in step 3, as follows:

Step 3.1: select the first dimension of the data block as the current dimension and find the median value v in that dimension. The median is chosen so that the samples in the data block are divided into two subsets of equal size, that is, the number of records whose value in the current dimension is greater than v is approximately equal to the number whose value is less than or equal to v;

Step 3.2: among the subsequent dimensions of each resulting subset, find one that satisfies the division condition, use it as the current dimension, repeat the median search, and continue dividing the subset;

Step 3.3: repeat steps 3.1 and 3.2 until the termination condition is satisfied;

The current dimension satisfies the division condition when the difference between the maximum and minimum values of the data in that dimension is greater than ε, whose value is specified by the user;

The termination condition is: the size of the current data block is less than n_min, or the difference between the maximum and minimum values in every dimension is less than ε. When the termination condition is met, a division of the original data block is obtained.

If the result of each division is treated as the left and right decision outcomes of a node for the current attribute value, the divisions form a tree structure over the data, namely the kdq tree. A characteristic of the kdq tree is that its division (discretization) of the original data block is approximately equal in size. When the data stream environment is stable and no concept drift occurs, the current data block also yields an approximately equal discretization result.

With reference to Fig. 4, in step 4 the kdq tree module 104 uses the bootstrap method to determine the threshold corresponding to the kdq tree, comprising the following steps:

Step 4.1: draw N records from the original data block with replacement, that is, the drawn records are not deleted from the original data block; each draw is completely random, and the drawn samples are mutually independent. Use the drawn records to form a new data block;

Step 4.2: repeat the drawing process until the new data block contains m records, obtaining k new data blocks in total; divide each new data block using the kdq tree to obtain its discretization result;

Step 4.3: calculate the KL divergence value between each new data block and the original data block according to the KL divergence formula, and add the results to a queue for sorting;

In this embodiment the KL divergence is used to judge the similarity between data blocks; it is calculated as follows:

(1)

(2)

In formula (1), kl_1 denotes the KL divergence between the data distributions of data block C_a and data block C_b; P_{C_a}(x) denotes the probability distribution of data block C_a after discretization, and P_{C_b}(x) denotes the probability distribution of data block C_b after discretization; w_{b,j} denotes the number of records of data block C_b that fall in the j-th interval after discretization, and w_{a,j} denotes the number of records of data block C_a that fall in the j-th interval after discretization; N_b denotes the total number of records in data block C_b; T denotes the total number of intervals obtained by discretizing a data block; the discretization result of each data block is obtained from the kdq tree.

Step 4.4: repeat steps 4.1 to 4.3 a total of k times, where k is a constant set in advance by the user.

Step 4.5: sort the values in the queue by size and take the value at the (1-α) quantile as the threshold (α denotes the confidence level for the occurrence of concept drift; α is greater than 0 and less than 1 and is specified in advance by the user). This gives the kdq tree threshold for confidence level α: if the KL divergence value between a new data block and the original data block is greater than the threshold, the new data block has undergone concept drift with probability 1-α.
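
The bootstrap procedure of steps 4.1 to 4.5 can be sketched as follows. To keep the example self-contained, a fixed equal-width binning function (`equal_width_counts`, hypothetical) stands in for the kdq tree discretization. The smoothing constant and the exact index used for the (1-α) quantile are also assumptions of this sketch.

```python
import math
import random

def kl(p_counts, q_counts, s=0.5):
    """Smoothed KL divergence between two count vectors (smoothing is an assumption)."""
    t = len(p_counts)
    n_p, n_q = sum(p_counts) + s * t, sum(q_counts) + s * t
    return sum(((a + s) / n_p) * math.log(((a + s) / n_p) / ((b + s) / n_q))
               for a, b in zip(p_counts, q_counts))

def equal_width_counts(block, bins=4):
    """Stand-in discretizer for values in [0, 1); the method itself uses the kdq tree."""
    counts = [0] * bins
    for x in block:
        counts[min(bins - 1, int(x * bins))] += 1
    return counts

def bootstrap_threshold(block, to_counts, k, alpha, rng):
    base = to_counts(block)
    n = len(block)
    values = []
    for _ in range(k):                                     # step 4.4: repeat k times
        resample = [rng.choice(block) for _ in range(n)]   # step 4.1: with replacement
        values.append(kl(to_counts(resample), base))       # steps 4.2 and 4.3
    values.sort()                                          # step 4.5: sort the queue
    idx = max(0, min(k - 1, math.ceil((1 - alpha) * k) - 1))
    return values[idx]

block = [i / 200 for i in range(200)]
threshold = bootstrap_threshold(block, equal_width_counts, k=50, alpha=0.05,
                                rng=random.Random(0))
```

Passing an explicit `random.Random` instance keeps the resampling reproducible, so the same block and seed always yield the same threshold.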

When a new data block arrives, the concept detection module 105 performs concept drift detection on it as follows: the current data block is passed through the current kdq tree to obtain its discretization result; the KL divergence formula is used to obtain the KL divergence value between the current data block and the original data block, which is compared with the threshold corresponding to the kdq tree; if the calculated KL divergence value is less than the threshold, no concept drift has occurred; otherwise a new concept has appeared.

In step 4.2, the kdq tree is used to divide the new data block and obtain the discretization result through steps 4.2.1 to 4.2.3 described above.

With reference to Fig. 5, an example of the overall concept drift detection process is described. The data stream 101 is first divided by the data stream division module 103 into different data blocks, for example A, B, and so on; at this point the concept each data block represents is unknown. Each data block is input to the kdq tree module 104 to be divided, and the divided data block is then compared with the concepts saved in the concept pool module 106. The comparison uses the KL divergence, with the similarity threshold chosen by the bootstrap method. Each new data block is compared in turn with each data block saved in the concept pool module 106 to decide whether it is a new concept.

In Fig. 5, a data block representing concept A is compared with the concepts in the concept pool 106. Concept A is found to exist already in the concept pool, so the statistics for concept A are updated in the concept counting module 107. If no similar concept is found, for example when a data block contains concept E but the concept pool 106 has no related concept, concept E is added to the concept pool 106 and also added to the concept counting module 107 in preparation for subsequent statistics.

The above process is repeated, the number of repetitions depending on the length of the data stream, until the whole data stream has been processed or a preset end parameter is reached.

Fig. 6 shows an example of the concept drift visualization graph finally output by the concept drift module 108. Circles represent different concepts and are labeled C1, C2, and so on. An edge between concepts represents a transfer between them, and the weight on the edge is the transition probability between the two concepts. Concept drift takes two forms, self-transfer and outer transfer. A self-transfer means that adjacent data blocks carry the same concept, that is, the concept does not change and transfers to itself. An outer transfer means that a new concept has appeared, so the old concept transfers to the new one. In the example, 5000 samples are used as the visualization condition, so one concept drift visualization graph is output every 5000 samples. In each subgraph, the double circle marks the current concept, that is, the most recent concept at the cut-off point.
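
A graph of this kind can be written out in Graphviz DOT text, with self-transfers appearing as self-loops and outer transfers as edges between distinct nodes. The sketch below is one possible rendering, not the drawing procedure of the invention; the function name `to_dot`, the graph name, and the two-decimal edge labels are assumptions.

```python
def to_dot(transitions):
    """Render {(source, target): probability} as a DOT digraph string."""
    lines = ["digraph concept_drift {"]
    for (src, dst), p in sorted(transitions.items()):
        # A self-transfer (src == dst) becomes a self-loop on the concept node.
        lines.append(f'  "{src}" -> "{dst}" [label="{p:.2f}"];')
    lines.append("}")
    return "\n".join(lines)

# Hypothetical transition probabilities between two concepts.
dot_text = to_dot({("C1", "C1"): 0.75, ("C1", "C2"): 0.25, ("C2", "C2"): 1.0})
```

The resulting text can be passed to any DOT renderer to produce a picture with circles for concepts and weighted edges for transfers.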

The above is the preferred embodiment of the invention. The scope of protection of the invention is not limited to the above embodiment; all technical solutions within the idea of the invention fall within its scope of protection. It should be noted that, for a person of ordinary skill in the art, improvements and modifications made without departing from the principle of the invention should also be regarded as within the scope of protection of the invention.

Claims

1. A visualization method for data stream concept drift in a dynamic data environment, comprising the following steps:

Step 1: the dynamic data stream collection module (102) collects data in time order from the massive real-time data stream (101);

Step 2: the data stream division module (103) reads the data collected in step 1 and divides the data stream according to the order in which the data arrive; each data block obtained by the data stream division module (103) contains N records, where N is a fixed value set in advance by the user;

Step 3: the static data block obtained from the data stream division module (103) is input to the kdq tree module (104) to build a kdq tree; the threshold corresponding to the kdq tree is either calculated by a bootstrap method based on the KL divergence or given directly by the user;

Step 4: the kdq tree built by the kdq tree module (104) and the threshold corresponding to that kdq tree are saved in the concept pool (106);

Step 5: the concept detection module (105) obtains a new data block from the data stream division module (103) and detects whether the new data block is a new concept; the detection result is given by comparing the KL divergence value between the original data block and the new data block with the threshold of the corresponding kdq tree saved in the concept pool (106); the original data block must be discretized when the KL divergence is calculated, and the discretization result is obtained by passing the data block through the kdq tree;

Step 6: when the data stream division module (103) obtains a new data block, the data block is compared with the concepts saved in the concept pool (106); if a similar concept is found, the concept counting module (107) is updated; otherwise the data block is added to the concept pool (106) as a new concept;

Step 7: steps 1-6 are repeated until the data stream ends; the statistics in the concept counting module (107) are then aggregated, and the statistics of each concept in the concept pool (106) are calculated;

Step 8: the above statistics are input to the concept drift module (108), and a concept transition graph is constructed using the Bayes formula, completing the concept drift visualization process.

2. The data stream classification method in a dynamic data environment according to claim 1, characterized in that building the kdq tree in step 3 comprises the following sub-steps:

Step 3.1: first select the first attribute of the data block as the current attribute and find the median value v in the current dimension, so that the data block is divided into two subsets with approximately equal sample counts; that is, the number of records whose value of the current attribute is greater than v is approximately equal to the number whose value is less than or equal to v;

Step 3.3: repeat the above process until the termination condition is satisfied;

3. The visualization method for data stream concept drift in a dynamic environment according to claim 1, characterized in that, in step 4, the kdq tree module (104) uses the bootstrap method to determine the threshold corresponding to the kdq tree, comprising the following steps:

The KL divergence is calculated as follows:

(1)

(2)

Step 4.5: sort the values in the queue by size and take the value at the (1-α) quantile as the threshold;

4. The visualization method for data stream concept drift in a dynamic environment according to claim 3, characterized in that step 4.2, in which the kdq tree is used to divide the new data block and obtain the discretization result, comprises the following steps:

And

5. The visualization method for data stream concept drift in a dynamic environment according to claim 1, characterized in that concept drift detection by the concept detection module (105) in step 5 comprises the following sub-steps:

Step 5.2: using the KL divergence formula, calculate the KL divergence value between the kdq tree discretization result of the new data block and the kdq discretization result of each concept saved in the concept pool module (106);

6. The visualization method for data stream concept drift in a dynamic environment according to claim 1, characterized in that step 6 comprises the following sub-steps:

Step 6.1: calculate the KL divergence value between the kdq tree discretization result of the new data block and each concept saved in the concept pool module (106);

Step 6.2: sort the resulting KL divergence values; if the smallest KL divergence value is still greater than the threshold set by the concept detection module, the concept pool module (106) contains no concept corresponding to this data block, that is, the data block represents a new concept, and its kdq tree discretization result is stored in the concept pool module (106) as a new concept; if a concept that satisfies the threshold is found, the statistics for that concept are updated in the concept counting module (107).

7. The data stream classification method in a dynamic data environment according to claim 1, characterized in that step 8 comprises the following sub-steps:

Step 8.1: aggregate the counting information, held in the concept counting module (107), of each concept stored in the concept pool module (106);

where P(C_i) denotes the probability that the i-th concept occurs, P(C_j) denotes the probability that the j-th concept occurs, and P(C_j|C_i) denotes the probability that the j-th concept occurs given that the i-th concept has occurred;

Step 8.3: aggregate the calculated transition probabilities between the concepts, input them to the data stream concept drift module (108), and draw the data stream concept drift graph, completing the visualization process.

8. The data stream classification method in a dynamic data environment according to claim 1, characterized in that the data stream (101) includes data from network intrusion monitoring, network security monitoring, sensor data monitoring, and power grid supply.