CN105631475A

CN105631475A - Computer data mining and clustering method based on time sequence

Info

Publication number: CN105631475A
Application number: CN201510992669.8A
Authority: CN
Inventors: 李洁; 孙燕; 石成富
Original assignee: Individual
Current assignee: Individual
Priority date: 2015-12-25
Filing date: 2015-12-25
Publication date: 2016-06-01

Abstract

The invention discloses a computer data mining and clustering method based on time series. The method denoises and normalizes the input sample sets X and Y, then finds the extreme points of the processed time series to obtain the extreme point sequences X' and Y'. Equal-length processing is applied to X' and Y' to obtain classification sequences X" and Y" of equal length. Class distances are then calculated for the processed sequences X" and Y", and the two classes with the maximum distance are merged, which reduces the number of classes by one. The class distance calculation and the merging of the two classes with the maximum distance are repeated on X" and Y" until the number of clusters equals a preset value, at which point clustering ends. Finally, the clustering result is output. The method can effectively process time series data of large volume and high dimension, is simple and practicable, does not depend on specific sequences, can perform data mining and clustering effectively, and can compress massive data while retaining the main features of the data.

Description

A computer data mining and clustering method based on time series

Technical field

The present invention relates to the field of computer data mining technology, and in particular to a computer data mining and clustering method based on time series.

Background technology

With the development of social informatization and the continuous expansion of information technology applications, fields such as economics, medicine, construction, and the environment have all accumulated more and more data. Since the 1980s, the total amount of data worldwide has grown rapidly, doubling within as little as a few months, but how to use and analyze these data effectively and extract the useful information hidden in them has become a huge challenge. Among these massive data, some are arranged in chronological order; such data are called time series. Time series exist in every application field, and studying them in depth to discover the latent patterns and valuable information hidden behind them has great social significance and economic value.

In recent years, as data volumes have grown, existing data analysis methods have been unable to extract more valuable information effectively, and so a new data analysis approach, data mining, emerged. Data mining can not only analyze existing data but also predict unknown future information from the original data; for example, a store's sales volume for the next month can be predicted by data mining. Data mining can be defined in many ways. Put simply, it is the extraction of valuable information from massive data. Most raw data are fuzzy and noisy, yet they contain a great deal of potential value. The mining process applies technical knowledge from various fields to process and analyze large amounts of data, and extracts content that helps people make higher-level analyses and decisions.

At present, although research on data mining has produced many results at home and abroad, time series mining methods do not generalize across application fields; for example, a data mining method developed for the financial field does not perform well when applied to the medical field. Most methods perform well only in one particular respect and cannot perform well in all respects taken together. Clearly, past research on time series still has shortcomings: traditional mining methods are no longer suitable for time series mining problems in different fields, and new techniques and methods are needed.

Summary of the invention

The object of the present invention is to overcome the above defects in the prior art by providing a computer data mining and clustering method based on time series. The method can effectively process time series data of large volume and high dimension, is simple, does not depend on specific sequences, can perform data mining and clustering efficiently, and effectively compresses massive data while retaining the main features of the data.

To achieve the above object, the present invention provides a computer data mining method based on time series, the method comprising the following steps:

Step 1: input and determine the sample sets X and Y, where X = {x₁, x₂, ..., x_n} and Y = {y₁, y₂, ..., y_n};

Step 2: denoise and normalize the input sample sets;

Step 3: find the extreme points of the time series X and Y to obtain the extreme point sequences X' and Y';

Step 4: apply equal-length processing to the extreme point sequences X' and Y' to obtain the classification sequences X" and Y", each of length k;

Step 5: calculate the class distance for the processed classification sequences X" and Y", where the class distance d(X_i) is expressed as:

d(X_i) = min |X_i - Y_j|;

where X_i is any element of the classification sequence X" and Y_j is any element of the classification sequence Y";

Step 6: merge the two classes with the maximum class distance; after merging, the number of classes decreases by one;

Step 7: return to steps 5 and 6 and repeat them until the number of clusters equals the preset value, at which point clustering ends;

Step 8: output the clustering result.
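The preprocessing steps (steps 2 to 4) can be sketched in Python as follows. This is only an illustrative sketch, not the patented implementation: the text does not say which denoising filter, normalization, or equal-length procedure is used, so the centered moving average, min-max scaling, and linear-interpolation resampling below are assumptions, and the function names `denoise`, `normalize`, `extreme_points`, and `resample` are hypothetical.

```python
from typing import List


def denoise(series: List[float], window: int = 3) -> List[float]:
    """Assumed denoiser: centered moving average (window shrinks at the edges)."""
    half = window // 2
    out = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        out.append(sum(series[lo:hi]) / (hi - lo))
    return out


def normalize(series: List[float]) -> List[float]:
    """Assumed normalization: min-max scaling to the range [0, 1]."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0] * len(series)
    return [(v - lo) / (hi - lo) for v in series]


def extreme_points(series: List[float]) -> List[float]:
    """Keep the two endpoints plus every strict local maximum or minimum."""
    if len(series) < 3:
        return list(series)
    out = [series[0]]
    for prev, cur, nxt in zip(series, series[1:], series[2:]):
        # The slope changes sign exactly at a strict local extremum.
        if (cur - prev) * (nxt - cur) < 0:
            out.append(cur)
    out.append(series[-1])
    return out


def resample(series: List[float], k: int) -> List[float]:
    """Assumed equal-length processing: linear interpolation to exactly k points."""
    n = len(series)
    if k < 2 or n < 2:
        raise ValueError("need at least two input points and k >= 2")
    out = []
    for j in range(k):
        pos = j * (n - 1) / (k - 1)   # fractional index into the input
        i = min(int(pos), n - 2)      # left neighbour, clamped at the end
        frac = pos - i
        out.append(series[i] * (1 - frac) + series[i + 1] * frac)
    return out
```

Under these assumptions, a sequence X" of length k would be obtained as `resample(extreme_points(normalize(denoise(x))), k)`, and Y" in the same way from Y.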

Compared with the prior art, the main advantages of the present invention are:

The present invention provides a computer data mining and clustering method based on time series. The method denoises and normalizes the input sample sets X and Y, and finds the extreme points of the processed time series to obtain the extreme point sequences X' and Y'. Equal-length processing is then applied to X' and Y' to obtain classification sequences X" and Y" of equal length. Class distances are then calculated for the processed sequences X" and Y", and the two classes with the maximum distance are merged, which reduces the number of classes by one. The class distance calculation and the merging of the two classes with the maximum distance are repeated on X" and Y" until the number of clusters equals the preset value, at which point clustering ends. Finally, the clustering result is output. The method can effectively process time series data of large volume and high dimension, is simple, does not depend on specific sequences, can perform data mining and clustering efficiently, and effectively compresses massive data while retaining the main features of the data.
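The class distance d(X_i) = min |X_i - Y_j| can be illustrated with a short Python sketch. It assumes the minimum is taken over all elements Y_j of Y", which is the most natural reading of the formula but is not stated explicitly in the text; the function names `class_distance` and `class_distances` are hypothetical.

```python
from typing import List, Sequence


def class_distance(x_i: float, y_seq: Sequence[float]) -> float:
    """d(X_i): smallest absolute difference between X_i and any element of Y"."""
    return min(abs(x_i - y_j) for y_j in y_seq)


def class_distances(x_seq: Sequence[float], y_seq: Sequence[float]) -> List[float]:
    """Evaluate d(X_i) for every element X_i of X"."""
    return [class_distance(x_i, y_seq) for x_i in x_seq]


# Example with integer values so the arithmetic is exact.
print(class_distances([0, 5, 10], [4, 9]))  # [4, 1, 1]
```

For X_i = 5, the candidates are |5 - 4| = 1 and |5 - 9| = 4, so d(X_i) = 1.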

Brief description of the drawings

Fig. 1 is a functional block diagram of the implementation of the present invention.

Embodiment

The specific embodiments of the present invention are described in detail below with reference to the accompanying drawing, so that those skilled in the art can better understand the present invention.

As shown in Fig. 1, an embodiment of the computer data mining and clustering method based on time series of the present invention is implemented through the following steps:

Step 2: denoise and normalize the input sample sets;

d(X_i) = min |X_i - Y_j|;

Step 8: output the clustering result.

This computer data mining and clustering method denoises and normalizes the input sample sets X and Y, and finds the extreme points of the processed time series to obtain the extreme point sequences X' and Y'. Equal-length processing is then applied to X' and Y' to obtain classification sequences X" and Y" of equal length. Class distances are then calculated for the processed sequences X" and Y", and the two classes with the maximum distance are merged, which reduces the number of classes by one. The class distance calculation and the merging of the two classes with the maximum distance are repeated on X" and Y" until the number of clusters equals the preset value, at which point clustering ends. Finally, the clustering result is output. The method can effectively process time series data of large volume and high dimension, is simple, does not depend on specific sequences, can perform data mining and clustering efficiently, and effectively compresses massive data while retaining the main features of the data.
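The merge loop (steps 5 to 7) might look like the Python sketch below. Several points are assumptions, since the text does not define them: each value starts as its own class, the distance between two classes is the smallest absolute difference between their members (by analogy with the d(X_i) formula), and the names `link_distance` and `agglomerate` are hypothetical. The default `pick=max` follows the wording of the text, which merges the two classes with the maximum distance; conventional agglomerative clustering merges the closest pair instead, which corresponds to `pick=min`.

```python
from itertools import combinations
from typing import Callable, List, Sequence


def link_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed inter-class distance: smallest |p - q| over members p of a and q of b."""
    return min(abs(p - q) for p in a for q in b)


def agglomerate(values: Sequence[float], n_clusters: int,
                pick: Callable = max) -> List[List[float]]:
    """Merge two classes per pass until only n_clusters classes remain."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[v] for v in values]  # assumption: one class per value at the start
    while len(clusters) > n_clusters:
        # Choose the pair of class indices (i < j) selected by `pick`.
        i, j = pick(
            combinations(range(len(clusters)), 2),
            key=lambda ij: link_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]  # merging reduces the count by one
        del clusters[j]
    return clusters


# Maximum-distance merging, as worded in the text.
print(agglomerate([0, 1, 10], 2))                  # [[0, 10], [1]]
# Minimum-distance merging, the conventional agglomerative choice.
print(agglomerate([0, 1, 10, 11], 2, pick=min))    # [[0, 1], [10, 11]]
```

The two example calls show how strongly the result depends on the merge criterion, which is why it is exposed as a parameter here rather than fixed.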

The above embodiment only illustrates the technical idea of the present invention and does not limit its scope of protection. Any change made on the basis of the technical solution in accordance with the technical idea proposed by the present invention falls within the scope of protection of the present invention.

Claims

1. A computer data mining and clustering method based on time series, characterized in that the method comprises the following steps:

Step 1: input and determine the sample sets X and Y, where X = {x₁, x₂, ..., x_n} and Y = {y₁, y₂, ..., y_n};

Step 2: denoise and normalize the input sample sets;

d(X_i) = min |X_i - Y_j|;

Step 8: output the clustering result.