CN105631475A - Computer data mining and clustering method based on time sequence - Google Patents

Info

Publication number
CN105631475A
Authority
CN
China
Prior art keywords
data
sequence
extreme point
clustering
sequences
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510992669.8A
Other languages
Chinese (zh)
Inventor
李洁
孙燕
石成富
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2015-12-25
Filing date
2015-12-25
Publication date
2016-06-01

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases
    • G06F16/285Clustering or classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a time-series-based computer data mining and clustering method. Denoising and normalization are applied to the input sample sets X and Y, and extreme points are extracted from the processed time series to obtain extreme point sequences X' and Y'. Equal-length processing is then applied to X' and Y' to obtain classification sequences X'' and Y'' of equal length. Class distances are computed on X'' and Y'', and the two classes with the largest distance are merged, which reduces the number of classes by one. The class distance calculation and merging are repeated on X'' and Y'' until the number of clusters equals a preset value, at which point clustering ends and the clustering result is output. The method can effectively process large-volume, high-dimensional time series data. It is simple and practicable and does not depend on specific sequences. It performs data mining and clustering efficiently, and it compresses massive data effectively while preserving the main characteristics of the data.

Description

A time-series-based computer data mining and clustering method
Technical field
The present invention relates to the field of computer data mining technology, and in particular to a time-series-based computer data mining and clustering method.
Background art
Society has become increasingly information-driven, and information technology is used in more and more fields. As a result, every application field, including the economy, healthcare, construction and the environment, has accumulated ever more data. Since the 1980s the total volume of data worldwide has grown rapidly, in some cases doubling within a few months. Using and analyzing this data effectively, and extracting the useful information hidden in it, has therefore become a major challenge. Some of this massive data is arranged in time order, and such data is called a time series. Time series exist in every application field. Studying them more deeply can reveal the latent regularities and valuable information behind them, which has great social and economic value.
In recent years, as data volumes have grown, some existing data analysis methods have been unable to extract further valuable information effectively. A new data analysis approach, data mining, emerged in response. Data mining can analyze existing data, and it can also predict unknown future information from the original data. For example, it can be used to predict a market's sales for the next month. Data mining can be defined in many ways. Put simply, it is the extraction of valuable information from massive amounts of data. Most of the original data is noisy and fuzzy, yet it still holds considerable latent value. Mining applies technical knowledge from various fields to process and analyze large volumes of data. The aim is to uncover content that supports higher-level analysis and decision-making.
Research on data mining in China and abroad has produced many results. However, time series mining methods do not generalize across application fields. For example, a method that performs well on financial data may not perform well when applied in the medical field. Most methods perform well only in one particular area and cannot perform well across all of them. Clearly, earlier research on time series still has shortcomings. For time series mining problems in different fields, traditional mining methods are no longer suitable, and new techniques and methods are needed.
Summary of the invention
The object of the invention is to overcome the above defects of the prior art by providing a time-series-based computer data mining and clustering method with the following properties:
- It can effectively process large-volume, high-dimensional time series data.
- It is simple and does not depend on specific sequences.
- It performs data mining and clustering efficiently.
- It compresses vector data effectively while retaining the main features of the data.
To achieve the above object, the present invention provides a time-series-based computer data mining method comprising the following steps:
Step 1: Input the given sample sets X and Y, where $X=\{x_1,x_2,\dots,x_n\}$ and $Y=\{y_1,y_2,\dots,y_n\}$;
Step 2: Apply denoising and normalization to the input sample sets;
Step 3: Extract the extreme points of time series X and Y to obtain extreme point sequences X' and Y';
Step 4: Apply equal-length processing to the obtained regional extreme point sequences X' and Y' to obtain classification sequences X'' and Y'', each of length k;
Step 5: Compute the class distances of the processed classification sequences X'' and Y''. The class distance $d(X_i)$ is:
$d(X_i) = \min_j |X_i - Y_j|$;
where $X_i$ is any element of classification sequence X'' and $Y_j$ is any element of classification sequence Y'';
Step 6: Merge the two classes with the largest class distance; after merging, the number of classes decreases by one;
Step 7: Return to Steps 5 and 6 and repeat them until the number of clusters equals the preset value, at which point clustering ends;
Step 8: Output the clustering result.
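The patent does not say which denoising filter, normalization or extreme point rule Steps 2 and 3 use. The sketch below is one possible reading under stated assumptions:
- Denoising uses a centred moving average.
- Normalization is min-max scaling to [0, 1].
- "Extreme points" are the two endpoints plus every interior point where the slope changes sign.

The function names and the `window` parameter are hypothetical and do not come from the patent.

```python
from typing import List


def moving_average(xs: List[float], window: int = 3) -> List[float]:
    """Step 2 (assumed): denoise with a centred moving average.

    `window` is assumed odd; an even value behaves like window + 1.
    Near the ends the window is truncated rather than padded.
    """
    half = window // 2
    out = []
    for i in range(len(xs)):
        lo, hi = max(0, i - half), min(len(xs), i + half + 1)
        seg = xs[lo:hi]
        out.append(sum(seg) / len(seg))
    return out


def min_max_normalize(xs: List[float]) -> List[float]:
    """Step 2 (assumed): rescale to [0, 1]; a constant series maps to zeros."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0] * len(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def extreme_points(xs: List[float]) -> List[float]:
    """Step 3 (assumed): keep the endpoints and every strict local max/min.

    An interior point is kept when the slopes on its two sides have
    opposite signs; flat plateaus are skipped in this sketch.
    """
    if len(xs) <= 2:
        return list(xs)
    ext = [xs[0]]
    for i in range(1, len(xs) - 1):
        if (xs[i] - xs[i - 1]) * (xs[i + 1] - xs[i]) < 0:
            ext.append(xs[i])
    ext.append(xs[-1])
    return ext
```

For example, `extreme_points(min_max_normalize(moving_average(raw)))` would turn a raw series into X'. On `[0, 1, 2, 3, 2, 1]`, `extreme_points` keeps only `[0, 3, 1]`, so the monotone runs between turning points are discarded.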
Compared with the prior art, the main advantages of the present invention are as follows:
The present invention provides a time-series-based computer data mining and clustering method, which proceeds as follows:
- It applies denoising and normalization to the input sample sets X and Y, then extracts the extreme points of the processed time series to obtain extreme point sequences X' and Y'.
- It applies equal-length processing to X' and Y' to obtain classification sequences X'' and Y'' of equal length.
- It computes class distances on X'' and Y'' and merges the two classes with the largest distance, which reduces the number of classes by one.
- It repeats the class distance calculation and merging on X'' and Y'' until the number of clusters equals the preset value, at which point clustering ends.
- It outputs the clustering result.

The method can effectively process large-volume, high-dimensional time series data. It is simple, does not depend on specific sequences and performs data mining and clustering efficiently. It also compresses massive data effectively while retaining the main features of the data.
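The equal-length processing of Step 4 recapped above is specified only by its output: sequences of a common length k. The minimal sketch below assumes linear interpolation onto k evenly spaced positions. The function `resample` and this interpolation choice are assumptions, not the patent's stated procedure. The sketch illustrates how a series of any length could be reduced to k values, which is one way the claimed compression could arise.

```python
from typing import List


def resample(xs: List[float], k: int) -> List[float]:
    """Step 4 (assumed): linearly interpolate xs onto k evenly spaced points.

    The first and last output values equal xs[0] and xs[-1].
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    if not xs:
        raise ValueError("xs must be non-empty")
    if len(xs) == 1:
        return [float(xs[0])] * k
    n = len(xs)
    out = []
    for j in range(k):
        pos = j * (n - 1) / (k - 1)   # fractional index into xs
        i = min(int(pos), n - 2)      # left neighbour, clamped so i + 1 is valid
        frac = pos - i                # weight of the right neighbour
        out.append(xs[i] * (1 - frac) + xs[i + 1] * frac)
    return out
```

For instance, `resample(list(range(1000)), 16)` returns 16 values. Extreme point sequences X' and Y' of different lengths therefore become directly comparable element by element, which the class distance in Step 5 presupposes.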
Description of the drawings
Fig. 1 is a functional block diagram of an implementation of the present invention.
Embodiment
Specific embodiments of the present invention are described in detail below with reference to the accompanying drawing, so that those skilled in the art can better understand the invention.
Fig. 1 shows an embodiment of the time-series-based computer data mining and clustering method of the present invention. It is implemented in the following steps:
Step 1: Input the given sample sets X and Y, where $X=\{x_1,x_2,\dots,x_n\}$ and $Y=\{y_1,y_2,\dots,y_n\}$;
Step 2: Apply denoising and normalization to the input sample sets;
Step 3: Extract the extreme points of time series X and Y to obtain extreme point sequences X' and Y';
Step 4: Apply equal-length processing to the obtained regional extreme point sequences X' and Y' to obtain classification sequences X'' and Y'', each of length k;
Step 5: Compute the class distances of the processed classification sequences X'' and Y''. The class distance $d(X_i)$ is:
$d(X_i) = \min_j |X_i - Y_j|$;
where $X_i$ is any element of classification sequence X'' and $Y_j$ is any element of classification sequence Y'';
Step 6: Merge the two classes with the largest class distance; after merging, the number of classes decreases by one;
Step 7: Return to Steps 5 and 6 and repeat them until the number of clusters equals the preset value, at which point clustering ends;
Step 8: Output the clustering result.
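The Step 5 formula compares each element $X_i$ of X'' with every element of Y'', and a direct double loop costs O(k^2). The sketch below instead sorts Y'' once and uses binary search. The nearest $Y_j$ to $X_i$ must be one of the two values adjacent to $X_i$'s insertion point in the sorted list. The function name `class_distances` is hypothetical. The result is identical to the brute-force minimum.

```python
from bisect import bisect_left
from typing import List


def class_distances(x2: List[float], y2: List[float]) -> List[float]:
    """Return d(X_i) = min_j |X_i - Y_j| for every element X_i of X''."""
    if not y2:
        raise ValueError("Y'' must be non-empty")
    ys = sorted(y2)  # sort once: O(k log k)
    out = []
    for xi in x2:
        pos = bisect_left(ys, xi)  # first index with ys[pos] >= xi
        # The nearest Y_j is either ys[pos] (first value >= xi)
        # or ys[pos - 1] (last value < xi), when those indices exist.
        candidates = []
        if pos < len(ys):
            candidates.append(abs(ys[pos] - xi))
        if pos > 0:
            candidates.append(abs(ys[pos - 1] - xi))
        out.append(min(candidates))
    return out
```

For X'' = (0, 5, 10) and Y'' = (2, 9), it returns `[2, 3, 1]`.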
This computer data mining and clustering method proceeds as follows:
- It applies denoising and normalization to the input sample sets X and Y, then extracts the extreme points of the processed time series to obtain extreme point sequences X' and Y'.
- It applies equal-length processing to X' and Y' to obtain classification sequences X'' and Y'' of equal length.
- It computes class distances on X'' and Y'' and merges the two classes with the largest distance, which reduces the number of classes by one.
- It repeats the class distance calculation and merging on X'' and Y'' until the number of clusters equals the preset value, at which point clustering ends.
- It outputs the clustering result.

The method can effectively process large-volume, high-dimensional time series data. It is simple, does not depend on specific sequences and performs data mining and clustering efficiently. It also compresses massive data effectively while retaining the main features of the data.
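The text does not specify two things:
- how the initial classes are formed from X'' and Y'';
- how the distance between two multi-element classes is measured.

The sketch below assumes that each value starts as its own class. It measures the distance between two classes as the smallest pairwise gap, which mirrors d(X_i) = min |X_i - Y_j|. By default it follows the text and merges the pair with the largest distance. Conventional agglomerative clustering merges the closest pair instead, so the rule is exposed as a hypothetical `merge_largest` flag. The names `class_distance` and `merge_until` are also hypothetical.

```python
from itertools import combinations
from typing import List


def class_distance(a: List[float], b: List[float]) -> float:
    """Smallest |a_i - b_j| between members of two classes (assumed rule)."""
    return min(abs(u - v) for u in a for v in b)


def merge_until(classes: List[List[float]], target: int,
                merge_largest: bool = True) -> List[List[float]]:
    """Repeat Steps 5 and 6 until `target` classes remain (Step 7).

    merge_largest=True merges the most distant pair, as the text states;
    merge_largest=False gives conventional closest-pair merging.
    """
    if not 1 <= target <= len(classes):
        raise ValueError("target must be between 1 and the initial class count")
    classes = [list(c) for c in classes]  # work on a copy of the input
    pick = max if merge_largest else min
    while len(classes) > target:
        # combinations yields index pairs (i, j) with i < j
        i, j = pick(combinations(range(len(classes)), 2),
                    key=lambda p: class_distance(classes[p[0]], classes[p[1]]))
        classes[i].extend(classes[j])  # merge class j into class i ...
        del classes[j]                 # ... so the class count drops by one
    return classes
```

With initial classes [0], [1] and [10] and a target of 2, the default rule merges [0] and [10], whose distance is 10. With `merge_largest=False`, it merges [0] and [1], whose distance is 1.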
The above embodiments only illustrate the technical idea of the present invention and do not limit its scope of protection. Any modification made to the technical solution in accordance with the technical idea proposed by the present invention falls within the scope of protection of the present invention.

Claims (1)

1. A time-series-based computer data mining and clustering method, characterized in that the method comprises the following steps:
Step 1: Input the given sample sets X and Y, where $X=\{x_1,x_2,\dots,x_n\}$ and $Y=\{y_1,y_2,\dots,y_n\}$;
Step 2: Apply denoising and normalization to the input sample sets;
Step 3: Extract the extreme points of time series X and Y to obtain extreme point sequences X' and Y';
Step 4: Apply equal-length processing to the obtained regional extreme point sequences X' and Y' to obtain classification sequences X'' and Y'', each of length k;
Step 5: Compute the class distances of the processed classification sequences X'' and Y''. The class distance $d(X_i)$ is:
$d(X_i) = \min_j |X_i - Y_j|$;
where $X_i$ is any element of classification sequence X'' and $Y_j$ is any element of classification sequence Y'';
Step 6: Merge the two classes with the largest class distance; after merging, the number of classes decreases by one;
Step 7: Return to Steps 5 and 6 and repeat them until the number of clusters equals the preset value, at which point clustering ends;
Step 8: Output the clustering result.
CN201510992669.8A 2015-12-25 2015-12-25 Computer data mining and clustering method based on time sequence Pending CN105631475A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510992669.8A CN105631475A (en) 2015-12-25 2015-12-25 Computer data mining and clustering method based on time sequence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510992669.8A CN105631475A (en) 2015-12-25 2015-12-25 Computer data mining and clustering method based on time sequence

Publications (1)

Publication Number Publication Date
CN105631475A true CN105631475A (en) 2016-06-01

Family

ID=56046387

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510992669.8A Pending CN105631475A (en) 2015-12-25 2015-12-25 Computer data mining and clustering method based on time sequence

Country Status (1)

Country Link
CN (1) CN105631475A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108181547A (en) * 2017-12-20 2018-06-19 珠海许继电气有限公司 Dynamic time warping distance fault section location method based on time series compression
CN108181547B (en) * 2017-12-20 2020-05-12 珠海许继电气有限公司 Dynamic time bending distance fault section positioning method based on time sequence compression
CN109918581A (en) * 2019-03-06 2019-06-21 上海评驾科技有限公司 Method for identifying multiple points of interest and multiple results of a user based on spatio-temporal data
CN109918581B (en) * 2019-03-06 2023-09-22 上海评驾科技有限公司 Method for identifying multiple points of interest and multiple results of user based on space-time data
CN111125198A (en) * 2019-12-27 2020-05-08 南京航空航天大学 Computer data mining clustering method based on time sequence

Similar Documents

Publication Publication Date Title
CN108492201B (en) Social network influence maximization method based on community structure
CN100504903C (en) Automatic malicious code recognition method
Nam et al. Efficient approach for damped window-based high utility pattern mining with list structure
CN106384050B (en) Dynamic taint analysis method based on maximal frequent subgraph mining
Telesca et al. Investigating the time dynamics of seismicity by using the visibility graph approach: Application to seismicity of Mexican subduction zone
CN105631475A (en) Computer data mining and clustering method based on time sequence
CN106294715A (en) Association rule mining method and device based on attribute reduction
WO2015131558A1 (en) Alarm correlation data mining method and device
CN108874952A (en) Maximal frequent sequential pattern mining method based on distributed logs
JP7103496B2 (en) Related score calculation system, method and program
CN104484410A (en) Data fusion method and system applied to big data system
CN105095473A (en) Method and system for analyzing discrepant data
CN112364003A (en) Big data management method, device, equipment and medium for different industries
CN104850577A (en) Data flow maximal frequent item set mining method based on ordered composite tree structure
CN114936511A (en) Tailing paste filling design method based on digital twinning
McGowan Ammonoid taxonomic and morphologic recovery patterns after the Permian–Triassic
Tao et al. A new productivity prediction hybrid model for multi-fractured horizontal wells in tight oil reservoirs
CN104765852A (en) Data mining method based on fuzzy algorithm under big data background
CN104484409A (en) Data mining method for big data processing
Bailey et al. Efficient incremental mining of contrast patterns in changing data
CN106326746A (en) Malicious program behavior feature library construction method and device
CN104573481A (en) Password attribute analysis method based on attribute splitting and data mining
CN106021401A (en) Extensible entity analysis algorithm based on reverse indices
Zhao et al. Efficient association rule mining algorithm based on user behavior for cloud security auditing
CN105138926B (en) Method for effectively hiding and protecting sensitive information data

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20160601
