CN105159952A - Data processing method based on frequent item set mining - Google Patents

Data processing method based on frequent item set mining

Info

Publication number
CN105159952A
Authority
CN
China
Prior art keywords
data
tables
item set
frequent item
market basket
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510502478.9A
Other languages
Chinese (zh)
Inventor
任新华
刘业政
杜飞
崔春
向士庭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ANHUI XINHUABO INFORMATION TECHNOLOGY Co Ltd
Original Assignee
ANHUI XINHUABO INFORMATION TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ANHUI XINHUABO INFORMATION TECHNOLOGY Co Ltd
Priority to CN201510502478.9A
Publication of CN105159952A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a data processing method based on frequent item set mining, comprising the following steps: acquiring multiple historical data tables and extracting the data tables with value fields; acquiring time series data tables and non-time-series data tables from the data tables with value fields; splitting the time series data tables and cleaning the non-time-series data tables to obtain initial market basket data; merging multiple data tables in the initial market basket data to obtain merged market basket data; and performing frequent item set mining on the initial market basket data and on the merged market basket data respectively, to obtain the frequent item sets that meet a specified support. By mining frequent item sets from historical data, the method obtains the frequent item set support of each historical data table, allows frequent data of any dimension to be queried, and makes it easier for analysts to obtain data. In addition, the time series data is split, which makes it easier for analysts to query related data by time tag.

Description

Data processing method based on frequent item set mining
Technical field
The invention relates to the field of data query and statistics, and in particular to a method that uses frequent item set mining to obtain the frequency with which various historical fields occur.
Background
With the development of data mining technology and of public business, traditional approaches no longer meet the data query and statistics needs of that business. To obtain the knowledge and value that occur frequently in historical data, the current approach is to query the frequency with which various historical fields occur, using complex statistical queries to find how often various conditions occur.
Existing queries for frequent patterns in historical data are all performed manually: a single field, or a combination of several specified fields, is queried to obtain a result, and frequent-item query results cannot be obtained for time series data.
Summary of the invention
To solve the above technical problem, the invention provides a data processing method based on frequent item set mining, comprising the following steps:
obtaining multiple historical data tables, and extracting from each historical data table the data tables that have value fields;
obtaining time series data tables and non-time-series data tables from the data tables with value fields;
splitting the time series data tables according to a preset time division unit and the time range of the time series data, and cleaning the non-time-series data tables, to obtain initial market basket data;
merging multiple data tables in the obtained initial market basket data to obtain merged market basket data; and performing frequent item set mining on the initial market basket data and on the merged market basket data respectively, to obtain all frequent item sets of the initial and the merged market basket data that meet a specified support.
Preferably, cleaning the non-time-series data tables comprises:
removing invalid data from the data tables with value fields, the invalid data comprising erroneous data and duplicate data.
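The cleaning step above can be illustrated with a minimal Python sketch. The patent does not define what counts as erroneous data, so the validity rule `has_no_empty_fields` and the sample rows are assumptions made only for illustration.

```python
def clean_rows(rows, is_valid):
    """Drop rows that fail `is_valid` and exact duplicates, keeping first occurrences."""
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(row)
        if not is_valid(row) or key in seen:
            continue  # skip erroneous or repeated rows
        seen.add(key)
        cleaned.append(row)
    return cleaned


def has_no_empty_fields(row):
    # Hypothetical validity rule: every field must be a non-empty string.
    return all(isinstance(value, str) and value.strip() for value in row)


# Hypothetical non-time-series rows: (ID, field value, field value).
rows = [
    ("ID1", "male", "Han"),
    ("ID1", "male", "Han"),    # duplicate data
    ("ID2", "", "Han"),        # erroneous data: empty field
    ("ID3", "female", "Han"),
]
cleaned = clean_rows(rows, has_no_empty_fields)
```

Passing the validity rule as a parameter keeps the sketch neutral about which checks a real deployment would apply.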
Preferably, the multiple data tables in the market basket data are merged as follows:
data tables in the market basket data that have the same field are merged.
Preferably, the market basket data has the following format:
ID1,ITEM11,ITEM12,…
ID2,ITEM21,ITEM22,…
where ID denotes the unique identifier of a data table and ITEM denotes a field value;
data tables in the market basket data that have the same field are merged by performing a join operation on the data tables that have the same ID field.
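As a rough illustration of the format and the join described above, the following Python sketch parses `ID,ITEM,...` lines and joins two tables on their ID. The patent does not say which kind of join is used, so the inner join here is an assumption, as are the function names and the sample values.

```python
def parse_baskets(lines):
    """Parse 'ID,ITEM1,ITEM2,...' lines into a dict mapping ID to a list of items."""
    baskets = {}
    for line in lines:
        record_id, *items = [part.strip() for part in line.split(",")]
        baskets.setdefault(record_id, []).extend(items)
    return baskets


def join_on_id(left, right):
    """Inner join (an assumption): keep IDs present in both tables, concatenating items."""
    return {rid: left[rid] + right[rid] for rid in left if rid in right}


# Hypothetical tables that share the ID field.
persons = parse_baskets(["ID1,male,Han", "ID2,female,Han"])
drivers = parse_baskets(["ID1,C1,6 points", "ID3,A2,0 points"])
merged = join_on_id(persons, drivers)
```

Each merged basket holds the field values of both source tables, so later mining can find item sets that span the two tables.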
Preferably, the step of splitting the time series data according to its time range and the time division unit comprises:
obtaining the time range of the time series data, and splitting the time range according to the preset time division unit to obtain the split time series data.
Preferably, splitting the time series data tables comprises:
obtaining the time range of the time series data table, and splitting the time range according to the preset time division unit to obtain the split time series data.
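One way to read the splitting step is sketched below in Python: the time range is taken from the data, and records are grouped into consecutive windows whose length is the preset time division unit. The function name, the choice to start windows at the earliest timestamp, and the sample records are assumptions, not details from the patent.

```python
from datetime import datetime, timedelta


def segment_records(records, unit):
    """Group (timestamp, value) records into consecutive windows of length `unit`.

    Windows start at the earliest timestamp, so the covered time range is
    taken from the data itself. Only non-empty windows are returned.
    """
    start = min(timestamp for timestamp, _ in records)
    grouped = {}
    for timestamp, value in sorted(records, key=lambda record: record[0]):
        index = (timestamp - start) // unit           # which window the record falls in
        window_start = start + index * unit
        window = (window_start, window_start + unit)  # half-open [start, end)
        grouped.setdefault(window, []).append(value)
    return grouped


# Hypothetical records: (timestamp, field value).
records = [
    (datetime(2015, 1, 1, 0, 0), "a"),
    (datetime(2015, 1, 1, 12, 0), "b"),
    (datetime(2015, 1, 2, 6, 0), "c"),
]
grouped = segment_records(records, timedelta(days=1))
```

Each window key can then serve as the time tag under which an analyst looks up the related records.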
Preferably, a frequent item set mining algorithm is used to mine the frequent item sets that meet the specified support.
The invention has the following beneficial effects:
The data processing method based on frequent item set mining provided by the invention applies a frequent item set mining algorithm to historical data to obtain the frequent item set support of each historical data table. This avoids manual querying and sorting of historical data, and avoids manually querying for item sets that frequently occur together. Frequent data of any dimension can be queried, which makes it easier for analysts to obtain data. In addition, the time series data is split, which makes it easier for analysts to query related data by time tag.
Of course, a product implementing the invention need not achieve all of the above advantages at the same time.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the invention more clearly, the drawing needed to describe the embodiments is briefly introduced below. The drawing described below shows only some embodiments of the invention; a person of ordinary skill in the art can obtain other drawings from it without creative effort.
Fig. 1 is a schematic flow chart of the data processing method based on frequent item set mining provided by an embodiment of the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawing. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.
As shown in Fig. 1, an embodiment of the invention provides a data processing method based on frequent item set mining, comprising the following steps:
obtaining multiple historical data tables, and extracting from each historical data table the data tables that have value fields;
obtaining time series data tables and non-time-series data tables from the data tables with value fields;
splitting the time series data tables according to a preset time division unit and the time range of the time series data, and cleaning the non-time-series data tables, to obtain initial market basket data;
merging multiple data tables in the obtained initial market basket data to obtain merged market basket data; and performing frequent item set mining on the initial market basket data and on the merged market basket data respectively, to obtain all frequent item sets of the initial and the merged market basket data that meet a specified support.
Cleaning the non-time-series data tables comprises:
removing invalid data from the data tables with value fields, the invalid data comprising erroneous data and duplicate data.
Splitting the time series data tables comprises:
obtaining the time range of the time series data table, and splitting the time range according to the preset time division unit to obtain the split time series data.
In this embodiment, merging the data tables in the multiple market basket data comprises:
merging data tables in the market basket data that have the same field, where the market basket data has the following format:
ID1,ITEM11,ITEM12,…
ID2,ITEM21,ITEM22,…
where ID denotes the unique identifier of a data table and ITEM denotes a field value;
data tables in the market basket data that have the same field are merged by performing a join operation on the data tables in the market basket data that have the same ID field. This embodiment describes operations on two data tables, key personnel and driver information. The key personnel data table can be selected on its own for association analysis: the frequent item set mining algorithm provided by the invention yields the features that occur most frequently among key personnel, such as native place, the reason for being flagged, and frequent co-occurrences of native place and flagging reason. The user can also select both the key personnel information table and the driver information table for analysis, mining the frequent item sets of the merged key personnel and driver data, for example frequent co-occurrences of a key person's label category with that person's license type, or records in which accumulated points and key personnel status frequently occur together.
In this embodiment, a frequent item set is a field value, or a combination of field values, that occurs frequently across all records. Each frequent item set has a support that indicates how frequently it occurs. For example, if "male" and "Han" occur together in more than 1% of historical suspect records and the support threshold is set to 1%, then {male, Han} is a frequent 2-item set, which is the kind of result this embodiment seeks. The final result includes the frequent 1-item sets, the frequent 2-item sets, and so on, up to every item set that meets the support threshold. The embodiment uses a frequent item set mining algorithm to mine all frequent item sets whose support exceeds the specified support.
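The notion of support described above can be made concrete with a short Python sketch. The patent does not name a specific mining algorithm, so the level-wise, Apriori-style search used here is a substitute chosen for illustration, and the function name and sample transactions are hypothetical. The sketch also treats an item set as frequent when its support is greater than or equal to the threshold, which is an assumption.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search for item sets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    total = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the item set.
        return sum(1 for t in transactions if itemset <= t) / total

    # Start with all 1-item candidates.
    current = {frozenset([item]) for t in transactions for item in t}
    result = {}
    k = 1
    while current:
        frequent = {}
        for candidate in current:
            s = support(candidate)
            if s >= min_support:
                frequent[candidate] = s
        result.update(frequent)
        # Build (k+1)-item candidates from pairs of frequent k-item sets and
        # keep only those whose k-item subsets are all frequent.
        k += 1
        current = set()
        for a, b in combinations(frequent, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(subset) in frequent for subset in combinations(union, k - 1)
            ):
                current.add(union)
    return result


# Hypothetical records, one basket of field values per record.
transactions = [
    ("male", "Han"),
    ("male", "Han"),
    ("female", "Han"),
    ("male", "other"),
]
found = frequent_itemsets(transactions, min_support=0.5)
```

With these four records, {male} and {Han} each appear in three records and {male, Han} in two, so all three meet a 50% threshold, while {female} and {other} do not.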
The invention also performs the following operations:
obtaining the sequence data that carries a time tag from the data tables with value fields;
generating a time series from the set time range and time division unit; querying the time-tagged object data by time division unit and time range to obtain time series results; and filling the results into the time series to obtain time-series object data.
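A minimal Python sketch of this generate-then-fill operation follows. It assumes the query result for each time slot is a simple record count and that the time division unit is a whole number of days; the patent specifies neither, and the function name and sample dates are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def build_time_series(start, end, unit_days, timestamps):
    """Generate slots from start to end and fill each with its record count."""
    step = timedelta(days=unit_days)
    slots = []
    cursor = start
    while cursor <= end:
        slots.append(cursor)
        cursor += step
    # Count records per slot index, ignoring timestamps outside the range.
    counts = Counter(
        (stamp - start).days // unit_days
        for stamp in timestamps
        if start <= stamp <= end
    )
    # Slots without records are kept and filled with zero.
    return [(slot, counts.get(index, 0)) for index, slot in enumerate(slots)]


# Hypothetical time tags taken from time-tagged records.
stamps = [date(2015, 8, 1), date(2015, 8, 1), date(2015, 8, 3), date(2015, 8, 10)]
series = build_time_series(date(2015, 8, 1), date(2015, 8, 4), 1, stamps)
```

Generating the slots first and filling them afterwards keeps empty periods visible in the resulting series.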
The data processing method based on frequent item set mining provided by the invention applies a frequent item set mining algorithm to historical data to obtain the frequent item set support of each historical data table. This avoids manual querying and sorting of historical data, and avoids manually querying for item sets that frequently occur together. Frequent data of any dimension can be queried, which makes it easier for analysts to obtain data. In addition, the time series data is split, which makes it easier for analysts to query related data by time tag.
The preferred embodiments disclosed above are intended only to help explain the invention. They do not describe every detail, nor do they limit the invention to the specific embodiments described. Obviously, many modifications and variations can be made in light of this specification. These embodiments were selected and described in detail in order to better explain the principles and practical application of the invention, so that those skilled in the art can understand and use it well. The invention is limited only by the claims, their full scope, and their equivalents.

Claims (6)

1. A data processing method based on frequent item set mining, characterized by comprising the following steps:
obtaining multiple historical data tables, and extracting from each historical data table the data tables that have value fields;
obtaining time series data tables and non-time-series data tables from the data tables with value fields;
splitting the time series data tables according to a preset time division unit and the time range of the time series data, and cleaning the non-time-series data tables, to obtain initial market basket data;
merging multiple data tables in the obtained initial market basket data to obtain merged market basket data; and performing frequent item set mining on the initial market basket data and on the merged market basket data respectively, to obtain all frequent item sets of the initial and the merged market basket data that meet a specified support.
2. The data processing method based on frequent item set mining according to claim 1, characterized in that cleaning the non-time-series data tables comprises:
removing invalid data from the data tables with value fields, the invalid data comprising erroneous data and duplicate data.
3. The data processing method based on frequent item set mining according to claim 1, characterized in that the multiple data tables in the market basket data are merged as follows:
data tables in the market basket data that have the same field are merged.
4. The data processing method based on frequent item set mining according to claim 3, characterized in that the market basket data has the following format:
ID1,ITEM11,ITEM12,…
ID2,ITEM21,ITEM22,…
where ID denotes the unique identifier of a data table and ITEM denotes a field value;
data tables in the market basket data that have the same field are merged by performing a join operation on the data tables that have the same ID field.
5. The data processing method based on frequent item set mining according to claim 1, characterized in that splitting the time series data tables comprises:
obtaining the time range of the time series data table, and splitting the time range according to the preset time division unit to obtain the split time series data.
6. The data processing method based on frequent item set mining according to claim 1, characterized in that a frequent item set mining algorithm is used to mine the frequent item sets that meet the specified support.
CN201510502478.9A 2015-08-14 2015-08-14 Data processing method based on frequent item set mining Pending CN105159952A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510502478.9A CN105159952A (en) 2015-08-14 2015-08-14 Data processing method based on frequent item set mining

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510502478.9A CN105159952A (en) 2015-08-14 2015-08-14 Data processing method based on frequent item set mining

Publications (1)

Publication Number Publication Date
CN105159952A true CN105159952A (en) 2015-12-16

Family

ID=54800808

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510502478.9A Pending CN105159952A (en) 2015-08-14 2015-08-14 Data processing method based on frequent item set mining

Country Status (1)

Country Link
CN (1) CN105159952A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105512322A (en) * 2015-12-18 2016-04-20 中国农业银行股份有限公司 Frequent item set generating method and device
CN105631709A (en) * 2015-12-26 2016-06-01 深圳大学 Shopping basket analysis method and system
CN106371917A (en) * 2016-08-23 2017-02-01 清华大学 Real-time frequent item set mining-oriented acceleration system and method
CN107798021A (en) * 2016-09-07 2018-03-13 北京京东尚科信息技术有限公司 Data correlation processing method, system and electronic equipment
CN108346007A (en) * 2018-03-02 2018-07-31 深圳灵虎至真智能科技有限公司 A kind of mobile phone labeling detection data analysis method based on FP-Growth algorithms
CN111767277A (en) * 2020-07-08 2020-10-13 深延科技(北京)有限公司 Data processing method and device
CN114218263A (en) * 2022-02-23 2022-03-22 浙江一山智慧医疗研究有限公司 Automatic creation method of materialized view and rapid query method based on materialized view

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102142992A (en) * 2011-01-11 2011-08-03 浪潮通信信息***有限公司 Communication alarm frequent itemset mining engine and redundancy processing method
US20140032514A1 (en) * 2012-07-25 2014-01-30 Wen-Syan Li Association acceleration for transaction databases

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
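刘锡铃: "关联规则挖掘算法及其在购物篮分析中的应用研究" [Research on association rule mining algorithms and their application in market basket analysis], 《中国优秀硕士学位论文全文数据库》 [China Master's Theses Full-text Database] *
吕刚: "犯罪情报信息的数据挖掘技术研究及应用实现" [Research and application implementation of data mining techniques for criminal intelligence information], 《中国优秀硕士学位论文全文数据库》 [China Master's Theses Full-text Database] *
柴明亮: "关联规则在时间序列数据挖掘中的应用" [Application of association rules in time series data mining], 《中国优秀博硕士学位论文全文数据库》 [China Doctoral and Master's Theses Full-text Database] *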

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105512322A (en) * 2015-12-18 2016-04-20 中国农业银行股份有限公司 Frequent item set generating method and device
CN105512322B (en) * 2015-12-18 2019-02-15 中国农业银行股份有限公司 The generation method and device of frequent item set
CN105631709A (en) * 2015-12-26 2016-06-01 深圳大学 Shopping basket analysis method and system
CN106371917A (en) * 2016-08-23 2017-02-01 清华大学 Real-time frequent item set mining-oriented acceleration system and method
CN106371917B (en) * 2016-08-23 2019-07-02 清华大学 Acceleration system and method towards real-time frequent item set mining
CN107798021A (en) * 2016-09-07 2018-03-13 北京京东尚科信息技术有限公司 Data correlation processing method, system and electronic equipment
CN108346007A (en) * 2018-03-02 2018-07-31 深圳灵虎至真智能科技有限公司 A kind of mobile phone labeling detection data analysis method based on FP-Growth algorithms
CN111767277A (en) * 2020-07-08 2020-10-13 深延科技(北京)有限公司 Data processing method and device
CN114218263A (en) * 2022-02-23 2022-03-22 浙江一山智慧医疗研究有限公司 Automatic creation method of materialized view and rapid query method based on materialized view

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20151216
