CN105159952A

CN105159952A - Data processing method based on frequent item set mining

Info

Publication number: CN105159952A
Application number: CN201510502478.9A
Authority: CN
Inventors: 任新华; 刘业政; 杜飞; 崔春; 向士庭
Original assignee: ANHUI XINHUABO INFORMATION TECHNOLOGY Co Ltd
Current assignee: ANHUI XINHUABO INFORMATION TECHNOLOGY Co Ltd
Priority date: 2015-08-14
Filing date: 2015-08-14
Publication date: 2015-12-16

Abstract

The present invention provides a data processing method based on frequent item set mining, comprising the following steps: acquiring a plurality of historical data tables and extracting the data tables that have value fields; obtaining time series data tables and non-time-series data tables from the data tables with value fields; segmenting the time series data tables and cleaning the non-time-series data tables to obtain initial market basket data; merging multiple data tables in the initial market basket data to obtain merged market basket data; and performing frequent item set mining separately on the initial market basket data and on the merged market basket data to obtain the frequent item sets that meet a specified support. By mining frequent item sets from historical data, the method obtains the support of the frequent item sets in each historical data table, allows frequent data in any dimension to be queried, and makes it easier for analysts to obtain data. In addition, segmenting the time series data lets analysts query related data by time tag.

Description

Data processing method based on frequent item set mining

Technical field

The present invention relates to the field of data query and statistics, and in particular to a method, based on frequent item set mining, for obtaining the frequencies with which various historical field values occur.

Background

With the development of data mining technology and of public-sector services, traditional approaches no longer meet the needs of public services for data query and statistics. To extract the knowledge and value contained in patterns that occur frequently in historical data, the current practice is to query how often each historical field value occurs, and to compile statistics on how often various conditions occur.

Existing queries for frequent patterns in historical data are all performed manually: a single field is queried, or several specified fields are combined, to obtain a query result. Moreover, frequent-item query results cannot be obtained for time series data.

Summary of the invention

To solve the above technical problems, the present invention provides a data processing method based on frequent item set mining, comprising the following steps:

Obtaining a plurality of historical data tables, and extracting from them the data tables that have value fields;

Obtaining time series data tables and non-time-series data tables from the data tables that have value fields;

Splitting the time series data tables according to a preset time division unit and the time range of the time series data, and cleaning the non-time-series data tables, to obtain initial market basket data;

Merging multiple data tables in the initial market basket data to obtain merged market basket data; and performing frequent item set mining separately on the initial market basket data and on the merged market basket data, to obtain, for each, all frequent item sets that meet the specified support.

Preferably, cleaning the non-time-series data tables comprises:

Removing invalid data from the data tables that have value fields, the invalid data including erroneous data and duplicate data.
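The cleaning step can be sketched as follows. This is a minimal illustration, not the patented implementation: the text does not define "erroneous data", so the sketch assumes a row is erroneous when it has the wrong number of fields or an empty field. `clean_table` and its parameters are hypothetical names.

```python
from typing import Iterable

Row = tuple[str, ...]


def clean_table(rows: Iterable[Row], n_fields: int) -> list[Row]:
    """Drop erroneous and duplicate rows, keeping first-seen order.

    Assumption: a row is erroneous if it does not have exactly n_fields
    fields or if any field is empty after trimming whitespace.
    """
    seen: set[Row] = set()
    cleaned: list[Row] = []
    for row in rows:
        row = tuple(field.strip() for field in row)
        if len(row) != n_fields or any(field == "" for field in row):
            continue  # erroneous: malformed row or missing value
        if row in seen:
            continue  # duplicate of a row already kept
        seen.add(row)
        cleaned.append(row)
    return cleaned
```

Using a set of already-seen rows keeps deduplication linear in the number of rows while preserving the original order of the surviving records.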

Preferably, the multiple data tables in the market basket data are merged as follows:

Data tables in the market basket data that share the same field are merged.

Preferably, the market basket data has the following format:

ID1,ITEM11,ITEM12,…

ID2,ITEM21,ITEM22,…

where ID is the unique identifier of a record in the data table, and ITEM is a field value;

Data tables in the market basket data that share the same field are merged by performing a join on the data tables that have the same ID field.
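A minimal sketch of reading the `ID,ITEM,…` format and joining two tables on the ID field. It assumes an inner join (only IDs present in both tables are kept), which the text does not specify; `parse_baskets`, `join_on_id` and the sample records are hypothetical.

```python
def parse_baskets(lines: list[str]) -> dict[str, list[str]]:
    """Parse lines of the form 'ID,ITEM1,ITEM2,...' into {ID: [ITEM1, ITEM2, ...]}."""
    baskets: dict[str, list[str]] = {}
    for line in lines:
        parts = [p.strip() for p in line.split(",") if p.strip()]
        if not parts:
            continue  # skip blank lines
        record_id, items = parts[0], parts[1:]
        # Repeated IDs accumulate their items into one basket.
        baskets.setdefault(record_id, []).extend(items)
    return baskets


def join_on_id(left: dict[str, list[str]], right: dict[str, list[str]]) -> dict[str, list[str]]:
    """Inner join on ID: keep IDs present in both tables, concatenating their items."""
    return {rid: left[rid] + right[rid] for rid in left if rid in right}


# Hypothetical sample data: a key-person table and a driver table.
key_persons = parse_baskets(["P1,Anhui,theft", "P2,Jiangsu,fraud"])
drivers = parse_baskets(["P1,C1,6", "P3,B2,0"])
merged = join_on_id(key_persons, drivers)
# merged == {"P1": ["Anhui", "theft", "C1", "6"]}
```

In practice it may help to prefix each item with its field name (for example `licence=C1`) so that equal values coming from different fields remain distinct after the join; this is a design suggestion, not part of the described method.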

Preferably, splitting the time series data according to its time range and the time division unit comprises:

Obtaining the time range of the time series data, and splitting that time range according to the preset time division unit to obtain the segmented time series data.
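A sketch of the splitting step, assuming each time series record is a `(timestamp, items)` pair and that segments are half-open intervals `[start, start + unit)`. The function names and the record shape are illustrative assumptions, not taken from the source.

```python
from datetime import datetime, timedelta


def split_time_range(start: datetime, end: datetime, unit: timedelta) -> list[tuple[datetime, datetime]]:
    """Split [start, end) into consecutive segments of length `unit` (the last may be shorter)."""
    if unit <= timedelta(0):
        raise ValueError("time division unit must be positive")
    segments = []
    cur = start
    while cur < end:
        nxt = min(cur + unit, end)
        segments.append((cur, nxt))
        cur = nxt
    return segments


def segment_records(records, start: datetime, end: datetime, unit: timedelta) -> dict[datetime, list]:
    """Group (timestamp, items) records by the segment that contains them.

    Returns {segment_start: [items, ...]}; records outside [start, end) are dropped.
    """
    buckets = {seg_start: [] for seg_start, _ in split_time_range(start, end, unit)}
    for ts, items in records:
        if start <= ts < end:
            # timedelta // timedelta gives the integer index of the segment.
            seg_start = start + ((ts - start) // unit) * unit
            buckets[seg_start].append(items)
    return buckets
```

Each segment's list of item lists can then be treated as its own market basket data, keyed by its time tag.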

Preferably, splitting the time series data tables comprises:

Obtaining the time range of the time series data tables, and splitting that time range according to the preset time division unit to obtain the segmented time series data.

Preferably, the frequent item sets meeting the specified support are mined using a frequent item set mining algorithm.
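The text does not name the mining algorithm. As a stand-in, the sketch below uses Apriori, a standard level-wise frequent item set algorithm: it finds the frequent 1-item sets, then repeatedly builds size-k candidates from frequent (k-1)-item sets, prunes any candidate with an infrequent subset, and keeps those whose support reaches the threshold. `apriori` and `min_support` are hypothetical names, and support is assumed to be a fraction of all baskets.

```python
from itertools import combinations


def apriori(baskets: list[list[str]], min_support: float) -> dict[frozenset[str], float]:
    """Return {item set: support} for every item set with support >= min_support."""
    transactions = [set(b) for b in baskets]
    n = len(transactions)
    if n == 0:
        return {}

    # Level 1: count single items.
    counts: dict[frozenset[str], int] = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c / n for s, c in counts.items() if c / n >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Join step: unions of two frequent (k-1)-sets that have exactly k items.
        candidates = set()
        for a, b in combinations(current, 2):
            union = a | b
            # Prune step: every (k-1)-subset must itself be frequent.
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Count the support of the surviving candidates.
        current = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_support:
                current[cand] = support
        result.update(current)
        k += 1
    return result
```

Calling the same function once on the initial market basket data and once on the merged market basket data yields the two result sets the method asks for. Other algorithms such as FP-Growth would return the same item sets with less candidate generation.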

The present invention has the following beneficial effects:

By applying a frequent item set mining algorithm to historical data, the data processing method provided by the invention obtains the support of the frequent item sets in each historical data table. This avoids manually querying historical data one query at a time and manually searching for item sets that frequently co-occur. Frequent data in any dimension can be queried, which makes it easier for analysts to obtain data. In addition, the time series data is segmented, so that analysts can query related data by time tag.

Of course, a product implementing the present invention need not achieve all of the above advantages at the same time.

Brief description of the drawings

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of the data processing method based on frequent item set mining provided by an embodiment of the present invention.

Embodiment

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.

As shown in Fig. 1, an embodiment of the present invention provides a data processing method based on frequent item set mining, comprising the following steps:

Cleaning the non-time-series data tables comprises:

Removing invalid data from the data tables that have value fields, the invalid data including erroneous data and duplicate data. Splitting the time series data tables comprises:

In this embodiment, merging the data tables in the multiple sets of market basket data comprises:

Data tables in the market basket data that share the same field are merged, where the market basket data has the following format:

ID1,ITEM11,ITEM12,…

ID2,ITEM21,ITEM22,…

Data tables in the market basket data that share the same field are merged by joining those tables that have the same ID field.

This embodiment operates on two data tables: key-person records and driver information. The key-person table can be selected on its own and subjected to association analysis; the frequent item set mining algorithm of the present invention then yields the features that occur most frequently among key persons, such as their native place, or features that frequently co-occur, such as the reason for flagging together with native place. The user may also select both the key-person table and the driver information table and mine the frequent item sets of the merged key-person and driver data. Results that frequently co-occur include, for example, the key-person label category together with the driving-licence type and accumulated penalty points, and the records that frequently co-occur among key persons.

In this embodiment, a frequent item set is a field value, or a combination of field values, that occurs frequently across all records. Each frequent item set has a support that measures how frequently it occurs. For example, if among historical suspects the proportion of records in which "male" and "Han ethnicity" occur together exceeds 1%, and the support threshold is set to 1%, then {male, Han} is a frequent 2-item set, which is exactly the kind of result this embodiment seeks. The final result comprises all frequent item sets that meet the support threshold, from frequent 1-item sets and frequent 2-item sets up to the largest that can occur. The embodiment mines all frequent item sets exceeding the specified support using a frequent item set mining algorithm.
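The support used above can be written as follows, with D the set of basket records, X a candidate item set, and σ_min the support threshold. These symbols and the definition are the standard ones, stated here as an assumption about what the text means; the text says a set is frequent when its support exceeds the threshold, which would replace ≥ with a strict >.

```latex
\operatorname{supp}(X) = \frac{\lvert \{\, t \in D : X \subseteq t \,\} \rvert}{\lvert D \rvert},
\qquad
X \text{ is frequent} \iff \operatorname{supp}(X) \ge \sigma_{\min}
```

With hypothetical counts of 10 000 suspect records, 150 of which contain both "male" and "Han", the example in the text works out as:

```latex
\operatorname{supp}(\{\text{male}, \text{Han}\}) = \frac{150}{10\,000} = 0.015 \ge 0.01 = \sigma_{\min}
```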

The present invention further performs the following operations:

Obtaining, from the data tables that have value fields, the sequence data containing time tags;

Generating a time series from the time range and the set time division unit; querying, by time division unit and time range, the object data containing the time tags; and filling the results into the time series to obtain the object data as a time series.
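A sketch of this time-series operation, under assumptions not stated in the source: records are `(timestamp, items)` pairs, the "object data" is a single item value, and "filling the results into the time series" means storing a per-bucket count, with empty buckets left at zero so the series has no gaps. `object_time_series` is a hypothetical name.

```python
from datetime import datetime, timedelta


def object_time_series(
    records: list[tuple[datetime, list[str]]],
    target: str,
    start: datetime,
    end: datetime,
    unit: timedelta,
) -> dict[datetime, int]:
    """Count, per time bucket in [start, end), the records whose items contain `target`."""
    if unit <= timedelta(0):
        raise ValueError("time division unit must be positive")
    # Generate the empty time series first, one bucket per time division unit.
    series: dict[datetime, int] = {}
    cur = start
    while cur < end:
        series[cur] = 0
        cur += unit
    # Query the time-tagged records and fill the matches into their buckets.
    for ts, items in records:
        if start <= ts < end and target in items:
            bucket = start + ((ts - start) // unit) * unit
            series[bucket] += 1
    return series
```

Generating the buckets before querying guarantees that periods with no matching records still appear in the output, which keeps the series aligned for plotting or comparison across objects.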

The preferred embodiments disclosed above are intended only to help explain the present invention. They do not describe every detail exhaustively, nor do they limit the invention to the specific embodiments described. Obviously, many modifications and variations are possible in light of this specification. These embodiments were chosen and described in detail to better explain the principles and practical applications of the invention, so that those skilled in the art can understand and use it well. The invention is limited only by the claims together with their full scope and equivalents.

Claims

1. A data processing method based on frequent item set mining, characterized by comprising the following steps:

2. The data processing method based on frequent item set mining according to claim 1, characterized in that cleaning the non-time-series data tables comprises:

3. The data processing method based on frequent item set mining according to claim 1, characterized in that the multiple data tables in the market basket data are merged as follows:

Data tables in the market basket data that share the same field are merged.

4. The data processing method based on frequent item set mining according to claim 3, characterized in that the market basket data has the following format:

ID1,ITEM11,ITEM12,…

ID2,ITEM21,ITEM22,…

5. The data processing method based on frequent item set mining according to claim 1, characterized in that splitting the time series data tables comprises:

6. The data processing method based on frequent item set mining according to claim 1, characterized in that the frequent item sets meeting the specified support are mined using a frequent item set mining algorithm.