CN102542335A - Mixed data mining method - Google Patents

Info

Publication number
CN102542335A
Authority
CN
China
Prior art keywords
attribute
neural network
data
digging
data mining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2011101626184A
Other languages
Chinese (zh)
Inventor
严道平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
GUANGZHOU LONGTAI INFORMATION TECHNOLOGY CO LTD
Original Assignee
GUANGZHOU LONGTAI INFORMATION TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by GUANGZHOU LONGTAI INFORMATION TECHNOLOGY CO LTD
Priority to CN2011101626184A
Publication of CN102542335A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Rough set theory is highly effective at processing large data volumes and eliminating redundant information. Neural networks are distinguished by their unique model structure, inherent nonlinear simulation capability, strong adaptivity, and high fault tolerance. The effective combination of the two technologies has therefore been a research focus in the field of data mining in recent years. The invention provides a novel hybrid data mining method.

Description

A hybrid data mining method
Technical field
The invention belongs to the field of computer software, and in particular relates to a hybrid data mining method and the application of this method in BO.
Technical background
With the development of computer technology, database technology and database management systems (DBMS) are used ever more widely, and the volume of data stored in databases is growing sharply. Much important information lies hidden behind this mass of data; if it can be extracted from the database, it can create considerable potential profit for a company. The technology of mining information from large databases is called data mining (DM).
Rough set theory is a mathematical tool for characterizing incomplete and uncertain information. It can effectively analyze and process imprecise, inconsistent, and incomplete information, discover implicit knowledge in it, and reveal latent rules. Rough set theory is based on classifying data obtained by observation and measurement: it holds that knowledge rests on the ability to classify objects, and it ties knowledge directly to the different classification patterns of the real or abstract world. Rough sets characterize the uncertainty of information by means of the upper approximation, the lower approximation, and the boundary region.
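The upper approximation, lower approximation, and boundary region can be illustrated with a minimal sketch. The toy customer rows, the attribute names, and the helper functions `equivalence_classes` and `approximations` are assumptions made for illustration; none of them appear in the patent.

```python
from collections import defaultdict

def equivalence_classes(rows, attrs):
    """Group object indices that agree on every attribute in attrs."""
    classes = defaultdict(set)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].add(i)
    return list(classes.values())

def approximations(rows, attrs, target):
    """Return the (lower, upper) approximations of the index set target."""
    lower, upper = set(), set()
    for block in equivalence_classes(rows, attrs):
        if block <= target:
            lower |= block  # block lies entirely inside the target set
        if block & target:
            upper |= block  # block overlaps the target set
    return lower, upper

# Hypothetical toy data: each row is one customer.
rows = [
    {"age": "30-40", "income": "mid"},
    {"age": "30-40", "income": "mid"},
    {"age": "20-30", "income": "low"},
    {"age": "40-50", "income": "high"},
]
target = {0, 2}  # assumed: indices of customers whose spending rose
lower, upper = approximations(rows, ["age", "income"], target)
boundary = upper - lower  # objects that cannot be classified with certainty
```

Customers 0 and 1 are indistinguishable on the chosen attributes but only customer 0 belongs to the target set, so both fall into the boundary region; customer 2 is the only certain member.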
A neural network processes and stores information by changing the connection weights in the network. Each neuron is both a storage unit and a processing unit for information, so processing and storage are merged, and the network formed by these neurons recognizes and memorizes input patterns through the joint action of all neurons. An artificial neural network stores information in a distributed manner across the extensive interconnections between neurons and processes it cooperatively with nonlinear neurons. It therefore offers massively parallel processing, strong robustness and fault tolerance, and a strong capacity for self-learning.
Because rough sets and neural networks have strongly complementary strengths, the effective combination of the two technologies is a current research focus and has attracted wide attention from many scholars.
Among the existing ways of combining them, rough-set attribute reduction is one of the most important components: reducing the training data of the neural network lowers the amount of data the network needs to learn from, with the aim of further improving the efficiency and precision of neural network learning. In practical applications, however, the processing efficiency of rough sets on larger networks still merits further study.
Summary of the invention
An attribute reduction method based on a parallel genetic algorithm can effectively solve the problem of fast reduction when the data volume is large and the dimensions are many. It can therefore be used first to quickly select the input space of the neural network, after which the neural network is used for data mining, further improving efficiency when rough sets and neural networks are applied to mining large real-world databases.
Based on the above analysis, the invention provides a hybrid data mining method that uses rough set theory and a neural network.
To achieve the purpose of the invention, the technical principle adopted is as follows:
The sample data are analyzed, and an initial information table is formed according to known domain knowledge. A suitable discretization method is applied to the continuous attributes, and a parallel reduction algorithm based on a genetic algorithm performs fast attribute reduction (horizontal reduction) on the data. The attributes that remain after reduction serve as the input layer. The data then undergo vertical reduction, which eliminates inconsistent objects and redundant objects, and finally the neural network is trained on the reduced data. Introducing the parallel reduction algorithm can further improve the overall mining efficiency of the combined rough set and neural network method. The processing procedure is shown in Fig. 1.
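The patent does not spell out the genetic algorithm, so the following is only a rough sketch of how genetic-algorithm-based attribute reduction might look. The bit-mask encoding, the lexicographic fitness (dependency degree first, fewer attributes second), the elitist selection, the parameter defaults, and the toy table are all assumptions. The sketch runs serially; in a parallel variant the fitness evaluations, which are independent of one another, could be distributed across workers.

```python
import random
from collections import defaultdict

def dependency(rows, attrs, decision):
    """Fraction of objects whose attrs-class maps to a single decision value."""
    groups = defaultdict(set)
    for row in rows:
        groups[tuple(row[a] for a in attrs)].add(row[decision])
    consistent = sum(
        1 for row in rows if len(groups[tuple(row[a] for a in attrs)]) == 1
    )
    return consistent / len(rows)

def ga_reduct(rows, cond_attrs, decision, pop_size=12, generations=30, seed=0):
    """Search attribute subsets (bit masks) with a small elitist GA."""
    rng = random.Random(seed)
    n = len(cond_attrs)  # the one-point crossover below needs n >= 2

    def fitness(mask):
        chosen = [a for a, bit in zip(cond_attrs, mask) if bit]
        # Preserve the dependency degree first, then prefer fewer attributes.
        return (dependency(rows, chosen, decision), -sum(mask))

    # Include the full attribute set so the best individual never loses dependency.
    population = [tuple([1] * n)]
    population += [tuple(rng.randint(0, 1) for _ in range(n))
                   for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 2]  # elitism: the top half survives
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randint(1, n - 1)  # one-point crossover
            child = list(a[:cut] + b[cut:])
            if rng.random() < 0.2:  # occasional single-bit mutation
                child[rng.randrange(n)] ^= 1
            children.append(tuple(child))
        population = elite + children
    best = max(population, key=fitness)
    return [a for a, bit in zip(cond_attrs, best) if bit]

# Hypothetical discretized decision table.
rows = [
    {"age": "young", "income": "low", "sex": "M", "d": "down"},
    {"age": "young", "income": "high", "sex": "F", "d": "up"},
    {"age": "old", "income": "low", "sex": "F", "d": "down"},
    {"age": "old", "income": "high", "sex": "M", "d": "up"},
]
attrs = ["age", "income", "sex"]
result = ga_reduct(rows, attrs, "d")
```

Because the full attribute set is always in the initial population and the best individuals are carried over unchanged, the returned subset keeps the same dependency degree as the full set. Whether it is also minimal depends on the random search.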
Description of drawings
Fig. 1 is the data processing flow chart of the invention.
Embodiment
The main components of the method are:
(1) Discretization of continuous attributes: before the data are analyzed with the rough set method, continuous variables must be discretized. Discretization essentially amounts to using selected breakpoints (cut points) to partition the space spanned by the condition attributes: the n-dimensional space is divided into a finite number of regions such that the objects in each region share the same decision value. Commonly used methods include equal-width (equal-distance) partitioning, equal-frequency partitioning, and the Naive Scaler method.
(2) Forming the decision table: the quantized condition attribute and decision attribute values form a two-dimensional table in which each row describes one object and each column corresponds to one attribute of the objects.
(3) Attribute reduction: reducing the attributes of a decision table means removing the unnecessary condition attributes from the decision table system, so that the decision rules relating the remaining condition attributes to the decision attribute can be analyzed. The procedure used here is:
Input: condition attribute set C = {Y11, Y12, ..., Y53}; decision attribute set D = {d};
Output: an attribute reduction set REDU
Step 1: compute the positive region POS_C(D) of D with respect to the condition attribute set C;
Step 2: for an attribute Yij ∈ C, compute the positive region POS_{C\{Yij}}(D) of D with respect to the condition attribute subset C\{Yij} obtained by removing Yij;
Step 3: if POS_{C\{Yij}}(D) = POS_C(D), attribute Yij is unnecessary for the decision attribute d; set C = C\{Yij} and return to Step 2. Otherwise, output the attribute reduction REDU = C.
(4) Object reduction: eliminate inconsistent objects and redundant objects from the data. Inconsistent objects are objects with identical condition attributes but different decision attributes; redundant objects are objects with identical condition attributes and identical decision attributes.
(5) Determining the neural network model: neural networks can be divided by type into BP networks, ART networks, RBF networks, LVM networks, and so on. This patent uses the most common one, the BP network.
(6) Training and testing the network: according to the inputs of the neural network model, select the corresponding training data and attributes from the initial continuous-attribute decision table to train the network, and test it with the corresponding test samples.
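Components (1) to (4) can be sketched together as a small preprocessing pipeline. This is an illustration under stated assumptions rather than the patented implementation: it uses equal-frequency discretization, a sequential version of Steps 1 to 3 (not the parallel genetic algorithm), and a policy of dropping every inconsistent object. The function names and the sample records are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict

def equal_frequency_cuts(values, bins):
    """Cut points that put roughly the same number of values in each bin."""
    ordered = sorted(values)
    return [ordered[(len(ordered) * k) // bins] for k in range(1, bins)]

def discretize(value, cuts):
    """Index of the interval that contains value."""
    return bisect_right(cuts, value)

def positive_region(rows, cond_attrs, decision):
    """Indices of objects whose condition class maps to one decision value."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row[a] for a in cond_attrs)].append(i)
    pos = set()
    for members in groups.values():
        if len({rows[i][decision] for i in members}) == 1:
            pos.update(members)
    return pos

def reduce_attributes(rows, cond_attrs, decision):
    """Steps 1-3: drop each attribute whose removal leaves POS_C(D) unchanged."""
    target = positive_region(rows, cond_attrs, decision)
    reduct = list(cond_attrs)
    for attr in cond_attrs:
        trial = [a for a in reduct if a != attr]
        if positive_region(rows, trial, decision) == target:
            reduct = trial
    return reduct

def reduce_objects(rows, cond_attrs, decision):
    """Drop inconsistent objects and keep one copy of each redundant object."""
    seen = defaultdict(set)
    for row in rows:
        seen[tuple(row[a] for a in cond_attrs)].add(row[decision])
    kept, emitted = [], set()
    for row in rows:
        key = tuple(row[a] for a in cond_attrs)
        if len(seen[key]) == 1 and key not in emitted:
            emitted.add(key)
            kept.append({**{a: row[a] for a in cond_attrs},
                         decision: row[decision]})
    return kept

# Hypothetical raw records with continuous condition attributes.
raw = [
    {"age": 23, "income": 1500, "d": "down"},
    {"age": 35, "income": 5000, "d": "up"},
    {"age": 52, "income": 1800, "d": "down"},
    {"age": 47, "income": 5500, "d": "up"},
    {"age": 31, "income": 1600, "d": "down"},
    {"age": 38, "income": 5200, "d": "up"},
]
cuts = {a: equal_frequency_cuts([r[a] for r in raw], 2) for a in ("age", "income")}
table = [{"age": discretize(r["age"], cuts["age"]),
          "income": discretize(r["income"], cuts["income"]),
          "d": r["d"]} for r in raw]
reduct = reduce_attributes(table, ["age", "income"], "d")  # horizontal reduction
training_rows = reduce_objects(table, reduct, "d")         # vertical reduction
```

In this toy table the decision depends on income alone, so age is removed and only two distinct objects remain for training. The result of the sequential procedure can depend on the order in which attributes are tried.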
The concrete practice of this method is as follows:
The method is currently applied to analysis and decision-making for member shopping at a supermarket. Member feature analysis is used as an example below. Marketing analysts wished to identify the customer characteristics that played a decisive role in changes in customers' supermarket spending over a given period, and to predict customers' future propensity to consume on that basis. The dimensions related to the customer subject include age, occupation, income, sex, marital status, and so on. With the help of the relevant staff, we selected data from January 2005 to May 2006 from the supermarket's member data warehouse for analysis. Given the actual state of the available data, each selected record contains, as input condition attributes, the shopping amount rate of change for the customers in each age, income, occupation, sex, and marital status group, and the overall shopping amount rate of change serves as the decision attribute D. The data from January 2005 to December 2005 form the training set and the data from January 2006 to May 2006 form the test set, and both are processed according to the model described here. Dimension reduction is carried out first; it finds that the customer type with the greatest influence on the shopping amount rate of change is married male customers aged 30-40 with an income of 4000-6000 who work in the culture and education sector. A BP neural network is then used for prediction on this basis, with the structure of Fig. 1 and the sigmoid function as the neuron activation function. For comparison with the proposed model, the predictions of a traditional BP neural network and of rough sets were also examined. The comparison shows that the prediction precision of the proposed model is higher than that of the other two models on both the training set and the test set. This demonstrates the effectiveness of the rough set preprocessing: the horizontal and vertical reduction performed in preprocessing shrink the network, reducing the time and complexity of network training and testing, and the parallel reduction further improves the computational efficiency of the model.
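A BP network with sigmoid activation, as used in the example above, can be sketched in plain Python. The layer sizes, learning rate, epoch count, squared-error loss, and the tiny 0/1-encoded sample set are assumptions chosen for illustration; the patent gives no network dimensions and the supermarket data are not available.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class BPNetwork:
    """One hidden layer, one output, trained by online backpropagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Each weight row carries a trailing bias weight.
        self.w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(self, x):
        xb = list(x) + [1.0]
        hidden = [sigmoid(sum(w * v for w, v in zip(row, xb)))
                  for row in self.w_hidden]
        hb = hidden + [1.0]
        out = sigmoid(sum(w * v for w, v in zip(self.w_out, hb)))
        return xb, hb, out

    def predict(self, x):
        return self.forward(x)[2]

    def train_step(self, x, target, lr):
        xb, hb, out = self.forward(x)
        # Gradient of 0.5 * (out - target)^2 through the output sigmoid.
        delta_out = (out - target) * out * (1.0 - out)
        # Hidden deltas use the output weights before they are updated.
        delta_hidden = [delta_out * self.w_out[j] * hb[j] * (1.0 - hb[j])
                        for j in range(len(self.w_hidden))]
        for j in range(len(self.w_out)):
            self.w_out[j] -= lr * delta_out * hb[j]
        for j, row in enumerate(self.w_hidden):
            for i in range(len(row)):
                row[i] -= lr * delta_hidden[j] * xb[i]

def mse(net, samples):
    return sum((net.predict(x) - t) ** 2 for x, t in samples) / len(samples)

# Hypothetical reduced data: two 0/1-encoded attributes, target 1 = spending rose.
samples = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
           ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
net = BPNetwork(n_in=2, n_hidden=3, seed=0)
initial_loss = mse(net, samples)
for _ in range(2000):
    for x, t in samples:
        net.train_step(x, t, lr=0.5)
final_loss = mse(net, samples)
```

Because the input layer only needs one node per attribute that survives reduction, a smaller reduct directly means fewer weights to train, which is the efficiency argument made in the text.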

Claims (4)

1. A hybrid data mining method, characterized in that it integrates two methods: rough set theory and a neural network;
2. A hybrid data mining method, characterized in that a discretization method discretizes the continuous attributes and the reduction is based on a genetic algorithm;
3. A hybrid data mining method, characterized in that a neural network is trained on the data after reduction. Introducing the parallel reduction algorithm can further improve the overall mining efficiency of the combined rough set and neural network method;
4. The hybrid data mining method according to claims 2 and 3, characterized in that a BP neural network is adopted.
CN2011101626184A 2011-06-16 2011-06-16 Mixed data mining method Pending CN102542335A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011101626184A CN102542335A (en) 2011-06-16 2011-06-16 Mixed data mining method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2011101626184A CN102542335A (en) 2011-06-16 2011-06-16 Mixed data mining method

Publications (1)

Publication Number Publication Date
CN102542335A (en) 2012-07-04

Family

ID=46349181

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011101626184A Pending CN102542335A (en) 2011-06-16 2011-06-16 Mixed data mining method

Country Status (1)

Country Link
CN (1) CN102542335A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103076740A (en) * 2012-12-18 2013-05-01 江苏大学 Construction method for AC (alternating current) electromagnetic levitation spindle controller
CN104298873A (en) * 2014-10-10 2015-01-21 浙江大学 Attribute reduction method and mental state assessment method on the basis of genetic algorithm and rough set
CN105488697A (en) * 2015-12-09 2016-04-13 焦点科技股份有限公司 Potential customer mining method based on customer behavior characteristics
CN108632929A (en) * 2018-04-16 2018-10-09 北京京大律业知识产权代理有限公司 A kind of big data polymerization towards quick service
CN109358900A (en) * 2016-04-15 2019-02-19 北京中科寒武纪科技有限公司 The artificial neural network forward operation device and method for supporting discrete data to indicate

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101187803A (en) * 2007-12-06 2008-05-28 宁波思华数据技术有限公司 Ammonia converter production optimization method based on data excavation technology
CN101963983A (en) * 2010-09-28 2011-02-02 江苏瑞蚨通软件科技有限公司(中外合资) Data mining method of rough set and optimization neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DONG LI XIN: "Rough set and radial basis function neural network based insulation data mining fault diagnosis for power transformer", 《JOURNAL OF HARBIN INSTITUTE OF TECHNOLOGY》 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103076740A (en) * 2012-12-18 2013-05-01 江苏大学 Construction method for AC (alternating current) electromagnetic levitation spindle controller
CN103076740B (en) * 2012-12-18 2015-10-28 江苏大学 Exchange the building method of motorized spindle supported with AMB controller
CN104298873A (en) * 2014-10-10 2015-01-21 浙江大学 Attribute reduction method and mental state assessment method on the basis of genetic algorithm and rough set
CN104298873B (en) * 2014-10-10 2017-06-06 浙江大学 A kind of attribute reduction method and state of mind appraisal procedure based on genetic algorithm and rough set
CN105488697A (en) * 2015-12-09 2016-04-13 焦点科技股份有限公司 Potential customer mining method based on customer behavior characteristics
CN109358900A (en) * 2016-04-15 2019-02-19 北京中科寒武纪科技有限公司 The artificial neural network forward operation device and method for supporting discrete data to indicate
CN109358900B (en) * 2016-04-15 2020-07-03 中科寒武纪科技股份有限公司 Artificial neural network forward operation device and method supporting discrete data representation
CN108632929A (en) * 2018-04-16 2018-10-09 北京京大律业知识产权代理有限公司 A kind of big data polymerization towards quick service
CN108632929B (en) * 2018-04-16 2021-08-17 上海识装信息科技有限公司 Big data aggregation method for quick service

Similar Documents

Publication Publication Date Title
CN101963983A (en) Data mining method of rough set and optimization neural network
Jiang et al. Dynamic linkages among global oil market, agricultural raw material markets and metal markets: an application of wavelet and copula approaches
Sun et al. Data mining method for listed companies’ financial distress prediction
CN108764584B (en) Enterprise electric energy substitution potential evaluation method
WO2021088499A1 (en) False invoice issuing identification method and system based on dynamic network representation
CN110674970A (en) Enterprise legal risk early warning method, device, equipment and readable storage medium
CN103984714B (en) Ontology semantics-based supply and demand matching method for cloud manufacturing service
CN104537433A (en) Sold electricity quantity prediction method based on inventory capacities and business expansion characteristics
CN111738843B (en) Quantitative risk evaluation system and method using running water data
CN102542335A (en) Mixed data mining method
Zhou et al. A novel grey seasonal model based on cycle accumulation generation for forecasting energy consumption in China
Xia et al. A DEA-based empirical analysis for dynamic performance of China's regional coke production chain
Guo et al. A class of multi-period semi-variance portfolio for petroleum exploration and development
Wang et al. The construction and empirical analysis of the company’s financial early warning model based on data mining algorithms
CN113283806A (en) Enterprise information evaluation method and device, computer equipment and storage medium
Yu et al. Decision tree method in financial analysis of listed logistics companies
Yu et al. Computational intelligent data analysis for sustainable development
Feng Data Analysis and Prediction Modeling Based on Deep Learning in E‐Commerce
Rahman et al. To predict customer churn by using different algorithms
Huang et al. Hysteresis effects of R&D expenditures and patents on firm performance: An empirical study of Hsinchu Science Park in Taiwan
Pei et al. A Predictive Analysis of the Business Environment of Economies along the Belt and Road Using the Fractional‐Order Grey Model
WO2022143431A1 (en) Method and apparatus for training anti-money laundering model
Yang et al. Reform and competitive selection in China: An analysis of firm exits
Wang et al. Future of jobs in China under the impact of artificial intelligence
Mukhtar et al. Forecasting Covid-19 time series data using the long short-term memory (LSTM)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
DD01 Delivery of document by public notice

Addressee: Guangzhou Longtai Information Technology Co.,Ltd.

Document name: Notification that Application Deemed to be Withdrawn

C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20120704