CN103853821B - Method for constructing multiuser collaboration oriented data mining platform - Google Patents

Method for constructing multiuser collaboration oriented data mining platform

Info

Publication number
CN103853821B
Authority
CN
China
Prior art keywords
data
component
user
mining
implement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410059806.8A
Other languages
Chinese (zh)
Other versions
CN103853821A (en)
Inventor
叶枫
郭小成
李源畅
范仕良
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hohai University HHU
Original Assignee
Hohai University HHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2014-02-21
Filing date
2014-02-21
Publication date
2017-02-22
Application filed by Hohai University HHU
Priority to CN201410059806.8A
Publication of CN103853821A (2014-06-11)
Application granted
Publication of CN103853821B (2017-02-22)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method for constructing a multiuser collaboration oriented data mining platform. The method integrates a flexible workflow with a multiuser collaboration mechanism and provides a workspace for collaborative data mining by three user roles, namely data acquisition staff, data analysis staff and result auditing staff. The whole workflow is realized by components: a data acquisition component, a data preprocessing component, a data modeling component, a result visualization display component and a model evaluation component. The flexible workflow formed by the components and arrows can be created and manipulated by different user roles in different user views in a drag-and-drop manner. Given the complexity of data mining, namely its continuous repetition, revision and iteration, the method greatly simplifies data mining work and prevents data from leaking, thereby guaranteeing data security.

Description

A method for constructing a data mining platform oriented to multi-user collaboration
Technical field
The present invention relates to a method for constructing a data mining platform that integrates flexible workflows and is oriented to multi-user collaboration, and belongs to the field of data mining technology.
Background art
Data mining is the process of extracting potentially useful information from massive volumes of historical business data by means of mathematical analysis models. It is a process of continual repetition, revision and iteration, and mainly comprises data acquisition, data preprocessing, data analysis, result visualization and model evaluation. At present, data mining is widely applied in commercial fields such as banking, telecommunications, insurance, transportation and retail.
Existing data mining platforms have the following problems. They lack a flexible user workspace that supports undo, redo and save, so users must get everything right in a single pass when mining data, which is inconvenient. They lack workflow components that can be modified, iterated on and made to output intermediate results, so users cannot fully understand and control their data analysis process. Their mining mechanism is oriented to a single user, so one user has to combine the three roles of data collector, data analyst and result auditor; the roles cannot collaborate across the analysis process, and data and analysis results are easily leaked, which causes data security problems.
Summary of the invention
Object of the invention: in view of the problems in the prior art, the present invention provides a method for constructing a data mining platform that involves flexible workflows and multi-user collaboration.
The data mining platform built by the method of the invention provides a Web-based flexible user workspace that supports undo, redo and save. In the user workspace, data collectors can upload, update and delete data sets; data analysts can build and manipulate their own data analysis workflows; and result auditors can review mining results and respond to them.
Technical solution: a method for constructing a data mining platform oriented to multi-user collaboration provides a workspace in which three user roles, namely data collectors, data analysts and result auditors, collaborate on data mining. The whole workflow is implemented with components: a data acquisition component, a data preprocessing component, a data modeling component, a result visualization display component and a model evaluation component. Different user roles use different user views and can build and operate their own data analysis workflows by drag and drop. Data collectors upload, update and delete data through the data acquisition component. Data analysts use the data preprocessing, data modeling, result visualization and model evaluation components in workflow order to carry out data analysis operations such as data acquisition, data preprocessing, modeling and model evaluation. Result auditors review mining results and respond to them through the result visualization component in the user workspace.
The user workspace is a drag-and-drop graphical interface comprising two parts: a candidate component area and a workflow creation area. The candidate component area displays a series of extensible data mining workflow components; the workflow creation area is where users build and manipulate data analysis workflows.
The data analysis workflow is a flexible workflow composed of components and arrows. In any data analysis workflow, the user can at any time adjust the execution parameters on a component node, change the execution direction of the workflow and export intermediate results.
The method for constructing the data mining platform comprises the following steps:
Step 1: Design and implement the data acquisition component. Data is acquired in two ways: from a database and by web upload.
Acquisition from a database is implemented through Java Database Connectivity; data accesses on the data mining platform are converted in real time into the corresponding queries against the database.
Acquisition by web upload listens for data upload requests from the web client, establishes a socket connection between the client and the data storage server, and then uses Java I/O streams to write the data set into the file system of the data storage server.
In both cases, the implementation of the data acquisition component must save the metadata of the data set in the system database and expose a unified access interface.
Step 2: Design and implement the data preprocessing component. Statistical analysis is performed on the data set with the R language, and basic descriptive information about the data set is shown to the user graphically. Mathematical methods for data correction, such as interpolation filling and record removal, are encapsulated, and user interfaces are provided for preprocessing steps such as handling missing values, duplicate data, noisy data and abnormal data.
Step 3: Design and implement the data modeling component. Data mining models such as classification, clustering, association and time series are encapsulated with the R language, and a graphical interface is provided for the user to set the corresponding model analysis parameters.
Step 4: Design and implement the result visualization display component. Data mining results and model evaluation results are presented to the user with the R language as charts, lists and similar forms, and results are pushed to result auditors in real time by Ajax polling.
Step 5: Design and implement the model evaluation component. Multiple model evaluation methods such as accuracy, error rate and confusion matrix are provided with the R language, together with a user interface for saving model analysis parameters and model metadata to the system database.
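The evaluation measures named in Step 5 can be illustrated with a small sketch. The patent computes them with the R language; Python is used here only for brevity, and the function names and sample labels are hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Return (labels, matrix); matrix[i][j] counts records whose actual
    class is labels[i] and whose predicted class is labels[j]."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    matrix = [[counts[(a, p)] for p in labels] for a in labels]
    return labels, matrix


def accuracy(actual, predicted):
    """Fraction of records on the diagonal of the confusion matrix."""
    labels, matrix = confusion_matrix(actual, predicted)
    correct = sum(matrix[i][i] for i in range(len(labels)))
    return correct / len(actual)


def error_rate(actual, predicted):
    """Fraction of misclassified records, the complement of accuracy."""
    return 1.0 - accuracy(actual, predicted)


# Hypothetical sample: five records of a two-class problem.
actual = ["yes", "yes", "no", "no", "yes"]
predicted = ["yes", "no", "no", "yes", "yes"]
labels, matrix = confusion_matrix(actual, predicted)
acc = accuracy(actual, predicted)
err = error_rate(actual, predicted)
```

Deriving accuracy from the confusion matrix keeps the three measures consistent with one another when they are shown side by side.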
Step 6: Design and implement the user workspace. A drag-and-drop graphical interface comprising the candidate component area and the workflow creation area is implemented with jQuery. The user's operation log is stored in a stack data structure, and user interfaces are provided for undoing, redoing and saving the workspace.
Step 7: Define and implement the data mining workflow. With the data mining components designed in Steps 1 to 5 as nodes, a workflow composed of several nodes and arrows is defined, and user interfaces are provided for adjusting node execution parameters, changing the execution direction of the workflow and exporting intermediate results.
Step 8: Integrate and deploy the mining platform. The data mining components designed in Steps 1 to 5 are given configuration interfaces in JSON format, and a user interface is provided for customizing the functions of the mining platform by editing configuration files.
By adopting the above technical solution, the present invention has the following beneficial effects: to address the complexity of data mining, namely its continual repetition, revision and iteration, it provides a flexible data mining workspace oriented to multi-user collaboration. It not only greatly simplifies data mining work but also prevents data leakage and thus ensures data security.
Brief description of the drawings
Fig. 1 is a structural block diagram of the multi-user-oriented data mining platform according to an embodiment of the present invention.
Detailed description of the embodiments
The present invention is further illustrated below with reference to specific embodiments. It should be understood that these embodiments serve only to illustrate the invention and do not limit its scope; after reading the present disclosure, modifications of various equivalent forms made by those skilled in the art all fall within the scope defined by the claims appended to this application.
In the embodiment of the present invention, the method for constructing the data mining platform comprises the following steps:
Step 1: Design and implement the data acquisition component. To cope with complex characteristics of data sets such as large volume, variety and high velocity, the implementation is divided into two cases: acquisition from a database and acquisition by web upload.
Acquisition from a database is implemented through Java Database Connectivity (JDBC); data accesses on the data mining platform are converted in real time into the corresponding SQL queries against the database.
Acquisition by web upload listens for data upload requests from the web client, establishes a socket connection between the client and the data storage server, and then uses Java I/O streams to write the data set into the file system of the data storage server.
In both cases, the implementation of the data acquisition component must save the metadata of the data set in the system database and expose a unified access interface.
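A minimal sketch of the two acquisition paths sharing one metadata table and one read interface is given below. It is an illustration under stated assumptions, not the patented implementation: Python and the standard-library sqlite3 module stand in for Java and JDBC, a local directory stands in for the file system of the data storage server, a single in-memory database plays both the source database and the system database, and the class, table and data set names are hypothetical.

```python
import csv
import io
import os
import sqlite3
import tempfile


class DatasetCatalog:
    """Hypothetical metadata store with a unified read interface."""

    def __init__(self, system_db, storage_dir):
        self.db = system_db             # system database holding the metadata
        self.storage_dir = storage_dir  # stand-in for the storage server's file system
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS dataset_meta "
            "(name TEXT PRIMARY KEY, kind TEXT, location TEXT)"
        )

    def register_table(self, name, table):
        # Database-backed data set: only metadata is saved; rows are queried on access.
        self.db.execute(
            "INSERT OR REPLACE INTO dataset_meta VALUES (?, 'table', ?)", (name, table)
        )

    def register_upload(self, name, stream):
        # Uploaded data set: copy the byte stream into storage in fixed-size chunks.
        # The name is assumed to be validated already (no path separators).
        path = os.path.join(self.storage_dir, name + ".csv")
        with open(path, "wb") as out:
            for chunk in iter(lambda: stream.read(8192), b""):
                out.write(chunk)
        self.db.execute(
            "INSERT OR REPLACE INTO dataset_meta VALUES (?, 'file', ?)", (name, path)
        )

    def read_rows(self, name):
        # Unified access: the caller does not need to know where the data lives.
        row = self.db.execute(
            "SELECT kind, location FROM dataset_meta WHERE name = ?", (name,)
        ).fetchone()
        if row is None:
            raise KeyError(name)
        kind, location = row
        if kind == "table":
            quoted = '"' + location.replace('"', '""') + '"'
            return [tuple(r) for r in self.db.execute("SELECT * FROM " + quoted)]
        with open(location, newline="", encoding="utf-8") as f:
            return [tuple(r) for r in csv.reader(f)]


# Usage with hypothetical data: one table-backed and one uploaded data set.
source_db = sqlite3.connect(":memory:")
source_db.execute("CREATE TABLE sales_2013 (region TEXT, amount INTEGER)")
source_db.execute("INSERT INTO sales_2013 VALUES ('north', 10)")
with tempfile.TemporaryDirectory() as storage_dir:
    catalog = DatasetCatalog(source_db, storage_dir)
    catalog.register_table("sales", "sales_2013")
    catalog.register_upload("survey", io.BytesIO(b"id,score\n1,7\n"))
    table_rows = catalog.read_rows("sales")
    file_rows = catalog.read_rows("survey")
```

Keeping the kind and location of each data set in the metadata table is what lets later components read any data set through the same call.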
Step 2: Design and implement the data preprocessing component. Statistical analysis is performed on the data set with the R language, and basic descriptive information about the data set is shown to the user graphically. Mathematical methods for data correction, such as interpolation filling and record removal, are encapsulated, and user interfaces are provided for preprocessing steps such as handling missing values, duplicate data, noisy data and abnormal data.
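The correction methods mentioned in Step 2 might look like the following sketch. The patent wraps R routines; the Python functions below are hypothetical stand-ins, and the specific choices (linear interpolation for missing values, the interquartile-range rule for abnormal values) are assumptions made for illustration.

```python
import statistics


def drop_duplicates(rows):
    """Remove repeated records while keeping first-seen order."""
    seen, unique = set(), []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


def fill_missing_linear(values):
    """Fill None gaps by linear interpolation between the nearest known
    neighbours; gaps at either end copy the nearest known value."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("no known values to interpolate from")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            t = (i - left) / (right - left)
            filled[i] = values[left] + t * (values[right] - values[left])
    return filled


def remove_outliers(values, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = k * (q3 - q1)
    return [v for v in values if q1 - spread <= v <= q3 + spread]


# Hypothetical inputs for each preprocessing step.
deduped = drop_duplicates([("a", 1), ("b", 2), ("a", 1)])
filled = fill_missing_linear([2.0, None, 4.0, None])
trimmed = remove_outliers([10, 11, 12, 13, 14, 100])
```

Each function returns a new list and leaves its input untouched, which suits a workflow in which the user may want to export the intermediate result of every step.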
Step 3: Design and implement the data modeling component. Data mining models such as classification, clustering, association and time series are encapsulated with the R language, and a graphical interface is provided for the user to set the corresponding model analysis parameters.
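One way a graphical parameter form could be backed is a per-model parameter schema with defaults and range checks. The sketch below is an assumption for illustration only: the patent does not describe such a schema, and the model types, parameter names and limits are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParamSpec:
    kind: type       # expected Python type of the value
    default: object  # value used when the user leaves the field untouched
    minimum: float
    maximum: float


# Hypothetical schemas; a real platform would define one per wrapped model.
MODEL_SCHEMAS = {
    "clustering": {
        "k": ParamSpec(int, 3, 1, 100),
        "max_iter": ParamSpec(int, 50, 1, 10000),
    },
    "classification": {
        "max_depth": ParamSpec(int, 5, 1, 64),
    },
}


def resolve_params(model_type, user_values):
    """Merge user-set values with defaults and reject invalid input."""
    schema = MODEL_SCHEMAS[model_type]
    unknown = set(user_values) - set(schema)
    if unknown:
        raise ValueError("unknown parameters: %s" % sorted(unknown))
    resolved = {}
    for name, spec in schema.items():
        value = user_values.get(name, spec.default)
        # bool is a subclass of int, so it is rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, spec.kind):
            raise TypeError("%s must be of type %s" % (name, spec.kind.__name__))
        if not spec.minimum <= value <= spec.maximum:
            raise ValueError("%s is out of range" % name)
        resolved[name] = value
    return resolved


# The user changes only k; max_iter falls back to its default.
params = resolve_params("clustering", {"k": 4})
```

Validating on the server side as well as in the form keeps a malformed request from reaching the modeling routine.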
Step 4: Design and implement the result visualization display component. Data mining results and model evaluation results are presented to the user with the R language as charts, lists and similar forms, and results are pushed to result auditors in real time by Ajax polling.
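The server side of an Ajax polling exchange can be sketched as follows. This is a hedged illustration: the patent only names the technique, so the sequence-number protocol, the class name and the JSON field names are assumptions, and the browser-side timer that repeats the request is not shown.

```python
import json


class ResultFeed:
    """Hypothetical in-memory feed of published mining results."""

    def __init__(self):
        self._items = []

    def publish(self, workflow_id, summary):
        # Each result gets an increasing sequence number.
        seq = len(self._items) + 1
        self._items.append({"seq": seq, "workflow": workflow_id, "summary": summary})
        return seq

    def poll(self, after_seq):
        """Build the JSON body returned for one polling request.

        The client sends the last sequence number it has seen and receives
        only the results published after it."""
        fresh = [item for item in self._items if item["seq"] > after_seq]
        last = fresh[-1]["seq"] if fresh else after_seq
        return json.dumps({"last_seq": last, "results": fresh})


# An analyst publishes one result; the auditor's page polls twice.
feed = ResultFeed()
feed.publish("wf-1", "clustering finished")
first = json.loads(feed.poll(0))
second = json.loads(feed.poll(first["last_seq"]))
```

Returning the last sequence number with every response lets the client resume from the right place without the server tracking per-client state.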
Step 5: Design and implement the model evaluation component. The previously built model is evaluated with the R language, and a user interface is provided for saving model analysis parameters and model metadata to the system database.
Step 6: Design and implement the user workspace. A drag-and-drop graphical interface comprising the candidate component area and the workflow creation area is implemented with jQuery. The user's operation log is stored in a stack data structure, and user interfaces are provided for undoing, redoing and saving the workspace.
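The stack-based operation log of Step 6 can be sketched with two stacks, one for applied operations and one for undone operations. The patent implements the workspace in the browser with jQuery; the Python class below is a hypothetical model of the same idea, and the operation format is an assumption.

```python
import json


class Workspace:
    """Hypothetical workspace model with undo, redo and save."""

    def __init__(self):
        self.components = []  # components currently placed on the canvas
        self._undo = []       # stack of applied operations
        self._redo = []       # stack of undone operations

    def add_component(self, name):
        self._do(("add", len(self.components), name))

    def remove_component(self, name):
        self._do(("remove", self.components.index(name), name))

    def _do(self, op):
        self._apply(op)
        self._undo.append(op)
        self._redo.clear()  # a new action invalidates the redo history

    def _apply(self, op):
        kind, index, name = op
        if kind == "add":
            self.components.insert(index, name)
        else:
            del self.components[index]

    @staticmethod
    def _inverse(op):
        kind, index, name = op
        return ("remove" if kind == "add" else "add", index, name)

    def undo(self):
        if not self._undo:
            return False
        op = self._undo.pop()
        self._apply(self._inverse(op))
        self._redo.append(op)
        return True

    def redo(self):
        if not self._redo:
            return False
        op = self._redo.pop()
        self._apply(op)
        self._undo.append(op)
        return True

    def save(self):
        # Serialise the current canvas so that it can be stored and reloaded.
        return json.dumps({"components": self.components})


ws = Workspace()
ws.add_component("data_acquisition")
ws.add_component("data_preprocessing")
ws.remove_component("data_acquisition")
ws.undo()
after_undo = list(ws.components)
ws.undo()
after_second_undo = list(ws.components)
ws.redo()
after_redo = list(ws.components)
saved = ws.save()
```

Recording each operation with its position makes the inverse exact, so undoing a removal puts the component back where it was.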
Step 7: Define and implement the data mining workflow. With the data mining components designed in Steps 1 to 5 as nodes, a workflow composed of several nodes and arrows is defined, and user interfaces are provided for adjusting node execution parameters, changing the execution direction of the workflow and exporting intermediate results.
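A workflow of nodes and arrows can be modelled as a directed graph that is executed in topological order while every node's output is kept as an intermediate result. The sketch below is an assumption for illustration: the patent does not specify the execution algorithm, each node is assumed to take a single input (its first predecessor), and the node functions are hypothetical.

```python
from collections import deque


class Workflow:
    """Hypothetical workflow: component nodes joined by arrows."""

    def __init__(self):
        self.nodes = {}   # node name -> (function, parameter dict)
        self.arrows = []  # (source, target) pairs; edit this list to rewire

    def add_node(self, name, func, **params):
        self.nodes[name] = (func, dict(params))

    def connect(self, source, target):
        self.arrows.append((source, target))

    def set_params(self, name, **params):
        # Adjust the execution parameters of one node before the next run.
        self.nodes[name][1].update(params)

    def run(self, data):
        """Execute nodes in topological order and return every node's output."""
        indegree = {name: 0 for name in self.nodes}
        for _, target in self.arrows:
            indegree[target] += 1
        ready = deque(name for name, d in indegree.items() if d == 0)
        results = {}
        while ready:
            name = ready.popleft()
            func, params = self.nodes[name]
            preds = [s for s, t in self.arrows if t == name]
            value = results[preds[0]] if preds else data
            results[name] = func(value, **params)
            for source, target in self.arrows:
                if source == name:
                    indegree[target] -= 1
                    if indegree[target] == 0:
                        ready.append(target)
        if len(results) != len(self.nodes):
            raise ValueError("workflow contains a cycle")
        return results


def clean(values):
    return [v for v in values if v is not None]


def scale(values, factor=1):
    return [v * factor for v in values]


def total(values):
    return sum(values)


wf = Workflow()
wf.add_node("clean", clean)
wf.add_node("scale", scale, factor=2)
wf.add_node("total", total)
wf.connect("clean", "scale")
wf.connect("scale", "total")
first_run = wf.run([1, None, 3])
wf.set_params("scale", factor=10)
second_run = wf.run([1, None, 3])
```

Because `run` returns the output of every node, exporting an intermediate result amounts to reading one entry of the returned dictionary.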
Step 8: Integrate and deploy the mining platform. The data mining components designed in Steps 1 to 5 are given configuration interfaces in JSON format, and a user interface is provided for customizing the functions of the mining platform by editing configuration files.
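Customizing the platform by editing a JSON configuration file could work along the lines of the sketch below. The patent does not give a configuration schema, so the `components` and `enabled` keys and the component identifiers are hypothetical.

```python
import json

# The five components of the platform, in workflow order (identifiers are assumed).
COMPONENT_ORDER = [
    "data_acquisition",
    "data_preprocessing",
    "data_modeling",
    "result_visualization",
    "model_evaluation",
]


def enabled_components(config_text):
    """Parse a JSON configuration and list the components to deploy.

    A component that is not mentioned is treated as enabled."""
    config = json.loads(config_text)
    settings = config.get("components", {})
    unknown = set(settings) - set(COMPONENT_ORDER)
    if unknown:
        raise ValueError("unknown components: %s" % sorted(unknown))
    return [
        name for name in COMPONENT_ORDER
        if settings.get(name, {}).get("enabled", True)
    ]


# Hypothetical configuration that switches off model evaluation.
config_text = """
{
  "components": {
    "data_modeling": {"enabled": true, "models": ["clustering"]},
    "model_evaluation": {"enabled": false}
  }
}
"""
deployed = enabled_components(config_text)
```

Rejecting unknown component names early turns a typing error in the configuration file into a clear message rather than a silently ignored setting.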
As shown in Fig. 1, the data mining platform of the present invention supports collaborative data mining by three user roles, namely data collectors, data analysts and result auditors, and provides a component-based user workspace comprising a data acquisition component, a data preprocessing component, a data modeling component, a result visualization display component and a model evaluation component.
Different user roles use different user views and can build and operate their own data analysis workflows by drag and drop. Data collectors upload, update and delete data through the data acquisition component. Data analysts use the data preprocessing, data modeling, result visualization and model evaluation components in workflow order to carry out data analysis operations such as data acquisition, data preprocessing, modeling and model evaluation. Result auditors review mining results and respond to them through the result visualization component in the user workspace.
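The role-specific user views can be thought of as a mapping from each role to the components it may use. The sketch below follows the assignment of components to roles described above; the identifiers and the idea of enforcing the mapping with a lookup are assumptions, since the patent does not state how views are implemented.

```python
# Components available in each role's view, following the description above.
ROLE_COMPONENTS = {
    "data_collector": {"data_acquisition"},
    "data_analyst": {
        "data_preprocessing",
        "data_modeling",
        "result_visualization",
        "model_evaluation",
    },
    "result_auditor": {"result_visualization"},
}


def visible_components(role):
    """Components shown in the candidate component area for a role."""
    if role not in ROLE_COMPONENTS:
        raise KeyError("unknown role: %s" % role)
    return sorted(ROLE_COMPONENTS[role])


def can_use(role, component):
    """True if the role's view includes the component."""
    return component in ROLE_COMPONENTS.get(role, set())


auditor_view = visible_components("result_auditor")
auditor_can_upload = can_use("result_auditor", "data_acquisition")
analyst_can_model = can_use("data_analyst", "data_modeling")
```

Restricting each role to its own components is one way the separation of roles could limit who can reach raw data, in line with the leakage concern raised in the background section.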
The data analysis workflow is a flexible workflow composed of components and arrows. In any data analysis workflow, the user can at any time adjust the execution parameters on a component node, change the execution direction of the workflow and export intermediate results.

Claims (1)

1. A method for constructing a data mining platform oriented to multi-user collaboration, characterized in that a workspace is provided in which three user roles, namely data collectors, data analysts and result auditors, collaborate on data mining, the method specifically comprising the following steps:
Step 1: Design and implement the data acquisition component: data is acquired in two ways, namely from a database and by web upload;
acquisition from a database is implemented through Java Database Connectivity, and data accesses on the data mining platform are converted in real time into the corresponding queries against the database;
acquisition by web upload listens for data upload requests from the web client, establishes a socket connection between the client and the data storage server, and then uses Java I/O streams to write the data set into the file system of the data storage server;
in both cases, the implementation of the data acquisition component must save the metadata of the data set in the system database and expose a unified access interface;
Step 2: Design and implement the data preprocessing component: statistical analysis is performed on the data set with the R language, and basic descriptive information about the data set is shown to the user graphically; mathematical methods for data correction, namely interpolation filling and record removal, are encapsulated, and user interfaces are provided for the preprocessing steps of handling missing values, duplicate data, noisy data and abnormal data;
Step 3: Design and implement the data modeling component: classification, clustering, association and time series data mining models are encapsulated with the R language; a graphical interface is provided for the user to set the corresponding model analysis parameters;
Step 4: Design and implement the result visualization display component: data mining results and model evaluation results are presented to the user with the R language as charts and lists; results are pushed to result auditors in real time by Ajax polling;
Step 5: Design and implement the model evaluation component: multiple model evaluation methods, namely accuracy, error rate and confusion matrix, are provided with the R language; a user interface is provided for saving model analysis parameters and model metadata to the system database;
Step 6: Design and implement the user workspace: a drag-and-drop graphical interface comprising a candidate component area and a workflow creation area is implemented with jQuery; the user's operation log is stored in a stack data structure, and user interfaces are provided for undoing, redoing and saving the workspace;
Step 7: Define and implement the data mining workflow: with the data mining components designed in Steps 1 to 5 as nodes, a workflow composed of several nodes and arrows is defined; user interfaces are provided for adjusting node execution parameters, changing the execution direction of the workflow and exporting intermediate results;
Step 8: Integrate and deploy the mining platform: the data mining components designed in Steps 1 to 5 are given configuration interfaces in JSON format, and a user interface is provided for customizing the functions of the mining platform by editing configuration files.
CN201410059806.8A 2014-02-21 2014-02-21 Method for constructing multiuser collaboration oriented data mining platform Active CN103853821B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410059806.8A CN103853821B (en) 2014-02-21 2014-02-21 Method for constructing multiuser collaboration oriented data mining platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410059806.8A CN103853821B (en) 2014-02-21 2014-02-21 Method for constructing multiuser collaboration oriented data mining platform

Publications (2)

Publication Number Publication Date
CN103853821A CN103853821A (en) 2014-06-11
CN103853821B true CN103853821B (en) 2017-02-22

Family

ID=50861476

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410059806.8A Active CN103853821B (en) 2014-02-21 2014-02-21 Method for constructing multiuser collaboration oriented data mining platform

Country Status (1)

Country Link
CN (1) CN103853821B (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104572929A (en) * 2014-12-26 2015-04-29 深圳市科漫达智能管理科技有限公司 Data mining method and device
CN104731953A (en) * 2015-03-31 2015-06-24 河海大学 R-based building method of data preprocessing system
CN105159688A (en) * 2015-10-14 2015-12-16 浙江大学 Programmable information visualization interaction type design method
CN105468736A (en) * 2015-11-23 2016-04-06 国云科技股份有限公司 Plug-in and component based data preprocessing system and realization method therefor
CN105550365A (en) * 2016-01-15 2016-05-04 中国科学院自动化研究所 Visualization analysis system based on text topic model
CN106446238A (en) * 2016-10-10 2017-02-22 合肥红珊瑚软件服务有限公司 Web data mining system based on XML
CN108228359B (en) * 2016-12-15 2020-11-03 北京京东尚科信息技术有限公司 Method and system for integrating web program and R program to process data
CN106599325A (en) * 2017-01-18 2017-04-26 河海大学 Method for constructing data mining visualization platform based on R and HighCharts
WO2019033401A1 (en) * 2017-08-18 2019-02-21 深圳怡化电脑股份有限公司 Software development method and device
CN107944146A (en) * 2017-11-28 2018-04-20 河海大学 Polynary hydrology Time Series Matching model building method based on principal component analysis
CN108304557A (en) * 2018-02-07 2018-07-20 霍尔果斯智融未来信息科技有限公司 A kind of multiple person cooperational data digging method
CN108563706A (en) * 2018-03-27 2018-09-21 昆山和君纵达数据科技有限公司 A kind of collection big data intelligent service system and its operation method
CN108694448A (en) * 2018-05-08 2018-10-23 成都卡莱博尔信息技术股份有限公司 PHM platforms
CN109558395A (en) * 2018-10-17 2019-04-02 中国光大银行股份有限公司 Data processing system and data digging method
CN109491289A (en) * 2018-11-15 2019-03-19 国家计算机网络与信息安全管理中心 A kind of dynamic early-warning method and device for data center's dynamic environment monitoring
CN110909039A (en) * 2019-10-25 2020-03-24 北京华如科技股份有限公司 Big data mining tool and method based on drag type process
CN112069244B (en) * 2020-08-28 2022-07-29 福建博思软件股份有限公司 Method and storage device based on visualization web page data mining
CN112148747A (en) * 2020-09-08 2020-12-29 银清科技有限公司 Transaction system log analysis method and device based on R language
CN112632146B (en) * 2020-12-03 2023-04-07 成都大数据产业技术研究院有限公司 Multi-person collaborative visual data mining system
CN114597890A (en) * 2022-01-27 2022-06-07 国网冀北电力有限公司经济技术研究院 Construction method of holographic data system of power transmission line
CN116737803B (en) * 2023-08-10 2023-11-17 天津神舟通用数据技术有限公司 Visual data mining arrangement method based on directed acyclic graph

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101324901A (en) * 2008-08-06 2008-12-17 中国电信股份有限公司 Method, platform and system for excavating data
CN100476819C (en) * 2006-12-27 2009-04-08 章毅 Data mining system based on Web and control method thereof

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
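The WEKA data mining platform and its secondary development (WEKA数据挖掘平台及其二次开发); Chen Huiping et al.; Computer Engineering and Applications (《计算机工程与应用》); 2008-10-15; Vol. 44, No. 19; full text *
Big data mining platform based on cloud computing (基于云计算的大数据挖掘平台); He Qing et al.; ZTE Technology Journal (《中兴通讯技术》); 2013-08-31; Vol. 19, No. 4; full text *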

Also Published As

Publication number Publication date
CN103853821A (en) 2014-06-11

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant